# What is the probability that the child does not have HIV? Round your answer to the nearest tenth of a percent.

i)  Let X = (X1,…,Xn) be the blood pressure (measured in mmHg) and let Y = (Y1,…,Yn) be the cortisol level (measured in mcg/dL) recorded for n = 79 patients recruited for a study in a hospital (Xi and Yi are measurements for the same patient). What test is most appropriate to gather evidence towards the alternative hypothesis that blood pressure is associated with cortisol level? Please provide the reasoning in detail for your answer.

A)  The two-sample paired t-test with the null hypothesis that the means of X and Y differ.

B)  The test with the null hypothesis that the Pearson correlation coefficient between X and Y is zero.

C)  The test with the null hypothesis that the regression coefficient is zero in a linear regression with response variable X (blood pressure) and explanatory variable Y (cortisol level).

(5 points)
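The relationship between options B and C can be checked numerically. The sketch below is an illustration only: it uses simulated data (the means, spreads, and the helper names `t_from_correlation` and `t_from_slope` are assumptions, not part of the exam). It computes the t statistic for a zero Pearson correlation and the t statistic for a zero slope in a simple linear regression, which coincide algebraically.

```python
import math
import random


def _centered_sums(x, y):
    """Return (Sxx, Syy, Sxy) for two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def t_from_correlation(x, y):
    """t statistic for H0: Pearson correlation is zero (n - 2 degrees of freedom)."""
    n = len(x)
    sxx, syy, sxy = _centered_sums(x, y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def t_from_slope(response, explanatory):
    """t statistic for H0: slope is zero in a simple linear regression."""
    n = len(response)
    sxx, syy, sxy = _centered_sums(explanatory, response)
    slope = sxy / sxx
    rss = syy - slope * sxy              # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope / se


# Simulated (hypothetical) measurements for n = 79 patients.
rng = random.Random(0)
cortisol = [rng.gauss(12, 4) for _ in range(79)]
blood_pressure = [100 + 1.5 * c + rng.gauss(0, 10) for c in cortisol]

print(t_from_correlation(blood_pressure, cortisol))
print(t_from_slope(blood_pressure, cortisol))  # same value up to rounding
```

Swapping the response and explanatory variables in `t_from_slope` leaves the statistic unchanged, because the correlation is symmetric in X and Y.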

ii)  Suppose that a treatment is proposed to reduce the duration from the date of infection to the time at which a first negative test is recorded in people with mild COVID-19 (call this time period the duration). Suppose that 27 people with mild COVID-19 (the study population) are administered the treatment and 73 people with mild COVID-19 are not administered the treatment (the control population). Both populations are sampled from patients tested at the same clinic over the same period. Let the durations for the study sample be X = (X1, X2,…), and the durations for the control sample be Y = (Y1, Y2,…). What test is most appropriate to gather evidence towards the alternative hypothesis that the treatment reduces the duration? Please provide the reasoning in detail for your answer.

A)  The one-sided two-sample unpaired t-test with H0: The mean of X is greater than or equal to the mean of Y.

B)  The one-sided two-sample unpaired t-test with the null hypothesis that the mean of X is less than or equal to the mean of Y.

C)  The test against the null hypothesis that Spearman’s rank correlation coefficient between X and Y is zero.

D)  The one-sided two-sample paired t-test against H0: The mean difference between Xi and Yi is less than or equal to zero.

E)  The two-sided two-sample paired t-test with the null hypothesis that the mean difference between Xi and Yi is zero.

(5 points)
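To make the unpaired setting concrete, here is a minimal sketch of an unpaired two-sample t statistic for samples of sizes 27 and 73. The exam does not say which unpaired variant is intended; this sketch assumes Welch's unequal-variance version, and the durations are simulated (the means and spreads are made up). A paired statistic cannot be formed here, since there is no natural pairing between the 27 treated and 73 untreated patients.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch's unpaired t statistic and its approximate degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances
    se2 = vx / nx + vy / ny
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df


# Simulated (hypothetical) durations in days.
rng = random.Random(0)
treated = [rng.gauss(10, 3) for _ in range(27)]   # X, study sample
control = [rng.gauss(14, 3) for _ in range(73)]   # Y, control sample

t_stat, dof = welch_t(treated, control)
print(t_stat, dof)
```

A strongly negative statistic is evidence that the mean of X is smaller than the mean of Y, which is the direction of the alternative hypothesis described above.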

iii)  Road vehicle accidents involving ambulances have more detrimental outcomes than accidents involving other similarly sized vehicles (Ray and Kupas, 2005). Measures to avoid such accidents are continually being refined by organizations involved in emergency medical services. Suppose that a city council is interested in knowing if adoption of such measures has led to an improvement over the last decade. Suppose that the ratio between the number of accidents involving ambulances (the numerator) and the number of kilometers driven by ambulances (the denominator) has been recorded (rt, with units of number of accidents per kilometer year) for each year t over the past decade. Which single one of the following statistical quantities is most relevant

A)  The sample standard deviation of rt.

B)  The sample mean of rt.

C)  The Pearson correlation coefficient ρ between rt and t.

D)  The regression coefficient for t in a linear regression with rt as the response variable and t as the explanatory variable.

E)  The regression coefficient for rt in a linear regression with rt as the explanatory variable and t as the response variable.
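Several of these quantities can be computed side by side to see what each one measures. The sketch below uses made-up yearly rates (the numbers and the helper name `ols_slope` are assumptions for illustration) and fits an ordinary least-squares line with rt as the response and t as the explanatory variable.

```python
def ols_slope(explanatory, response):
    """Least-squares slope of response regressed on explanatory."""
    n = len(explanatory)
    mx = sum(explanatory) / n
    my = sum(response) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(explanatory, response))
    sxx = sum((x - mx) ** 2 for x in explanatory)
    return sxy / sxx


# Hypothetical accident rates for years t = 1..10 (exactly linear for clarity).
years = list(range(1, 11))
rates = [5.0 - 0.5 * t for t in years]

# Slope of rt on t: change in the accident rate per additional year.
print(ols_slope(years, rates))
# Slope of t on rt: years per unit of accident rate, harder to interpret.
print(ols_slope(rates, years))
```

The first slope carries the units of rt per year, so it describes both the direction and the size of the trend, whereas a correlation coefficient is unitless and describes only the strength of the linear association.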

## Problem 2: Bayes’ rule

A study was conducted to assess the sensitivity and specificity of four different human immunodeficiency virus (HIV) serology tests (Koblavi-Dème et al. 2001). The Determine test was among the four; it was developed by Abbott Laboratories (an American provider of health care, medical devices and pharmaceuticals) and was found to have a true negative rate (also called specificity) of 99.4% and a true positive rate (also called sensitivity) of 100%. The true negative rate of a test for a disease is the probability that someone without the disease tests negative. The true positive rate of a test for a disease is the probability that someone with the disease tests positive. HIV may be transmitted from an expecting parent to their child by transmission during childbirth or by transmission to the fetus during pregnancy (throughout, assume that there’s no other way for a newborn to be infected). Treatment by the drugs zidovudine or nevirapine has been shown to reduce the rate of these sorts of transmission of HIV by 38% to 50% in the absence of other intervention (Koblavi-Dème et al. 2001).

a) Suppose that an expecting parent is infected with HIV and they are treated with zidovudine or nevirapine during pregnancy. Suppose that after they give birth, a Determine serology test reports a positive test for HIV. What is the probability that the child does not have HIV? Round your answer to the nearest tenth of a percent.

(6 points)

b) UNAIDS (an organization established by the United Nations Economic and Social Council) estimates the prevalence of HIV in Côte d’Ivoire among people aged 15-49 to be 2.6%. If a Determine serology test reported a positive test for HIV in someone selected uniformly at random among all people in Côte d’Ivoire aged 15-49, what is the probability that the person does not have HIV? Round your answer to the nearest tenth of a percent.

(4 points)
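A calculation of this kind follows directly from Bayes’ rule. The sketch below is one way to set it up, using the prevalence from part b) as the prior and the sensitivity and specificity from the preamble; the function name is an assumption made for illustration.

```python
def prob_not_infected_given_positive(prevalence, sensitivity, specificity):
    """P(not infected | positive test) via Bayes' rule."""
    true_positive = sensitivity * prevalence                # P(positive and infected)
    false_positive = (1 - specificity) * (1 - prevalence)   # P(positive and not infected)
    return false_positive / (true_positive + false_positive)


# Values stated in the problem: prevalence 2.6%, sensitivity 100%, specificity 99.4%.
p = prob_not_infected_given_positive(0.026, 1.0, 0.994)
print(f"{100 * p:.1f}%")
```

With these inputs the result is about 18.4%. The same function applies to any other prior probability of infection, provided that prior is known.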

c) In the USA, according to the Centers for Disease Control (a public health institute within the United States Department of Health and Human Services), if someone has a positive serology test for HIV they are not diagnosed as HIV-positive until a second follow-up test also yields a positive test result. What is the probability that someone is incorrectly diagnosed as HIV-positive (i.e., if someone is not infected with HIV, what is the probability that their first test and also their second follow-up test are both positive)? Suppose that both tests are Determine serology tests, and also assume that the test results are statistically independent. Express your answer as an expected number of events in a million (e.g., something like ‘a 36 in a million chance’ or ‘a one in a million chance’). Also: in one sentence, what is a possible argument as to why the assumption of independence of the two test results might be wrong? (Your argument does not have to be sound, but it must be valid without being tautological.)

(3 points)
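Under the stated independence assumption, the probability that an uninfected person tests positive on every test is the product of the individual false positive rates. A minimal sketch, with a hypothetical function name:

```python
def false_diagnosis_per_million(specificity, n_tests=2):
    """Expected false diagnoses per million uninfected people,
    assuming n_tests independent tests with the same specificity."""
    false_positive_rate = 1 - specificity
    return false_positive_rate ** n_tests * 1_000_000


# Specificity of the Determine test as stated in the preamble.
print(round(false_diagnosis_per_million(0.994), 1))
```

If the two results were positively dependent, for instance because the same cross-reacting antibody triggers both tests, the true rate would be higher than this product suggests.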

d) What is the probability that an HIV-infected expecting parent transmits HIV to their child either during childbirth or through transmitting HIV to the fetus during pregnancy, given that the parent has not received treatment with the drugs zidovudine or nevirapine, and in the absence of other intervention, according to the preamble of this problem (in concordance with Koblavi-Dème et al. 2001)?

(2 points)
