Hypothesis Testing & Correlation Analysis


Amanda Hidalgo 

LIS 4273: Adv Stats & Analytics 




Part 1 

 A. State the null and alternative hypothesis:

   Null Hypothesis (H0): The machine is producing cookies according to the manufacturer's specifications, i.e., the population mean (μ) breaking strength is 70 pounds.

   Alternative Hypothesis (Ha): The machine is not producing cookies according to the manufacturer's specifications, i.e., the population mean (μ) breaking strength is not equal to 70 pounds.


B. Is there evidence that the machine is not meeting the manufacturer's specifications for average strength? Use a 0.05 level of significance:

   To determine if there is evidence that the machine is not meeting the manufacturer's specifications, we need to perform a hypothesis test using the sample data.


   We have:

   - Sample mean (x̄) = 69.1 pounds

   - Population standard deviation (σ) = 3.5 pounds

   - Sample size (n) = 49

   - Level of significance (α) = 0.05

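    We'll use a two-tailed z-test because the population standard deviation is known and the alternative hypothesis is that the population mean is not equal to 70 pounds, so a departure in either direction counts as evidence against H0.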

  Calculating the test statistic (z):

   z = (x̄ - μ) / (σ / √n)

   z = (69.1 - 70) / (3.5 / √49)

   z = (-0.9) / (0.5)

   z = -1.8

   Now, we compare the calculated z-value to the critical z-value at a 0.05 significance level. For a two-tailed test at α = 0.05, the critical z-values are approximately ±1.96.

   Since -1.8 is within the range of -1.96 to 1.96, we do not reject the null hypothesis. Therefore, there is not enough evidence to conclude that the machine is not meeting the manufacturer's specifications for average strength at the 0.05 level of significance.
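   The calculation in part B can be reproduced in R. This is a minimal sketch using only the values given in the problem; the variable names (x_bar, mu0, se, z_crit, reject_h0) are chosen here for illustration and are not part of the assignment.

```r
# Values taken from the problem statement
x_bar <- 69.1   # sample mean (pounds)
mu0   <- 70     # hypothesized population mean under H0 (pounds)
sigma <- 3.5    # known population standard deviation (pounds)
n     <- 49     # sample size
alpha <- 0.05   # level of significance

# Standard error of the mean and the z test statistic
se <- sigma / sqrt(n)
z  <- (x_bar - mu0) / se

# Two-tailed critical value from the standard normal distribution
z_crit <- qnorm(1 - alpha / 2)

# Reject H0 only if |z| exceeds the critical value
reject_h0 <- abs(z) > z_crit

cat("z =", round(z, 2), "\n")
cat("critical value = +/-", round(z_crit, 2), "\n")
cat("reject H0:", reject_h0, "\n")
```

   Using qnorm() rather than a hard-coded 1.96 keeps the sketch valid if a different significance level is chosen.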

C. Compute the p-value and interpret its meaning:

   The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one calculated (z = -1.8). Because the test is two-tailed, both tails count: p = 2 × P(Z ≤ -1.8) = 2 × 0.0359 ≈ 0.0719. You can find this value using a standard normal distribution table or calculator.

   Interpretation: If the true mean breaking strength were 70 pounds, a sample mean at least 0.9 pounds away from 70 would occur in about 7.2% of samples of size 49. The p-value of 0.0719 is greater than the chosen significance level of 0.05, so there is not enough evidence to reject the null hypothesis. The machine may still be producing cookies according to the manufacturer's specifications.
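   As a quick check, the two-tailed p-value can be computed with pnorm(), the standard normal cumulative distribution function in base R. The sketch below assumes only the test statistic from part B; the names z, alpha, and p_value are illustrative.

```r
z     <- -1.8   # test statistic from part B
alpha <- 0.05   # level of significance

# Two-tailed p-value: probability in both tails beyond |z|
p_value <- 2 * pnorm(-abs(z))

cat("p-value =", round(p_value, 4), "\n")
cat("reject H0 at alpha = 0.05:", p_value < alpha, "\n")
```

   Taking -abs(z) makes the same line work whether the test statistic is negative or positive.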

D. What would be your answer in (B) if the standard deviation were specified as 1.75 pounds?

   If the standard deviation were specified as 1.75 pounds, you would need to recalculate the z-test using the new standard deviation value while keeping the same sample mean, sample size, and significance level. The steps would be the same as in part B, but with σ = 1.75 pounds.

Calculating the test statistic (z):

z = (x̄ - μ) / (σ / √n)

z = (69.1 - 70) / (1.75 / √49)

z = (-0.9) / (0.25)

z = -3.6

Now, we compare the calculated z-value to the critical z-value at a 0.05 significance level. For a two-tailed test at α = 0.05, the critical z-values are approximately ±1.96.

Since -3.6 is less than -1.96, we reject the null hypothesis. Therefore, if the standard deviation were specified as 1.75 pounds, there is enough evidence to conclude that the machine is not meeting the manufacturer's specifications for average strength at the 0.05 level of significance.

E. What would be your answer in (B) if the sample mean were 69 pounds and the standard deviation is 3.5 pounds?

   If the sample mean were 69 pounds with the standard deviation unchanged at 3.5 pounds, we repeat the hypothesis test from part B with x̄ = 69 pounds. Only the sample mean differs; σ, n, and α stay the same.

Calculating the test statistic (z):

z = (x̄ - μ) / (σ / √n)

z = (69 - 70) / (3.5 / √49)

z = (-1) / (0.5)

z = -2

Now, compare the calculated z-value to the critical z-value at a 0.05 significance level. For a two-tailed test at α = 0.05, the critical z-values are approximately ±1.96.

Since -2 is less than -1.96, it falls in the rejection region, so we reject the null hypothesis (the two-tailed p-value is about 0.0455, which is below 0.05). Therefore, if the sample mean were 69 pounds and the standard deviation were 3.5 pounds, there is enough evidence to conclude that the machine is not meeting the manufacturer's specifications for average strength at the 0.05 level of significance. This differs from the original scenario in part B: the slightly lower sample mean moves the test statistic just past the critical value.
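Parts B, D, and E differ only in the sample mean and the standard deviation, so the three tests can be run side by side. The sketch below is one way to do that in R, assuming the values stated in each part; the data frame and its column names (part, x_bar, sigma, z, p_value, reject_h0) are illustrative choices, not part of the assignment.

```r
# One row per scenario, using the values stated in parts B, D, and E
scenarios <- data.frame(
  part  = c("B", "D", "E"),
  x_bar = c(69.1, 69.1, 69.0),   # sample means (pounds)
  sigma = c(3.5, 1.75, 3.5)      # population standard deviations (pounds)
)

mu0   <- 70     # hypothesized mean under H0
n     <- 49     # sample size (same in all three parts)
alpha <- 0.05   # level of significance

# Vectorized z statistic, two-tailed p-value, and decision for each row
scenarios$z         <- (scenarios$x_bar - mu0) / (scenarios$sigma / sqrt(n))
scenarios$p_value   <- 2 * pnorm(-abs(scenarios$z))
scenarios$reject_h0 <- abs(scenarios$z) > qnorm(1 - alpha / 2)

print(scenarios, digits = 4)
```

Laying the scenarios out in one table makes it easy to see how halving σ or lowering x̄ changes the test statistic relative to the critical value.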


Part 2 

If x̄ = 85, σ (the population standard deviation) = 8, and n = 64, set up a 95% confidence interval estimate of the population mean μ.

    To set up a 95% confidence interval estimate of the population mean (μ) when you have a sample mean (x̄) of 85, a standard deviation (σ) of 8, and a sample size (n) of 64, you can use the formula for the confidence interval for the population mean when the population standard deviation is known. The formula is:

Confidence Interval = x̄ ± Z * (σ / √n)

Where:
- x̄ is the sample mean (85).
- Z is the critical value from the standard normal distribution corresponding to the desired confidence level. For a 95% confidence interval, Z = 1.96 (you can find this value in a standard normal distribution table).
- σ is the population standard deviation (8).
- √n is the square root of the sample size (√64 = 8).

Confidence Interval = 85 ± 1.96 * (8 / 8)

Confidence Interval = 85 ± 1.96 * 1

Confidence Interval = 85 ± 1.96

To find the confidence interval, calculate both the upper and lower bounds:

Lower Bound = 85 - 1.96 = 83.04
Upper Bound = 85 + 1.96 = 86.96

So, the 95% confidence interval estimate of the population mean μ is 83.04 to 86.96. This means we are 95% confident that the true population mean falls within this interval; more precisely, intervals constructed this way capture the true mean in about 95% of repeated samples.
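The same interval can be computed in R. This is a minimal sketch using the values given in Part 2; the names conf_level, z_crit, margin, and ci are illustrative.

```r
x_bar      <- 85     # sample mean
sigma      <- 8      # known population standard deviation
n          <- 64     # sample size
conf_level <- 0.95   # desired confidence level

# Critical z value for a two-sided interval
z_crit <- qnorm(1 - (1 - conf_level) / 2)

# Margin of error and interval bounds
margin <- z_crit * sigma / sqrt(n)
ci <- c(lower = x_bar - margin, upper = x_bar + margin)

print(round(ci, 2))
```

Changing conf_level to 0.90 or 0.99 gives the corresponding narrower or wider interval without looking up a new table value.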

Part 3 

# Load the dataset

data <- read.csv("correlation_data.csv")

# Calculate the correlation coefficient

correlation_coefficient <- cor(data$girls, data$boys)

# Create a scatter plot of the two variables

plot(data$girls, data$boys, xlab = "Girls' Goals", ylab = "Boys' Time on Assignments", main = "Correlation Plot")

# Print the correlation coefficient

print(paste("Correlation Coefficient (Pearson):", round(correlation_coefficient, 2)))
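To connect the correlation coefficient with the hypothesis testing in Part 1, the sketch below computes Pearson's r from its definition and then tests H0: ρ = 0 with cor.test() from base R. The two vectors are made-up illustrative values, not the contents of correlation_data.csv, and the names r_manual, r_builtin, and result are assumptions introduced here.

```r
# Hypothetical illustrative values, NOT the contents of correlation_data.csv
girls <- c(2, 4, 5, 7, 9, 10)
boys  <- c(3, 5, 4, 8, 9, 12)

# Pearson's r from its definition: covariation divided by
# the product of the two variables' spreads
dx <- girls - mean(girls)
dy <- boys - mean(boys)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The built-in function should agree with the manual calculation
r_builtin <- cor(girls, boys, method = "pearson")

# Test H0: the true correlation is 0 (two-sided by default)
result <- cor.test(girls, boys)

cat("r (manual):  ", round(r_manual, 4), "\n")
cat("r (built-in):", round(r_builtin, 4), "\n")
cat("p-value:     ", round(result$p.value, 4), "\n")
```

With the real dataset, data$girls and data$boys would take the place of the two hypothetical vectors.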


