Hypothesis Testing with R
Dangers of Multiple T-Tests

Suppose that you own a chain of stores that sell ants, called VeryAnts. There are three different locations: A, B, and C. You want to know if the average ant sales over the past year are significantly different between the three locations.

At first, it seems that you could perform T-tests between each pair of stores.

You know that the significance level (the threshold your p-value is compared against, commonly 0.05) is the probability that you incorrectly reject a true null hypothesis on each t-test. The more t-tests you perform, the more likely you are to get a false positive, a Type I error.

At a significance level of 0.05, if the null hypothesis is true, then the probability of correctly failing to reject it is 1 - 0.05 = 0.95. When you run a second t-test, the probability of getting a correct result on both tests is 0.95 * 0.95, or 0.9025 (treating the tests as independent). That means your probability of making at least one error is now 1 - 0.9025 = 0.0975, close to 10%! This error probability keeps growing with each additional t-test you run.
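The arithmetic above generalizes to any number of tests. Here is a minimal sketch in R, assuming the tests are independent and every null hypothesis is true. Pairwise t-tests that share data are not strictly independent, so treat the result as an approximation. The function name familywise_error is hypothetical, introduced only for illustration.

```r
# Probability of at least one Type I error across k tests,
# assuming independent tests and that every null hypothesis is true.
# `familywise_error` is a hypothetical helper name, not a built-in R function.
familywise_error <- function(alpha, k) {
  1 - (1 - alpha)^k
}

# Error probability for 1, 2, and 3 t-tests at a 0.05 significance level
familywise_error(0.05, 1:3)
```

Under these assumptions, the values come out to 5%, 9.75%, and roughly 14.26% for one, two, and three tests.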

Instructions

1.

We have created samples store_a, store_b, and store_c, representing the sales at VeryAnts at locations A, B, and C, respectively. We want to see if there’s a significant difference in sales between the three locations.

Explore datasets store_a, store_b, and store_c by finding and viewing the means and standard deviations of each one. Store the means in variables called store_a_mean, store_b_mean, and store_c_mean. Store the standard deviations in variables called store_a_sd, store_b_sd, and store_c_sd.

2.

Perform a two-sample t-test on each pair of locations: A and B, A and C, and B and C.

Store the results of the tests in variables called a_b_results, a_c_results, and b_c_results. View the results for each test.

3.

Store the probability of making at least one Type I error when running three t-tests, each at a 0.05 significance level, in a variable called error_prob. View error_prob.
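One way the three steps might look in R is sketched below. The lesson supplies store_a, store_b, and store_c itself, so the simulated values here (sample size, means, and spreads) are assumptions made only to keep the example runnable. The error calculation also assumes independent tests at a 0.05 significance level.

```r
# Hypothetical stand-in data: the lesson provides store_a, store_b, and store_c.
# The sample size, means, and standard deviations below are made up for illustration.
set.seed(42)
store_a <- rnorm(150, mean = 58, sd = 15)
store_b <- rnorm(150, mean = 65, sd = 15)
store_c <- rnorm(150, mean = 62, sd = 15)

# Step 1: mean and standard deviation of each sample
store_a_mean <- mean(store_a)
store_b_mean <- mean(store_b)
store_c_mean <- mean(store_c)
store_a_sd <- sd(store_a)
store_b_sd <- sd(store_b)
store_c_sd <- sd(store_c)

# Step 2: two-sample t-test for each pair of locations
# (t.test() runs Welch's unequal-variance test by default)
a_b_results <- t.test(store_a, store_b)
a_c_results <- t.test(store_a, store_c)
b_c_results <- t.test(store_b, store_c)
a_b_results
a_c_results
b_c_results

# Step 3: probability of at least one Type I error across three tests,
# assuming independent tests at a 0.05 significance level
error_prob <- 1 - (0.95^3)
error_prob
```

Each result object also exposes its p-value directly, for example a_b_results$p.value, which is convenient when comparing the three tests side by side.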
