Hypothesis Testing with R
Sample Mean and Population Mean - I

Suppose you want to know the average height of an oak tree in your local park. On Monday, you measure 10 trees and get an average height of 32 ft. On Tuesday, you measure 12 different trees and get an average height of 35 ft. On Wednesday, you measure the remaining 11 trees in the park, whose average height is 31 ft. Since every tree has now been measured exactly once, you can combine the three days of measurements: the average height for all 33 trees in your local park is 32.8 ft.

The collections of individual height measurements taken on Monday, Tuesday, and Wednesday are each called a sample. A sample is a subset of the entire population (here, all the oak trees in the park). The mean of each sample is a sample mean, and it is an estimate of the population mean.

Note: the sample means (32 ft., 35 ft., and 31 ft.) were all close to the population mean (32.8 ft.), but were all slightly different from the population mean and from each other.
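
The arithmetic behind the tree example can be checked with a short R sketch. The variable names here are illustrative and not part of the lesson workspace. Because the three samples together cover every tree exactly once, weighting each sample mean by its sample size recovers the population mean.

```r
# Sample sizes and sample means from the oak tree example
sample_sizes <- c(10, 12, 11)
sample_means <- c(32, 35, 31)

# Population mean: each sample mean weighted by the number of trees in it,
# i.e. (10*32 + 12*35 + 11*31) / 33
population_mean <- weighted.mean(sample_means, sample_sizes)
print(round(population_mean, 1))  # 32.8

# How far each sample mean falls from the population mean
deviations <- sample_means - population_mean
print(round(deviations, 1))       # -0.8  2.2 -1.8
```

A plain mean(sample_means) would give 32.67 instead, because it ignores that the three samples have different sizes.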

For a population, the mean is a constant value no matter how many times it’s recalculated. For a sample, the mean depends on exactly which members of the population are selected. From a sample mean, we can then estimate the mean of the population as a whole. There are three main reasons we might use sampling:

  • data on the entire population is not available
  • data on the entire population is available, but there is so much of it that analyzing all of it is infeasible
  • meaningful answers to questions can be found faster with sampling

Instructions

1.

In the workspace, we’ve generated a random population of size 300 that follows a normal distribution with a mean of 65. Update the value of population_mean to store the mean() of population. Does it closely match your expectation?
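
As a rough sketch of what this step involves, the snippet below builds a comparable population and computes its mean. The seed and the standard deviation of 3.5 are assumptions made for illustration; the lesson only states the size (300) and the mean (65), and the workspace code may differ.

```r
set.seed(42)  # hypothetical seed, only so the run is repeatable

# Assumed setup: 300 draws from a normal distribution with mean 65.
# The standard deviation (3.5) is an illustrative guess, not from the lesson.
population <- rnorm(300, mean = 65, sd = 3.5)

# The population mean is computed once from all 300 values
population_mean <- mean(population)
print(population_mean)
```

The result should land near 65 but not exactly on it, because the 300 values are themselves random draws from the distribution.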

2.

Let’s look at how the means of different samples can vary within the same population.

The code in the notebook generates 5 random samples from population. sample_1 is displayed and sample_1_mean has been calculated.

Replace the "Not calculated" strings with calculations of the means for sample_2, sample_3, sample_4, and sample_5.

Look at the population mean and the sample means. Are they all the same? All different? Why?
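
One way to see the effect outside the notebook is the sketch below. It assumes a population like the one described in step 1 (the standard deviation of 3.5 is a guess) and a sample size of 30, which the lesson does not specify. Instead of the sample_1 through sample_5 variables, it collects the five sample means in a single vector.

```r
set.seed(1)  # hypothetical seed, only so the run is repeatable

# Assumed population: 300 values, mean 65, sd 3.5 (sd is an assumption)
population <- rnorm(300, mean = 65, sd = 3.5)

# Draw 5 samples of 30 values each, without replacement (size is assumed),
# and keep only the mean of each sample
sample_means <- replicate(5, mean(sample(population, size = 30)))

print(mean(population))  # fixed for this population
print(sample_means)      # changes with which values were drawn
```

Each sample mean should sit close to the population mean, yet the five values differ from it and from one another, since every sample contains a different subset of the population.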
