Well done! You’ve calculated the variance of a dataset. The full equation for the variance is as follows:

`$\sigma^2 = \frac{\sum_{i=1}^{N}{(X_i -\mu)^2}}{N}$`

Let’s dissect this equation a bit.

- Variance is usually represented by the symbol sigma squared.
- We start by taking every point in the dataset, from point number `1` to point number `N`, and finding the difference between that point and the mean.
- Next, we square each difference to make all differences positive.
- Finally, we average those squared differences by adding them together and dividing by `N`, the total number of points in the dataset.
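To make the steps in the equation concrete, here is a minimal sketch that computes the variance by hand in plain Python. The list of values and the variable names (`mean`, `squared_differences`, `variance`) are illustrative choices, not part of the lesson's code.

```python
# A small example dataset (illustrative values only)
dataset = [3, 5, -2, 49, 10]

# N, the total number of points in the dataset
n = len(dataset)

# The mean (mu) of the dataset
mean = sum(dataset) / n

# Difference between each point and the mean, squared
squared_differences = [(x - mean) ** 2 for x in dataset]

# Average of the squared differences: the variance (sigma squared)
variance = sum(squared_differences) / n

print(variance)
```

Here the mean is 13, the squared differences are 100, 64, 225, 1296, and 9, and their average is 1694 / 5 = 338.8.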

All of this work can be done quickly using Python’s NumPy library. The `var()` function takes a list of numbers as a parameter and returns the variance of that dataset.

    import numpy as np

    dataset = [3, 5, -2, 49, 10]
    variance = np.var(dataset)
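One detail worth checking is which denominator `np.var()` uses. By default it divides by `N`, matching the equation above. The sketch below compares the default with the `ddof=1` option, which divides by `N - 1` instead (often called the sample variance). The variable names are illustrative and not part of the lesson's code.

```python
import numpy as np

dataset = [3, 5, -2, 49, 10]

# Default (ddof=0): divide the summed squared differences by N
population_variance = np.var(dataset)

# ddof=1: divide by N - 1 instead of N
sample_variance = np.var(dataset, ddof=1)

print(population_variance)  # approximately 338.8 (1694 / 5)
print(sample_variance)      # approximately 423.5 (1694 / 4)
```

For this lesson the default behavior is the one that matches the formula, so no extra argument is needed.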

### Instructions

**1.**

We’ve imported the same two datasets from the beginning of the lesson. Run the code to see a histogram of the two datasets. This time, the histograms are plotted on the same graph to help visualize the difference in spread.

Which dataset do you expect to have a larger variance?

**2.**

Scroll down in the code to find where we’ve defined `teacher_one_variance` and `teacher_two_variance`. Set those variables equal to the variance of each dataset using the `np.var()` function.
