K-Means Clustering
Iris Dataset

Before we implement the K-means algorithm, let’s find a dataset. The `sklearn` package embeds some datasets and sample images. One of them is the Iris dataset.

The Iris dataset consists of measurements of sepals and petals of 3 different plant species:

• Iris setosa
• Iris versicolor
• Iris virginica

The sepal is the part that encases and protects the flower when it is in the bud stage. A petal is a leaflike part that is often colorful.

From the `sklearn` library, import the `datasets` module, then load the Iris dataset:

``from sklearn import datasets``

``iris = datasets.load_iris()``

The sample data in the Iris dataset looks like this:

``````[[ 5.1  3.5  1.4  0.2 ]
[ 4.9  3.   1.4  0.2 ]
[ 4.7  3.2  1.3  0.2 ]
[ 4.6  3.1  1.5  0.2 ]
. . .
[ 5.9  3.   5.1  1.8 ]]``````

We call each piece of data a sample. For example, each flower is one sample.

Each characteristic we are interested in is a feature. For example, petal length is a feature of this dataset.

The features of the dataset are:

• Column 0: Sepal length
• Column 1: Sepal width
• Column 2: Petal length
• Column 3: Petal width
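To make the sample and feature terminology concrete, here is a minimal sketch that checks the shape of the data and lists the column names. It assumes `scikit-learn` is installed and relies on the `feature_names` attribute of the loaded dataset, which this lesson has not introduced.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Rows are samples (flowers); columns are features (measurements).
n_samples, n_features = iris.data.shape
print(n_samples, n_features)

# feature_names lists the columns in the same order as the data.
for column, name in enumerate(iris.feature_names):
    print(column, name)

# Slice out a single feature: column 2 is petal length.
petal_length = iris.data[:, 2]
print(petal_length[:5])
```

Slicing by column index like this is how a single feature can be pulled out for plotting or inspection.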

These 3 species of Iris are the groups we will try to recover by clustering later in this lesson.

### Instructions

1.

Import the `datasets` module and load the Iris data.

2.

Every dataset from `sklearn` comes with several pieces of information (not just the data), and they are all stored in a similar fashion.

First, let’s take a look at the most important thing, the sample data:

``print(iris.data)``

Each row is a plant!
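To see what else is stored alongside the data, one option is to list the keys of the loaded object. This is a small sketch, assuming `scikit-learn` is installed; the exact set of keys may vary between versions, so no output is shown here.

```python
from sklearn import datasets

iris = datasets.load_iris()

# The loaded object behaves like a dictionary whose keys are also attributes.
print(list(iris.keys()))

# The first row of the data is the first plant's four measurements.
first_sample = iris.data[0]
print(first_sample)
```

Both `iris.data` and `iris["data"]` refer to the same array, so either style can be used.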

3.

Since the datasets in `sklearn` are meant for practice, they come with the answers (target values) stored under the `target` key.

Take a look at the target values:

``print(iris.target)``

The `iris.target` values give the ground truth for the Iris dataset. Ground truth, in this case, is the number (0, 1, or 2) that identifies the species of each flower, which is what we are trying to learn.
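To connect the target numbers back to species, the sketch below pairs each label with its name and counts the samples per label. It assumes `scikit-learn` and NumPy are installed and uses the dataset's `target_names` attribute, which this lesson has not introduced.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# target_names[i] is the species name for label i.
for label, name in enumerate(iris.target_names):
    print(label, name)

# Count how many samples carry each label.
counts = np.bincount(iris.target)
print(counts)
```

There is one target value per row of `iris.data`, so the two arrays can be indexed together.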

4.

It is always a good idea to read the description of the data:

``print(iris.DESCR)``

Expand the terminal (right panel) and use the description to answer these questions:

• When was the Iris dataset published?
• What is the unit of measurement?