K-Means Clustering
Lesson 1 of 2
1. 1
Often, the data you encounter in the real world won’t have flags attached and won’t provide labeled answers to your question. Finding patterns in this type of data, unlabeled data, is a common them…
2. 2
The goal of clustering is to separate data so that data similar to one another are in the same group, while data different from one another are in different groups. So two questions arise: - How m…
3. 3
Before we implement the K-means algorithm, let’s find a dataset. The sklearn package embeds some datasets and sample images. One of them is the Iris dataset . The Iris dataset consists of measure…
4. 4
To get a better sense of the data in the iris.data matrix, let’s visualize it! With Matplotlib, we can create a 2D scatter plot of the Iris dataset using two of its features (sepal length vs. peta…
5. 5
The K-Means algorithm: 1. Place k random centroids for the initial clusters. 2. Assign data samples to the nearest centroid. 3. Update centroids based on the above-assigned data samples. Repe…
6. 6
The K-Means algorithm: 1. Place k random centroids for the initial clusters. 2. Assign data samples to the nearest centroid. 3. Update centroids based on the above-assigned data samples. Repe…
7. 7
The K-Means algorithm: 1. Place k random centroids for the initial clusters. 2. Assign data samples to the nearest centroid. 3. Update centroids based on the above-assigned data samples. Repe…
8. 8
The K-Means algorithm: 1. Place k random centroids for the initial clusters. 2. Assign data samples to the nearest centroid. 3. Update centroids based on the above-assigned data samples. **Repeat…
9. 9
Awesome, you have implemented K-Means clustering from scratch! Writing an algorithm whenever you need it can be very time-consuming and you might make mistakes and typos along the way. We will no…
10. 10
You used K-Means and found three clusters of the samples data. But it gets cooler! Since you have created a model that computed K-Means clustering, you can now feed new data samples into it and …
11. 11
We have done the following using sklearn library: - Load the embedded dataset - Compute K-Means on the dataset (where k is 3) - Predict the labels of the data samples And the labels resulted in e…
12. 12
At this point, we have clustered the Iris data into 3 different groups (implemented using Python and using scikit-learn). But do the clusters correspond to the actual species? Let’s find out! Firs…
13. 13
At this point, we have grouped the Iris plants into 3 clusters. But suppose we didn’t know there are three species of Iris in the dataset, what is the best number of clusters? And how do we determi…
14. 14
Now it is your turn! In this review section, find another dataset from one of the following: - The scikit-learn library - UCI Machine Learning Repo - Codecademy GitHub Repo (coming soon!) …
1. 1
The K-Means clustering algorithm is more than half a century old, but it is not falling out of fashion; it is still the most popular clustering algorithm for Machine Learning. However, there can b…
2. 2
Suppose we have four data samples that form a rectangle whose width is greater than its height: If you wanted to find two clusters (k = 2) in the data, which points would you cluster together? …
3. 3
To recap, the Step 1 of the K-Means algorithm is “Place k random centroids for the initial clusters”. The K-Means++ algorithm replaces Step 1 of the K-Means algorithm and adds the following: - **…
4. 4
Using the scikit-learn library and its cluster module , you can use the KMeans() method to build an original K-Means model that finds 6 clusters like so: model = KMeans(n_clusters=6, init=’rando…
5. 5
Congratulations, now your K-Means model is improved and ready to go! K-Means++ improves K-Means by placing initial centroids more strategically. As a result, it can result in more optimal clusteri…

## What you'll create

Portfolio projects that showcase your new skills ## How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory 