K-Means Clustering
Implementing K-Means: Step 2

The K-Means algorithm:

  1. Place k random centroids for the initial clusters.
  2. Assign data samples to the nearest centroid.
  3. Update centroids based on the above-assigned data samples.

Repeat Steps 2 and 3 until convergence.


In this exercise, we will implement Step 2.

Now we have the three random centroids. Let’s assign data points to their nearest centroids.

To do this, we’re going to use a Distance Formula to write a distance() function. Then we’ll iterate through the data samples and compute the distance from each data point to each of the 3 centroids.

Suppose the distances from one data point to the three centroids are stored in distances, and the array looks like [15, 20, 5]. We would want to assign that data point to the 3rd centroid, because it is the closest. Calling np.argmin(distances) returns 2, the index of the smallest value in the array.
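
The example above can be checked directly. This is a minimal sketch that assumes NumPy is imported as np, as in the np.argmin call later in this exercise:

```python
import numpy as np

# Distances from one data point to each of the three centroids
distances = np.array([15, 20, 5])

# np.argmin returns the index of the smallest value in the array
cluster = np.argmin(distances)

print(cluster)  # 2, so the point belongs to the 3rd centroid
```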

Instructions

1.

Write a distance() function.

It should take two points, a and b, and return the distance between them.
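
One possible sketch of distance() is shown here. It assumes the Euclidean distance is the formula intended (the exercise only says "a Distance Formula") and that each point is a sequence of numeric coordinates:

```python
import numpy as np

def distance(a, b):
    # Assumption: Euclidean distance, the square root of the sum of
    # squared coordinate differences. Works for any number of dimensions.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

# A 3-4-5 right triangle makes a quick sanity check
print(distance([0, 0], [3, 4]))  # 5.0
```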

2.

Create an array called labels that will hold the cluster label for each data point. Its size should equal the number of data samples.

It should look something like:

[ 0. 0. 0. 0. 0. 0. ... 0.]

Create an array called distances that will hold the distance from the current data point to each centroid. Its size should be k.

It should look something like:

[ 0. 0. 0.]
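
Both arrays can be created with np.zeros. In the sketch below, samples is a hypothetical stand-in for the exercise's data (six made-up 2-D points), and k = 3 matches the three centroids mentioned above:

```python
import numpy as np

# Hypothetical stand-in data: 6 samples with 2 features each
samples = np.array([
    [1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
    [8.0, 8.0], [1.0, 0.6], [9.0, 11.0],
])
k = 3  # number of centroids

# One label slot per data point, initialised to zero
labels = np.zeros(len(samples))

# One distance slot per centroid, reused for every data point
distances = np.zeros(k)

print(labels)     # [0. 0. 0. 0. 0. 0.]
print(distances)  # [0. 0. 0.]
```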

3.

To assign each data point to the closest centroid, we need to iterate through the whole data sample and calculate each data point’s distance to each centroid.

We can get the index of the smallest value in distances with:

cluster = np.argmin(distances)

Then, store cluster at the current data point’s index in the labels array.
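
Putting the pieces together, the assignment step might look like the sketch below. The samples and centroids arrays are made-up values chosen so the result is easy to verify by eye, and distance() assumes the Euclidean formula; the exercise's own data and centroids will differ:

```python
import numpy as np

def distance(a, b):
    # Assumption: Euclidean distance between two points
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

# Hypothetical data: three tight pairs of points
samples = np.array([
    [1.0, 1.0], [1.2, 0.8],
    [5.0, 5.0], [5.2, 4.9],
    [9.0, 9.0], [8.8, 9.1],
])
# Hypothetical centroids (in the exercise these come from Step 1)
centroids = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
k = len(centroids)

labels = np.zeros(len(samples))
distances = np.zeros(k)

for i in range(len(samples)):
    # Distance from data point i to every centroid
    for j in range(k):
        distances[j] = distance(samples[i], centroids[j])
    # Index of the nearest centroid becomes the point's label
    cluster = np.argmin(distances)
    labels[i] = cluster

# Printed once, outside of the for loop
print(labels)  # [0. 0. 1. 1. 2. 2.]
```

The distances array is overwritten for every data point, so only labels needs to be as long as the data.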

4.

Then, print labels (outside of the for loop).

Awesome! You have just finished Step 2 of the K-means algorithm.
