The K-Nearest Neighbor Algorithm:
We’ve now found the k nearest neighbors and stored them in a list that looks like this:
[ [0.083, 'Lady Vengeance'], [0.236, 'Steamboy'], ... ... [0.331, 'Godzilla 2000'] ]
Our goal now is to count the number of good movies and bad movies in the list of neighbors. If more of the neighbors are good, the algorithm classifies the unknown movie as good. Otherwise, it classifies it as bad.
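The voting rule above can be sketched in a few lines. This is a minimal illustration, assuming each neighbor's class has already been looked up as 1 (good) or 0 (bad); the function name majority_vote is hypothetical and uses collections.Counter rather than the explicit counters the lesson builds later.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return 1 (good) if strictly more labels are 1 than 0, otherwise 0 (bad)."""
    counts = Counter(neighbor_labels)  # a missing key counts as 0
    return 1 if counts[1] > counts[0] else 0

# Hypothetical labels for five neighbors: three good, two bad
print(majority_vote([1, 0, 1, 1, 0]))  # 1
# A 2-2 tie is not "more good", so this simple rule falls through to bad
print(majority_vote([1, 0, 1, 0]))     # 0
```

Because the comparison is strict, this rule sends ties to "bad"; the tie-breaking strategy discussed below is one alternative.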
To find the class of each neighbor, we’ll need to look up its title in our movie_labels dataset. For example, movie_labels['Akira'] gives us 1 because Akira is classified as a good movie.
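As a rough sketch of that lookup, assuming movie_labels is a dictionary mapping titles to 1 (good) or 0 (bad): only the 'Akira': 1 entry comes from the lesson, while 'Movie A', 'Movie B', and the distances are made-up placeholders.

```python
# Hypothetical stand-in for movie_labels: title -> 1 (good) or 0 (bad).
# Only the 'Akira' entry comes from the lesson; the others are made up.
movie_labels = {'Akira': 1, 'Movie A': 0, 'Movie B': 1}

# Neighbors in the [distance, title] format (illustrative values)
neighbors = [[0.10, 'Akira'], [0.25, 'Movie A'], [0.40, 'Movie B']]

# Look up the class of each neighbor by its title (index 1)
neighbor_labels = [movie_labels[neighbor[1]] for neighbor in neighbors]
print(neighbor_labels)  # [1, 0, 1]
```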
You may be wondering what happens if there’s a tie. What if k = 8 and four neighbors were good and four neighbors were bad? There are different strategies, but one way to break the tie is to choose the class of the closest point.
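The closest-point tie-breaker could look like the following sketch. It assumes binary 0/1 labels and neighbors stored as [distance, title]; the function name vote_with_tiebreak and the toy titles are hypothetical, and this is only one of the possible strategies the lesson mentions.

```python
def vote_with_tiebreak(neighbors, labels):
    """Majority vote over [distance, title] neighbors; on a tie, use the closest one.

    Assumes labels maps each title to 1 (good) or 0 (bad).
    """
    num_good = sum(1 for _, title in neighbors if labels[title] == 1)
    num_bad = len(neighbors) - num_good  # valid only because labels are binary
    if num_good > num_bad:
        return 1
    if num_bad > num_good:
        return 0
    # Tie: fall back to the class of the nearest neighbor
    closest = min(neighbors, key=lambda neighbor: neighbor[0])
    return labels[closest[1]]

# Hypothetical 2-2 tie where the closest neighbor ('A') is good
labels = {'A': 1, 'B': 0, 'C': 0, 'D': 1}
neighbors = [[0.3, 'B'], [0.1, 'A'], [0.5, 'C'], [0.6, 'D']]
print(vote_with_tiebreak(neighbors, labels))  # 1
```

Using min on the distance, rather than taking the first element, keeps the tie-breaker correct even if the neighbor list was not sorted.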
Our classify function now needs to know the labels. Add a parameter named labels to classify. It should be the third parameter.
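Here is a sketch of the signature at this stage. The neighbor-finding code from the earlier steps is not shown in this section, so the Euclidean distance function and the dictionary-shaped dataset (title mapped to a feature list) are assumptions; the toy dataset is hypothetical.

```python
import math

def distance(movie1, movie2):
    """Euclidean distance between two equal-length feature lists (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(movie1, movie2)))

def classify(unknown, dataset, labels, k):
    """labels is the new third parameter; it is not used yet at this step."""
    distances = []
    for title in dataset:
        distances.append([distance(dataset[title], unknown), title])
    distances.sort()           # closest first
    neighbors = distances[:k]  # the k nearest [distance, title] pairs
    return neighbors

# Hypothetical toy data
dataset = {'Movie A': [0.4, 0.2, 0.8], 'Movie B': [0.9, 0.9, 0.1], 'Movie C': [0.5, 0.3, 0.9]}
labels = {'Movie A': 1, 'Movie B': 0, 'Movie C': 1}
print([title for _, title in classify([.4, .2, .9], dataset, labels, 2)])  # ['Movie A', 'Movie C']
```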
Continue writing your classify function.
Create two variables named num_good and num_bad and set them each to 0. Use a for loop to loop through every movie in neighbors. Store each movie's title in a variable called title. Remember, every neighbor is a list of [distance, title], so the title can be found at index 1.
For now, return title at the end of your function (outside of the loop).
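Pulling the title out of each neighbor can be illustrated on a shortened version of the list shown at the top of this section (its first two and last entries). The second form, using tuple unpacking, is an equivalent alternative to indexing, not something the lesson requires.

```python
# A shortened version of the neighbors list shown earlier
neighbors = [[0.083, 'Lady Vengeance'], [0.236, 'Steamboy'], [0.331, 'Godzilla 2000']]

titles = []
for neighbor in neighbors:
    title = neighbor[1]  # index 0 is the distance, index 1 is the title
    titles.append(title)
print(titles)  # ['Lady Vengeance', 'Steamboy', 'Godzilla 2000']

# Equivalent, using tuple unpacking instead of indexing
titles_unpacked = [title for _, title in neighbors]
```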
Use labels and title to find the label of each movie:
If that label is 0, add one to num_bad.
If that label is 1, add one to num_good.
For now, return num_good at the end of your function.
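The counting loop can be sketched as a standalone helper. The name count_neighbors is hypothetical, and unlike the lesson's intermediate step it returns both counts so the result is easy to inspect; the labels and neighbors are made-up toy values.

```python
def count_neighbors(neighbors, labels):
    """Return (num_good, num_bad) for a list of [distance, title] neighbors."""
    num_good = 0
    num_bad = 0
    for neighbor in neighbors:
        title = neighbor[1]
        if labels[title] == 0:
            num_bad += 1
        elif labels[title] == 1:
            num_good += 1
    return num_good, num_bad

# Hypothetical labels and neighbors
labels = {'Movie A': 1, 'Movie B': 0, 'Movie C': 1}
neighbors = [[0.1, 'Movie A'], [0.2, 'Movie B'], [0.3, 'Movie C']]
print(count_neighbors(neighbors, labels))  # (2, 1)
```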
We can finally classify our unknown movie:
If num_good is greater than num_bad, return a 1. Otherwise, return a 0.
Call classify using the following parameters and print the result:
[.4, .2, .9] as the movie you’re looking to classify.
movie_dataset as the training dataset.
movie_labels as the training labels.
k = 5
Does the system predict this movie will be good or bad?
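Putting the steps together, a complete classify might look like this sketch. The Euclidean distance function is assumed, and movie_dataset and movie_labels here are small hypothetical stand-ins with made-up titles and features, so the printed results say nothing about what the real datasets predict.

```python
import math

def distance(movie1, movie2):
    """Euclidean distance (assumed; the lesson's distance function is not shown here)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(movie1, movie2)))

def classify(unknown, dataset, labels, k):
    # Find the k nearest neighbors as [distance, title] pairs
    distances = [[distance(dataset[title], unknown), title] for title in dataset]
    distances.sort()
    neighbors = distances[:k]

    # Count good (1) and bad (0) neighbors
    num_good = 0
    num_bad = 0
    for neighbor in neighbors:
        title = neighbor[1]
        if labels[title] == 0:
            num_bad += 1
        elif labels[title] == 1:
            num_good += 1

    # A strict majority of good neighbors means good; otherwise bad
    if num_good > num_bad:
        return 1
    else:
        return 0

# Hypothetical toy stand-ins for movie_dataset and movie_labels
movie_dataset = {
    'Movie A': [0.4, 0.2, 0.8],
    'Movie B': [0.9, 0.9, 0.1],
    'Movie C': [0.5, 0.3, 0.9],
    'Movie D': [0.1, 0.8, 0.2],
    'Movie E': [0.3, 0.1, 0.7],
    'Movie F': [0.8, 0.7, 0.3],
}
movie_labels = {'Movie A': 1, 'Movie B': 0, 'Movie C': 1,
                'Movie D': 0, 'Movie E': 0, 'Movie F': 0}

print(classify([.4, .2, .9], movie_dataset, movie_labels, 5))  # 0
print(classify([.4, .2, .9], movie_dataset, movie_labels, 3))  # 1
```

On this toy data the prediction flips between k = 3 and k = 5, a reminder that the choice of k can change the answer.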