What's Data Got To Do With It?

You may have heard the term "big data" thrown around, but do you understand what it is? Big data refers to massive datasets that can be used to reveal patterns and trends. Any large quantity of information, from sports scores to social media posts, can be considered big data. And since big data is so, well, big, we need optimized algorithms and high-powered computers to sift through it.

Advances in data science have changed the way we communicate, share, and receive information. Consider how big data has changed our TV and movie experiences. Companies like Netflix collect thousands of data points from several sources to make suggestions to users with the help of a tool known as a recommender engine. Gone are the days of browsing the shelves in a Blockbuster on a Friday night (that is, if you were even alive when Blockbuster stores were around).

But how did the Netflix engineering team build a recommender engine? Netflix uses machine learning, a subset of artificial intelligence, to help its algorithms "learn" without human assistance. Machine learning gives the platform the ability to automate millions of decisions based on user activity. When Netflix recommends The Office because I like Parks and Recreation, machine learning is behind that decision. This suggestion is the Netflix recommendation engine at work: it takes your past activity and returns movies and shows it thinks you will enjoy.

The need for recommendation engines and personalization is a result of a phenomenon known as the “era of abundance”. We have a huge variety of choices because of how much is available through the Internet. In this article, we'll learn how data science has enhanced our ability to choose and, frankly, our Saturday night binging options. Let’s take a deep dive into the Netflix recommendation system.

How Netflix Slays the Recommendation Game

Let's not date ourselves, but some may remember a time when we frequented video rental stores. These stores were a hit! That is, until the market grew tired of limited selections and other physical constraints.

Today, online platforms like Netflix offer thousands of movies and shows. However, this much choice can be overwhelming for users! With over 7,000 movies and shows in the Netflix catalog, it is nearly impossible for users to find movies they'll like on their own. The large platform needs a recommendation engine algorithm to automate the search process for users.

There are several ways to build a recommendation engine. The method you choose depends on the size of the user base, the size of the catalog, and the goals of the platform. A basic implementation of a recommendation engine is the editorial method. In the editorial method, the platform makes recommendations based on the picks of a relatively small number of individuals. Another easy one is the aptly named simple collection method, where the platform makes suggestions based on the top products across the platform.
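
The simple collection method can be sketched in a few lines. This is a minimal illustration only: the viewing log, the user and title names, and the `top_titles` helper are all made up for the example and say nothing about how Netflix stores or counts views.

```python
from collections import Counter

# Hypothetical viewing log of (user, title) pairs. All names are invented.
views = [
    ("user_a", "Title 1"),
    ("user_b", "Title 1"),
    ("user_c", "Title 2"),
    ("user_a", "Title 3"),
    ("user_b", "Title 3"),
    ("user_c", "Title 3"),
]

def top_titles(view_log, n=2):
    """Return the n most-watched titles across the whole platform."""
    counts = Counter(title for _, title in view_log)
    return [title for title, _ in counts.most_common(n)]

print(top_titles(views))  # ['Title 3', 'Title 1']
```

Notice that every user gets the same list, no matter what they have watched. That is exactly the lack of personalization this method is known for.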

Netflix doesn’t use those recommendation methods because they don’t allow for personalization or cover the breadth of its movie catalog and user preferences. Instead, Netflix uses the personalized method, where movies are suggested to the users who are most likely to enjoy them based on a metric like major actors or genre. Machine learning is necessary for this method because it uses user data to make informed suggestions. This way, Netflix's methodology accounts for the diversity of its audience and its very large catalog.
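
To get a feel for the personalized method, here is a toy sketch that uses genre as the metric. The catalog, the titles, and the `recommend_by_genre` function are assumptions made up for illustration. A real system would learn from far more signals than a single genre label.

```python
from collections import Counter

# Hypothetical catalog mapping each title to one genre. All titles are invented.
catalog = {
    "Movie A": "horror",
    "Movie B": "horror",
    "Movie C": "comedy",
    "Movie D": "romance",
    "Movie E": "comedy",
}

def recommend_by_genre(liked_titles, catalog, n=2):
    """Suggest titles from the genres this user has liked most often."""
    genre_counts = Counter(catalog[t] for t in liked_titles)
    # Only consider titles the user has not already liked.
    candidates = [t for t in catalog if t not in liked_titles]
    # Rank candidates by how many liked titles share their genre.
    ranked = sorted(candidates, key=lambda t: genre_counts[catalog[t]], reverse=True)
    # Drop titles from genres the user has never liked.
    return [t for t in ranked if genre_counts[catalog[t]] > 0][:n]

print(recommend_by_genre(["Movie A", "Movie C"], catalog))  # ['Movie B', 'Movie E']
```

Unlike a platform-wide top list, two users with different likes now get different suggestions.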

Netflix is All Probability

Machine learning is able to create "smart" platforms because it uses probability to estimate the likelihood of a user liking a product. To understand the probability aspect of recommendation engines, let’s look at an example of a utility matrix, a model that places a score on the relationship between a user and a movie type in order to predict their preferences.

[Figure: utility matrix example]

In this example, we have three Netflix users: Fatima, Maya, and Leslie. Each user has watched and rated a few movies on Netflix, and each user has movies in the catalog they haven't watched or rated yet. The user ratings are utility scores, which represent the relationship between the movie and the user. In this example, our utility scores are represented by the checkmark and X symbols. The checkmarks represent movies that they've seen and liked, and the Xs represent movies they've seen and not liked. The question marks represent movies that they haven't seen yet.
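
In code, a utility matrix like this could be stored as a nested dictionary. The sketch below is one possible representation, not the contents of the figure: the column names and most of the scores are assumptions, chosen only so that each of the three symbols appears.

```python
# Utility scores: 1 = seen and liked (checkmark), 0 = seen and not liked (X),
# None = not seen yet (question mark). These values are illustrative assumptions.
utility = {
    "Fatima": {"horror": 1, "comedy": None, "romance": None},
    "Maya":   {"horror": 0, "comedy": None, "romance": None},
    "Leslie": {"horror": 1, "comedy": None, "romance": 1},
}

def liked(user):
    """Movie types the user has seen and liked."""
    return [m for m, score in utility[user].items() if score == 1]

def unseen(user):
    """Movie types the user has not rated yet (the question marks)."""
    return [m for m, score in utility[user].items() if score is None]

print(liked("Leslie"))   # ['horror', 'romance']
print(unseen("Fatima"))  # ['comedy', 'romance']
```

The question marks are the interesting part: they are the cells a recommendation engine tries to fill in.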

Given this utility matrix, what do you think the recommendation engine will suggest? As we've learned, the job of the recommendation engine is to predict the probability of a user liking a movie based on the previous movies that they've seen and liked. So, Netflix may suggest that Fatima watch a whodunit since she enjoyed the thrilling plot of a horror movie she watched last week. The engine may recommend more light-hearted films to Maya, like a comedy or romance, because she didn’t enjoy the horror film. Because Leslie equally enjoyed a horror movie and a romance movie, Netflix may give her a more niche, curated experience, like romantic thrillers. Now each user will receive a suggestion that's personalized for them!
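
How might an engine turn a utility matrix into a prediction? One common textbook approach is to weight other users' ratings by how often they agree with you (user-based collaborative filtering). The article does not say Netflix works this way, so treat the sketch below as an assumption. The ratings, the movie names, and both functions are invented for illustration.

```python
# Hypothetical ratings: 1 = liked, 0 = not liked, None = not seen yet.
ratings = {
    "Fatima": {"Movie 1": 1, "Movie 2": 0, "Movie 3": None, "Movie 4": None},
    "Maya":   {"Movie 1": 0, "Movie 2": 1, "Movie 3": 0,    "Movie 4": 1},
    "Leslie": {"Movie 1": 1, "Movie 2": 0, "Movie 3": 1,    "Movie 4": None},
}

def similarity(a, b, ratings):
    """Fraction of movies both users rated on which they agree (None if no overlap)."""
    shared = [m for m in ratings[a]
              if ratings[a][m] is not None and ratings[b].get(m) is not None]
    if not shared:
        return None
    agree = sum(ratings[a][m] == ratings[b][m] for m in shared)
    return agree / len(shared)

def predict_like(user, movie, ratings):
    """Estimate the chance that `user` likes `movie`, or None without evidence."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other in ratings:
        if other == user or ratings[other].get(movie) is None:
            continue
        sim = similarity(user, other, ratings)
        if not sim:  # skip users with no overlap or zero agreement
            continue
        weighted_sum += sim * ratings[other][movie]
        total_weight += sim
    return weighted_sum / total_weight if total_weight else None

print(predict_like("Fatima", "Movie 3", ratings))  # 1.0
print(predict_like("Fatima", "Movie 4", ratings))  # None
```

Fatima and Leslie agree on every movie they have both rated, so Leslie's checkmark on Movie 3 becomes a strong prediction for Fatima. For Movie 4, the only rating comes from Maya, who never agrees with Fatima, so this sketch declines to guess.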


Machine learning algorithms are designed to work diligently to provide you with everything you want, and even some things you don't know you want yet. Netflix even goes beyond recommending movie titles and uses recommendation engines to curate the preview images you see on your feed. If you rated Johnny Depp movies highly, you're likely to see an image of Captain Jack Sparrow when Netflix recommends Pirates of the Caribbean to you. If you tend to like romance movies, your feed will likely have preview images of scenes with actors embracing one another.

Machine learning techniques at Netflix have disrupted the way the television and movie industries operate. We have data science to thank for personalized experiences like Netflix. The field is growing rapidly in this era of abundance, giving us guidance as we sort through thousands of options to find the right products.

Data science in our daily lives gives us access to educated choices and more curated experiences. The computing power and know-how of data scientists result in fast, accurate decision-making that feels seamless. Think about that during your binge fest this weekend!

Made in NYC © 2018 Codecademy