Key Concepts

Review core concepts you need to learn to master this subject

Information Gain in decision trees

When building decision trees, two criteria are commonly used to find the best feature to split a dataset on: Gini impurity and Information Gain. Intuitively, Information Gain measures how much information an individual feature provides about the different classes: the more a split on that feature reduces the mix of classes in the resulting subsets, the higher its gain.
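As a rough illustration of the idea above, the sketch below computes information gain as the impurity of the parent set minus the size-weighted impurity of the subsets a split produces. It assumes Gini impurity as the impurity measure (entropy is the other common choice); the function names and the sample labels are hypothetical and not taken from the course.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def information_gain(parent_labels, subsets):
    """Parent impurity minus the size-weighted impurity of the subsets."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * gini(s) for s in subsets)
    return gini(parent_labels) - weighted


# Hypothetical labels: did a student get an A?
parent = ["A", "A", "not A", "not A"]
perfect_split = [["A", "A"], ["not A", "not A"]]  # each subset is pure
useless_split = [["A", "not A"], ["A", "not A"]]  # subsets as mixed as the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
print(gain_perfect, gain_useless)
```

A split that separates the classes completely earns the full parent impurity as gain, while a split that leaves every subset as mixed as the parent earns nothing.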

Decision Trees
Lesson 1 of 2
  1. Decision trees are machine learning models that try to find patterns in the features of data points. Take a look at the tree on this page. This tree tries to predict whether a student will get an A…
  2. If we’re given this magic tree, it seems relatively easy to make classifications. But how do these trees get created in the first place? Decision trees are supervised machine learning models, which…
  3. In this lesson, we’ll create a decision tree built from a dataset about cars. When considering buying a car, what factors go into making that decision? Each car can fall into four different cla…
  4. Consider the two trees below. Which tree would be more useful as a model that tries to predict whether someone would get an A in a class? Let’s say you use the top tree. You’ll end up at a l…
  5. We know that we want to end up with leaves with a low Gini Impurity, but we still need to figure out which features to split on in order to achieve this. For example, is it better if we split our d…
  6. We’re not quite done calculating the information gain of a set of objects. The sizes of the subsets that get created after the split are important too! For example, the image below shows two sets wi…
  7. Now that we can find the best feature to split the dataset, we can repeat this process again and again to create the full tree. This is a recursive algorithm! We start with every data point from th…
  8. We can finally use our tree as a classifier! Given a new data point, we start at the top of the tree and follow the path of the tree until we hit a leaf. Once we get to a leaf, we’ll use the classe…
  9. Nice work! You’ve written a decision tree from scratch that is able to classify new points. Let’s take a look at how the Python library scikit-learn implements decision trees. The sklearn.tree mod…
  10. Now that we have an understanding of how decision trees are created and used, let’s talk about some of their limitations. One problem with the way we’re currently making our decision trees is that…
  11. Great work! In this lesson, you learned how to create decision trees and use them to make classifications. Here are some of the major takeaways: * Good decision trees have pure leaves. A leaf is p…
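The lesson previews above describe building a tree recursively by repeatedly splitting on the best feature, then classifying a new point by following branches down to a leaf. The following is a minimal sketch of that idea, not the course's own implementation: it assumes categorical features, Gini impurity, and a leaf that predicts its most common class. All function names and the tiny car dataset are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split(rows, labels, feature):
    """Group rows and labels by the value of one feature (a column index)."""
    groups = {}
    for row, label in zip(rows, labels):
        group = groups.setdefault(row[feature], ([], []))
        group[0].append(row)
        group[1].append(label)
    return groups


def best_feature(rows, labels):
    """Return (feature, gain) for the split with the highest weighted gain."""
    best, best_gain = None, 0.0
    for feature in range(len(rows[0])):
        groups = split(rows, labels, feature)
        weighted = sum(len(l) / len(labels) * gini(l) for _, l in groups.values())
        gain = gini(labels) - weighted
        if gain > best_gain:
            best, best_gain = feature, gain
    return best, best_gain


def build_tree(rows, labels):
    """Recursively split until no feature improves purity."""
    feature, _ = best_feature(rows, labels)
    if feature is None:
        return Counter(labels)  # leaf: class counts at this node
    groups = split(rows, labels, feature)
    return (feature, {value: build_tree(r, l) for value, (r, l) in groups.items()})


def classify(tree, row):
    """Follow branches to a leaf and return its most common class."""
    while not isinstance(tree, Counter):
        feature, branches = tree
        tree = branches[row[feature]]  # assumes the value was seen in training
    return tree.most_common(1)[0][0]


# Hypothetical data: (price, safety) -> acceptability
rows = [("low", "high"), ("low", "low"), ("high", "high"), ("high", "low")]
labels = ["acc", "unacc", "acc", "unacc"]

tree = build_tree(rows, labels)
prediction = classify(tree, ("high", "high"))
print(prediction)
```

In this toy dataset only the safety column separates the classes, so the root splits on it and both children become pure leaves. A fuller implementation would also need to handle feature values never seen during training and some limit on tree depth.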

What you'll create

Portfolio projects that showcase your new skills

How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory
