Key Concepts

Review core concepts you need to learn to master this subject

Information Gain in decision trees

When building decision trees, two different criteria are commonly used to find the best feature to split a dataset on: Gini impurity and Information Gain. An intuitive interpretation of Information Gain is that it measures how much information an individual feature gives us about a data point's class.
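
As a concrete illustration, here is a minimal entropy-based information gain calculation; the helper names and the toy labels are hypothetical, not taken from the lessons:

```python
# A minimal sketch of information gain using entropy; the helpers and
# toy labels below are illustrative only.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(parent_labels, child_label_subsets):
    """Entropy of the parent minus the weighted entropy of the children."""
    total = len(parent_labels)
    weighted_child_entropy = sum(
        (len(child) / total) * entropy(child) for child in child_label_subsets
    )
    return entropy(parent_labels) - weighted_child_entropy

# A split that separates the classes well yields high gain.
parent = ["A", "A", "B", "B"]
print(information_gain(parent, [["A", "A"], ["B", "B"]]))  # 1.0 (perfect split)
print(information_gain(parent, [["A", "B"], ["A", "B"]]))  # 0.0 (uninformative split)
```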

Gini impurity

When making decision trees, calculating the Gini impurity of a set of data helps determine which feature best splits the data. If a set of data has all of the same labels, the Gini impurity of that set is 0 and the set is considered pure. Gini impurity is a statistical measure: it captures how often a randomly chosen element of the set would be mislabeled if labels were assigned at random according to the distribution of actual labels in that subset.
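
A minimal sketch of the Gini impurity calculation on toy label sets (the helper name and labels are illustrative):

```python
# Sketch of Gini impurity; the helper name and toy labels are illustrative.
from collections import Counter

def gini_impurity(labels):
    """Probability of mislabeling a random element if labels are assigned
    at random according to the label distribution of the set."""
    total = len(labels)
    return 1 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini_impurity(["A", "A", "A", "A"]))  # 0.0 -- a pure set
print(gini_impurity(["A", "A", "B", "B"]))  # 0.5 -- maximally mixed for 2 classes
```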

Decision trees leaf creation

When making a decision tree, a leaf node is created when no feature split yields any information gain. The scikit-learn implementation of decision trees also lets us set a minimum threshold on the impurity decrease required to split a node. If no split reaches this threshold, the node becomes a leaf.
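
In scikit-learn, this threshold corresponds to the min_impurity_decrease parameter of DecisionTreeClassifier; the sketch below, using the built-in iris dataset and an arbitrary threshold value, shows how it limits the number of leaves:

```python
# Sketch: stopping weak splits with min_impurity_decrease.
# The threshold value and dataset are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A split is only made if it decreases the weighted impurity by at least
# the threshold; otherwise the node is kept as a leaf.
default_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
limited_tree = DecisionTreeClassifier(min_impurity_decrease=0.02, random_state=0).fit(X, y)

print(default_tree.get_n_leaves())  # more leaves
print(limited_tree.get_n_leaves())  # fewer leaves: weak splits were not made
```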

Optimal decision trees

Creating an optimal decision tree is a computationally hard problem; in fact, finding the optimal tree is NP-complete. The greedy approach of always splitting on the feature with the best immediate information gain does not guarantee an optimal tree. Numerous heuristics exist for building good, near-optimal decision trees, and each of these methods proposes its own way to grow the tree.

Decision Tree Representation

In a decision tree, leaves represent class labels, each internal node represents a single feature, and the edges leaving a node represent the possible values of that feature.

Unlike many other classifiers, this visual structure gives us great insight into how the model reaches its predictions.
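
One way to see this structure is scikit-learn's export_text helper, which prints the feature and threshold at each internal node and the class at each leaf; the iris dataset here is just a convenient stand-in:

```python
# Sketch: inspecting a fitted tree's structure with export_text.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Internal nodes show a feature and threshold; leaves show the predicted class.
print(export_text(tree, feature_names=list(iris.feature_names)))
```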

Decision trees pruning

Decision trees can become overly complex, which can result in overfitting. A technique called pruning can be used to decrease the size of the tree, making it generalize better and increasing accuracy on a test set. Pruning is not an exact method, as there is no clear criterion for the ideal size of the tree. It can be performed bottom-up (starting at the leaves) or top-down (starting at the root).
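
scikit-learn supports a bottom-up variant, cost-complexity pruning, through the ccp_alpha parameter; the alpha value and dataset below are illustrative only:

```python
# Sketch of cost-complexity pruning via ccp_alpha; the alpha value is arbitrary.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

# The pruned tree is smaller and often generalizes better to held-out data.
print(unpruned.get_n_leaves(), unpruned.score(X_test, y_test))
print(pruned.get_n_leaves(), pruned.score(X_test, y_test))
```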

Decision Trees Construction

Decision trees are usually constructed from top to bottom. At each node of the tree, the feature that best splits the training labels is selected as the “question” of that node. Two different criteria are commonly used to choose the split: Gini impurity and Information Gain. Which one works better depends on the problem.
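
In scikit-learn, the split criterion is chosen with the criterion parameter of DecisionTreeClassifier ("gini" or "entropy"); the quick comparison below on the built-in wine dataset is only an illustrative sketch:

```python
# Sketch: comparing split criteria; dataset and cv setup are illustrative.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    scores = cross_val_score(tree, X, y, cv=5)
    print(criterion, scores.mean())
```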

Random Forest definition

A Random Forest classifier is an ensemble machine learning model that uses multiple unique decision trees to classify unlabeled data. Compared to an individual decision tree, a Random Forest is a more robust classifier, but its interpretability is reduced.
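
As a rough sketch of this trade-off, the snippet below compares a single scikit-learn decision tree with a Random Forest on a built-in dataset; the dataset and hyperparameters are arbitrary choices for illustration:

```python
# Sketch: a single tree versus a forest; data and settings are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# The forest usually scores higher on held-out data, at the cost of being
# much harder to inspect than a single tree.
print("tree  :", single_tree.score(X_test, y_test))
print("forest:", forest.score(X_test, y_test))
```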

Random Forest overfitting

Random Forests are used to avoid overfitting. Because the classifications of many trees are aggregated, a few overfitted trees in the forest have much less impact. Reduced overfitting translates to greater generalization capacity, which increases classification accuracy on new, unseen data.

Random Forest feature consideration

When creating each decision tree in a random forest, only a random subset of the features is considered when choosing the best feature to split on. Because each split looks at a different random subset of features, the estimators are trained on different aspects of the data, which decorrelates the trees and reduces the probability of overfitting.
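
In scikit-learn this behaviour is controlled by the max_features parameter of RandomForestClassifier; the sketch below uses "sqrt", a common choice for classification, purely as an example:

```python
# Sketch: limiting the candidate features considered at each split.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# At every split, each tree draws a random subset of about sqrt(n_features)
# candidate features, so different trees look at different aspects of the data.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```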

Random Forest aggregative performance

A random forest makes its prediction by aggregating the predictions of all the trees in the forest. For classification, this aggregate is a majority vote; for regression, it is typically the average of the individual trees' predictions. This aggregation lets the model capture complex non-linear relationships in the data, where its performance often exceeds that of a linear model.
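
A minimal sketch of the voting step, with hypothetical per-tree predictions (the majority_vote helper is illustrative, not part of any library):

```python
# Sketch of the aggregation step: a majority vote over per-tree predictions.
# The three "tree" predictions below are hypothetical stand-ins.
from collections import Counter

def majority_vote(predictions_per_tree):
    """Return, for each sample, the most common prediction across the trees."""
    return [
        Counter(sample_preds).most_common(1)[0][0]
        for sample_preds in zip(*predictions_per_tree)
    ]

tree_predictions = [
    ["cat", "dog", "dog"],   # predictions of tree 1 on three samples
    ["cat", "cat", "dog"],   # predictions of tree 2
    ["dog", "cat", "dog"],   # predictions of tree 3
]
print(majority_vote(tree_predictions))  # ['cat', 'cat', 'dog']
```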

Bagging at Random Forest

Each tree in a random forest is trained on a random sample of the original dataset drawn with replacement (a bootstrap sample). This process is known as bagging. Bagging helps prevent overfitting, since each individual tree is trained on a different subset of the original data.
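
A minimal sketch of drawing one bootstrap sample with NumPy (the dataset size and seed are arbitrary); each tree in the forest would get its own such sample:

```python
# Sketch of sampling with replacement (one bootstrap sample).
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10
data_indices = np.arange(n_samples)

# The bootstrap sample is the same size as the original set, but some rows
# appear more than once while others ("out-of-bag" rows) are left out.
bootstrap = rng.choice(data_indices, size=n_samples, replace=True)
out_of_bag = np.setdiff1d(data_indices, bootstrap)
print(bootstrap, out_of_bag)
```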

Decision Trees
Lesson 1 of 2
  1. Decision trees are machine learning models that try to find patterns in the features of data points. Take a look at the tree on this page. This tree tries to predict whether a student will get an A…
  2. If we’re given this magic tree, it seems relatively easy to make classifications. But how do these trees get created in the first place? Decision trees are supervised machine learning models, which…
  3. In this lesson, we’ll create a decision tree built off of a dataset about cars. When considering buying a car, what factors go into making that decision? Each car can fall into four different cla…
  4. Consider the two trees below. Which tree would be more useful as a model that tries to predict whether someone would get an A in a class? Let’s say you use the top tree. You’ll end up at a l…
  5. We know that we want to end up with leaves with a low Gini Impurity, but we still need to figure out which features to split on in order to achieve this. For example, is it better if we split our d…
  6. We’re not quite done calculating the information gain of a set of objects. The sizes of the subsets that get created after the split are important too! For example, the image below shows two sets wi…
  7. Now that we can find the best feature to split the dataset, we can repeat this process again and again to create the full tree. This is a recursive algorithm! We start with every data point from th…
  8. We can finally use our tree as a classifier! Given a new data point, we start at the top of the tree and follow the path of the tree until we hit a leaf. Once we get to a leaf, we’ll use the classe…
  9. Nice work! You’ve written a decision tree from scratch that is able to classify new points. Let’s take a look at how the Python library scikit-learn implements decision trees. The sklearn.tree mod…
  10. Now that we have an understanding of how decision trees are created and used, let’s talk about some of their limitations. One problem with the way we’re currently making our decision trees is that…
  11. Great work! In this lesson, you learned how to create decision trees and use them to make classifications. Here are some of the major takeaways: * Good decision trees have pure leaves. A leaf is p…
Random Forests
Lesson 2 of 2
  1. We’ve seen that decision trees can be powerful supervised machine learning models. However, they’re not without their weaknesses — decision trees are often prone to overfitting. We’ve discus…
  2. You might be wondering how the trees in the random forest get created. After all, right now, our algorithm for creating a decision tree is deterministic — given a training set, the same tree …
  3. We’re now making trees based on different random subsets of our initial dataset. But we can continue to add variety to the ways our trees are created by changing the features that we use. Recall …
  4. Now that we can make different decision trees, it’s time to plant a whole forest! Let’s say we make 8 different trees using bagging and feature bagging. We can now take a new unlabeled point, give …
  5. We’re now able to create a random forest, but how accurate is it compared to a single decision tree? To answer this question we’ve split our data into a training set and test set. By building our m…
  6. You now have the ability to make a random forest using your own decision trees. However, scikit-learn has a RandomForestClassifier class that will do all of this work for you! RandomForestClassifie…
  7. Nice work! Here are some of the major takeaways about random forests: * A random forest is an ensemble machine learning model. It makes a classification by aggregating the classifications of many d…

What you'll create

Portfolio projects that showcase your new skills


How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory
