Congratulations! You just learned how a Logistic Regression model works and how to fit one to a dataset. Class is over, and the final exam for Codecademy University’s Introductory Machine Learning is around the corner. Do you predict that you will pass? Let’s do some review to make sure.

  • Logistic Regression is used to perform binary classification, predicting whether a data sample belongs to a positive (present) class, labeled 1 and the negative (absent) class, labeled 0.
  • The Sigmoid Function bounds the product of feature values and their coefficients, known as the log-odds, to the range [0,1], providing the probability of a sample belonging to the positive class.
  • A loss function measures how well a machine learning model makes predictions. The loss function of Logistic Regression is log-loss.
  • A Classification Threshold is used to determine the probabilistic cutoff for where a data sample is classified as belonging to a positive or negative class. The standard cutoff for Logistic Regression is 0.5, but the threshold can be higher or lower depending on the nature of the data and the situation.
  • Scikit-learn has a Logistic Regression implementation that allows you to fit a model to your data, find the feature coefficients, and make predictions on new data samples.
  • The coefficients determined by a Logistic Regression model can be used to interpret the relative importance of each feature in predicting the class of a data sample.


Find another dataset for binary classification from Kaggle or take a look at sklearn‘s breast cancer dataset. Use sklearn to build your own Logistic Regression model on the data and make some predictions. Which features are most important in the model you build?

Sign up to start coding

By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.

Already have an account?