Multiple Linear Regression
Rebuild the Model

Now let’s rebuild the model using the new features as well as evaluate the new model to see if we improved!

For Manhattan, the scores returned:

Train score: 0.772546055982 Test score: 0.805037197536

For Brooklyn, the scores returned:

Train score: 0.613221453798 Test score: 0.584349923873

For Queens, the scores returned:

Train score: 0.665836031009 Test score: 0.665170319781

For whichever borough you used, let’s see if we can improve these scores!



Print the coefficients again to see which ones are strongest.


Currently the x should look something like:

x = df[['bedrooms', 'bathrooms', 'size_sqft', 'min_to_subway', 'floor', 'building_age_yrs', 'no_fee', 'has_roofdeck', 'has_washer_dryer', 'has_doorman', 'has_elevator', 'has_dishwasher', 'has_patio', 'has_gym']]

Remove some of the features that don’t have strong correlations and see if your scores improved!

Post your best model in the Slack channel!

Folder Icon

Sign up to start coding

Already have an account?