Bag-of-Words Language Model
It's All in the Bag

Phew! That was a lot of work.

It’s time to put create_features_dictionary() and tokens_to_bow_vector() to work together in a spam filter built on a Naive Bayes classifier. We’ve slightly modified the two functions for this use case, but they should still look familiar.
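
As a reminder of what the two functions do, here is a minimal sketch. The modified versions in script.py are not reproduced here, so the bodies below are an assumption about their general shape: they take lists of tokens that have already been preprocessed, and the sample tokens are made up for illustration.

```python
def create_features_dictionary(document_tokens):
    """Map each unique token to a column index, in order of first appearance."""
    features_dictionary = {}
    for token in document_tokens:
        if token not in features_dictionary:
            features_dictionary[token] = len(features_dictionary)
    return features_dictionary


def tokens_to_bow_vector(document_tokens, features_dictionary):
    """Count how often each known token occurs; unknown tokens are skipped."""
    bow_vector = [0] * len(features_dictionary)
    for token in document_tokens:
        if token in features_dictionary:
            bow_vector[features_dictionary[token]] += 1
    return bow_vector


# Made-up tokens, only to show the shape of the inputs and outputs.
vocabulary = create_features_dictionary(["win", "a", "free", "prize", "free"])
print(vocabulary)  # {'win': 0, 'a': 1, 'free': 2, 'prize': 3}
print(tokens_to_bow_vector(["free", "free", "lunch"], vocabulary))  # [0, 0, 2, 0]
```

Note that "lunch" never appeared when the dictionary was built, so it has no column and is simply ignored.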

Let’s see create_features_dictionary() and tokens_to_bow_vector() in action with real test data, helping fend off spam!



Below tokens_to_bow_vector(), call create_features_dictionary() on training_doc_tokens and assign the result to bow_sms_dictionary.


Define training_vectors as a list comprehension that calls tokens_to_bow_vector() on training_doc and bow_sms_dictionary for each training_doc in training_spam_docs.


Define test_vectors as a list comprehension that calls tokens_to_bow_vector() on test_doc and bow_sms_dictionary for each test_doc in test_spam_docs.
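
The three steps above could be wired together roughly as follows. This is a sketch on toy data, not the contents of script.py: the two helpers are compact stand-ins for the lesson's functions, the sample messages are invented, and each document is assumed to be a list of tokens already.

```python
def create_features_dictionary(document_tokens):
    # dict.fromkeys drops duplicates while keeping first-appearance order.
    unique_tokens = dict.fromkeys(document_tokens)
    return {token: index for index, token in enumerate(unique_tokens)}


def tokens_to_bow_vector(document_tokens, features_dictionary):
    bow_vector = [0] * len(features_dictionary)
    for token in document_tokens:
        if token in features_dictionary:
            bow_vector[features_dictionary[token]] += 1
    return bow_vector


# Invented stand-ins for the lesson's data (assumed to be pre-tokenized).
training_spam_docs = [["free", "prize", "free"], ["lunch", "at", "noon"]]
test_spam_docs = [["free", "lunch"], ["call", "now"]]
training_doc_tokens = [token for doc in training_spam_docs for token in doc]

# Step 1: build the vocabulary from the training tokens only.
bow_sms_dictionary = create_features_dictionary(training_doc_tokens)

# Step 2: one bag-of-words vector per training document.
training_vectors = [
    tokens_to_bow_vector(training_doc, bow_sms_dictionary)
    for training_doc in training_spam_docs
]

# Step 3: test documents are encoded with the same dictionary.
test_vectors = [
    tokens_to_bow_vector(test_doc, bow_sms_dictionary)
    for test_doc in test_spam_docs
]

print(training_vectors)  # [[2, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
print(test_vectors)      # [[1, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
```

Both sets of vectors share bow_sms_dictionary, so every column means the same word in training and in testing. A test message made up entirely of unseen words, like the second one here, becomes an all-zero vector.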


Ready? Get set! Uncomment the code at the bottom of script.py and run the code again!

The Naive Bayes classifier was pretty darn accurate in determining which messages were spam by using your bag-of-words functions!
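
To see why bag-of-words vectors are all the classifier needs, here is a from-scratch sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the classifier code from script.py, which is not shown here and may well rely on a library. The function names, the four-word vocabulary, and the labels (1 for spam, 0 for not spam) are all assumptions made for illustration.

```python
import math


def train_naive_bayes(vectors, labels, alpha=1.0):
    """Fit per-class log priors and smoothed log word likelihoods."""
    n_features = len(vectors[0])
    model = {}
    for label in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        # Total count of each word across this class's documents.
        feature_totals = [sum(column) for column in zip(*rows)]
        denominator = sum(feature_totals) + alpha * n_features
        model[label] = {
            "log_prior": math.log(len(rows) / len(vectors)),
            "log_likelihood": [
                math.log((count + alpha) / denominator)
                for count in feature_totals
            ],
        }
    return model


def predict(model, vector):
    """Return the label with the highest log posterior score."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            count * log_p
            for count, log_p in zip(vector, params["log_likelihood"])
        )
    return max(model, key=score)


# Toy vectors over a hypothetical vocabulary: free, prize, lunch, meeting.
toy_training_vectors = [[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 2, 1]]
toy_training_labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

model = train_naive_bayes(toy_training_vectors, toy_training_labels)
print(predict(model, [1, 0, 0, 0]))  # 1 (a message containing "free")
print(predict(model, [0, 0, 1, 1]))  # 0 (a message about a lunch meeting)
```

The classifier never looks at word order. It only multiplies per-word probabilities weighted by the counts in each vector, which is exactly the information a bag-of-words vector carries.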
