While it is useful to match and search for patterns of individual characters in a text, you can often find more meaning by analyzing text on a word-by-word basis, focusing on the part of speech of each word in a sentence. This process of identifying and labeling the part of speech of words is known as part-of-speech tagging!

It may have been a while since you’ve been in English class, so let’s review the nine parts of speech with an example:

Wow! Ramona and her class are happily studying the new textbook she has on NLP.

  • Noun: the name of a person (Ramona), place, thing (class, textbook), or idea (NLP)
  • Pronoun: a word used in place of a noun (her, she)
  • Determiner: a word that introduces, or “determines”, a noun (the)
  • Verb: expresses action (studying) or being (are, has)
  • Adjective: modifies or describes a noun or pronoun (new)
  • Adverb: modifies or describes a verb, an adjective, or another adverb (happily)
  • Preposition: a word placed before a noun or pronoun to form a phrase modifying another word in the sentence (on)
  • Conjunction: a word that joins words, phrases, or clauses (and)
  • Interjection: a word used to express emotion (Wow)

You can automate the part-of-speech tagging process with nltk’s pos_tag() function! The function takes one argument, a list of words in the order they appear in a sentence, and returns a list of tuples, where the first entry in each tuple is a word and the second is its part-of-speech tag.

Given the sentence split into a list of words below:

word_sentence = ['do', 'you', 'suppose', 'oz', 'could', 'give', 'me', 'a', 'heart', '?']

you can tag the parts of speech as follows:

from nltk import pos_tag

part_of_speech_tagged_sentence = pos_tag(word_sentence)

The call to pos_tag() will return the following:

[('do', 'VB'), ('you', 'PRP'), ('suppose', 'VB'), ('oz', 'NNS'), ('could', 'MD'), ('give', 'VB'), ('me', 'PRP'), ('a', 'DT'), ('heart', 'NN'), ('?', '.')]
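Because each entry is a (word, tag) tuple, the result can be unpacked directly in a loop or comprehension. The sketch below hard-codes the output shown above instead of calling pos_tag(), so it runs without NLTK installed. The name verbs is illustrative only.

```python
# Tagged output copied from the example above (no NLTK call needed here).
part_of_speech_tagged_sentence = [
    ('do', 'VB'), ('you', 'PRP'), ('suppose', 'VB'), ('oz', 'NNS'),
    ('could', 'MD'), ('give', 'VB'), ('me', 'PRP'), ('a', 'DT'),
    ('heart', 'NN'), ('?', '.'),
]

# Unpack each (word, tag) tuple and keep the words whose tag starts
# with "VB", the prefix shared by the verb tags in this tag set.
verbs = [word for word, tag in part_of_speech_tagged_sentence
         if tag.startswith('VB')]

print(verbs)  # ['do', 'suppose', 'give']
```

Matching on the tag prefix rather than the exact string 'VB' is a design choice: it also catches inflected verb forms, which carry longer tags that begin with the same letters.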

Abbreviations are given instead of the full part-of-speech name. Some common abbreviations include: NN for nouns, VB for verbs, RB for adverbs, JJ for adjectives, and DT for determiners. By default, pos_tag() uses the Penn Treebank tag set, whose documentation gives the complete list of part-of-speech tags and their abbreviations.
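To make tagged output easier to read, the abbreviations can be translated back into full names. The following is a minimal sketch: TAG_NAMES is a partial, hand-written mapping and describe() is a hypothetical helper, neither of which is part of nltk. Tags missing from the mapping are left unchanged.

```python
from collections import Counter

# Partial, hand-written mapping for illustration; not the complete tag list.
TAG_NAMES = {
    'NN': 'noun',
    'VB': 'verb',
    'RB': 'adverb',
    'JJ': 'adjective',
    'DT': 'determiner',
    'PRP': 'personal pronoun',
    'MD': 'modal verb',
}

def describe(tag):
    """Return a readable name for a tag, or the tag itself if unknown."""
    return TAG_NAMES.get(tag, tag)

# Tagged sentence copied from the example output earlier in this lesson.
tagged = [
    ('do', 'VB'), ('you', 'PRP'), ('suppose', 'VB'), ('oz', 'NNS'),
    ('could', 'MD'), ('give', 'VB'), ('me', 'PRP'), ('a', 'DT'),
    ('heart', 'NN'), ('?', '.'),
]

# Count how often each part of speech appears in the sentence.
tag_counts = Counter(describe(tag) for _, tag in tagged)

for name, count in tag_counts.most_common():
    print(name, count)
```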



The workspace provides the text of The Wonderful Wizard of Oz, broken down into individual words on a sentence-by-sentence basis in a process known as tokenization. The resulting word-tokenized sentences are stored in word_tokenized_oz.

Save the value stored at index 100 of word_tokenized_oz to a variable named witches_fate, and print it. You should see a sentence from the novel, split into individual words, print to the terminal.


Since the text has been broken down into individual words on a sentence-by-sentence level, you can now part-of-speech tag each word-tokenized sentence in The Wonderful Wizard of Oz! Begin by creating an empty list named pos_tagged_oz to hold the part-of-speech tagged sentences.


Create a for-loop that iterates over each word-tokenized sentence in word_tokenized_oz. Within the for-loop, part-of-speech tag each word-tokenized sentence and append the result to pos_tagged_oz.


Save the part-of-speech tagged sentence at index 100 of pos_tagged_oz to a variable named witches_fate_pos, and print it.
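The general pattern behind these steps (index a sentence, build an empty list, tag each sentence in a loop, index the result) can be sketched on a toy corpus. Everything here is an assumption made for illustration: toy_tokenized stands in for word_tokenized_oz, and placeholder_tagger is a hypothetical substitute for pos_tag() that labels every word 'UNK', so the example runs without NLTK or its tagger data.

```python
# Toy stand-in for word_tokenized_oz: a list of sentences,
# where each sentence is a list of words.
toy_tokenized = [
    ['the', 'road', 'was', 'yellow', '.'],
    ['do', 'you', 'suppose', 'oz', 'could', 'give', 'me', 'a', 'heart', '?'],
]

def placeholder_tagger(words):
    """Hypothetical stand-in for pos_tag(): tags every word as 'UNK'."""
    return [(word, 'UNK') for word in words]

# In the workspace, the tagger would be nltk's pos_tag instead.
tagger = placeholder_tagger

# Tag one sentence at a time and collect the results in order.
pos_tagged = []
for sentence in toy_tokenized:
    pos_tagged.append(tagger(sentence))

# The tagged list lines up with the input: the same index
# refers to the same sentence in both lists.
second_tagged = pos_tagged[1]
print(second_tagged)
```

Because the loop appends one tagged sentence per input sentence, an index such as 100 in the lesson refers to the same sentence before and after tagging.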

