Word Embeddings
Review
Lost in a multidimensional vector space after this lesson? We hope not! We have covered a lot here, so let’s take some time to recap.
- Vectors are containers of information, and they can have anywhere from one dimension to hundreds or thousands of dimensions
- Word embeddings are vector representations of a word, where words with similar contexts are represented with vectors that are closer together
- spaCy is a package that enables us to view and use pre-trained word embedding models
- The distance between vectors can be calculated in many ways, and cosine distance is often the best choice for measuring the distance between higher-dimensional vectors
- Word2Vec is a shallow neural network model that can build word embeddings using either continuous bag-of-words or continuous skip-grams
- Gensim is a package that allows us to create and train word embedding models using any corpus of text
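For example, here is a minimal Gensim sketch of training a Word2Vec model on a toy corpus. The corpus and parameter values are illustrative assumptions, not part of the lesson, and the code assumes Gensim 4.x (older versions used size= instead of vector_size=):

```python
import gensim

# A tiny toy corpus: each document is a list of tokens.
corpus = [
    ["sponges", "live", "on", "the", "ocean", "floor"],
    ["starfish", "live", "on", "the", "ocean", "floor"],
    ["squids", "swim", "in", "the", "deep", "ocean"],
]

# Train a Word2Vec model; sg=0 selects continuous bag-of-words,
# while sg=1 would select continuous skip-grams.
model = gensim.models.Word2Vec(
    corpus, vector_size=10, window=3, min_count=1, sg=0
)

# Each word in the corpus now has a 10-dimensional embedding.
print(model.wv["ocean"])
```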
Instructions
1.
Load a word embedding model from spaCy into a variable named nlp.
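A minimal sketch of this step, assuming the medium English pipeline en_core_web_md is the model used here (smaller pipelines such as en_core_web_sm ship without real word vectors):

```python
import spacy

# Load a pre-trained pipeline that includes word vectors.
nlp = spacy.load("en_core_web_md")
```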
2.
Use the loaded model to create the following word embeddings (see the sketch after this list):
- a vector representation of the word “sponge” saved in a variable named sponge_vec
- a vector representation of the word “starfish” saved in a variable named starfish_vec
- a vector representation of the word “squid” saved in a variable named squid_vec
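One way to pull these vectors out of the loaded model, continuing the sketch above:

```python
# Every Doc produced by the pipeline carries a .vector attribute;
# for a single-word text this is that word's embedding.
sponge_vec = nlp("sponge").vector
starfish_vec = nlp("starfish").vector
squid_vec = nlp("squid").vector

print(sponge_vec.shape)  # e.g. (300,) for en_core_web_md
```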
3.
Use SciPy to compute the cosine distance between:
- sponge_vec and starfish_vec, storing the result in a variable dist_sponge_star
- sponge_vec and squid_vec, storing the result in a variable dist_sponge_squid
- starfish_vec and squid_vec, storing the result in a variable dist_star_squid
Print dist_sponge_star, dist_sponge_squid, and dist_star_squid to the terminal (see the sketch after this step).
Which word embeddings are furthest apart according to cosine distance?
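A minimal sketch of this step using SciPy's spatial.distance module, building on the vectors created above; printing the three values lets you compare the pairs directly:

```python
from scipy.spatial.distance import cosine

# Cosine distance is 1 minus cosine similarity; larger values mean
# the two embeddings point in more different directions.
dist_sponge_star = cosine(sponge_vec, starfish_vec)
dist_sponge_squid = cosine(sponge_vec, squid_vec)
dist_star_squid = cosine(starfish_vec, squid_vec)

print(dist_sponge_star)
print(dist_sponge_squid)
print(dist_star_squid)
```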