Son Luu

Word Vector in Python

Updated: Apr 19, 2019

Class: Reading and Writing Electronic Text

Instructor: Allison Parrish

Last week, generating text with Markov chains was fun and cool, but the method has a problem: it only looks at which words tend to follow one another, not at what the words mean. As a result, depending on the source text, parts of the generated text may not make much logical sense. This week's lesson covered how to generate text in a way that takes meaning and context into account. To do that, it introduced the concept of word vectors, used in combination with libraries such as spaCy and PronouncingPy and with interesting data files such as the xkcd color file, xkcd.json.


This week's assignment prompts:

(1) Use proximity in vector space as the basis for a creative composition. You can use spaCy’s built-in word vectors or some other vectors (e.g., the xkcd color vectors or my phonetic similarity vectors).

(2) Use the CMU Pronouncing Dictionary (or some other system for getting phonetic information about words) to rework a previous homework assignment with an eye toward phonetic cohesion and symbolism.


I was talking to my friend Anna about this assignment. She said she was playing with generative text for baking recipes. The recipes generated in Python sounded hilarious, because the original words had been replaced with others of similar meaning. That gave me an idea: drink recipes.

At this time of year, with summer near, drink recipes make me think of summer cocktails. Not only are there a lot of recipes, they can also get pretty colorful, especially the tropical-inspired ones. That got me wondering what the color theme of these summer drinks might be.

What if I averaged the colors mentioned in these drink recipes and then found the colors closest to that average? I wondered what that color would turn out to be.
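In vector terms, let $\mathbf{c}_1, \dots, \mathbf{c}_n$ be the RGB vectors of the $n$ color words found in the recipes, and let $\mathbf{x}_k$ range over every color in the palette. The idea is then to compute the mean vector and pick the palette colors with the smallest distance to it. Euclidean distance is assumed here for illustration. A nearest-neighbor library may use a different metric by default, so its rankings could differ slightly.

$$
\bar{\mathbf{c}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{c}_i,
\qquad
k^{*} = \operatorname*{arg\,min}_{k} \,\lVert \mathbf{x}_k - \bar{\mathbf{c}} \rVert
$$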


To do this, I first loaded the xkcd color JSON file and converted the color names to RGB vectors.
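Here is a minimal sketch of this loading step. It assumes the file has a top-level `"colors"` list whose entries carry a `"color"` name and a `"hex"` code in `"#rrggbb"` form. The helper names `hex_to_rgb` and `load_colors` are hypothetical. The two-entry sample is made up so the snippet runs without the real file.

```python
import json

def hex_to_rgb(hex_code):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of ints in 0-255."""
    h = hex_code.lstrip("#")
    return (int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16))

def load_colors(json_text):
    """Map each color name to its RGB vector.

    Assumes a top-level "colors" list of {"color": name, "hex": "#rrggbb"} entries.
    """
    data = json.loads(json_text)
    return {item["color"]: hex_to_rgb(item["hex"]) for item in data["colors"]}

# Made-up sample in the assumed layout (not values from the real xkcd file).
# With the real file you would pass open("xkcd.json").read() instead.
sample = '''{"colors": [
  {"color": "red", "hex": "#ff0000"},
  {"color": "lime", "hex": "#aaff32"}
]}'''

colors = load_colors(sample)
print(colors["lime"])  # (170, 255, 50)
```

Each hex pair is read as a base-16 integer, so every color becomes a point in a three-dimensional space whose axes are red, green, and blue.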

Then I used SimpleNeighbors, a library that indexes items (here, color names) by their vectors and finds the items whose vectors are closest to a given vector.
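SimpleNeighbors builds an approximate nearest-neighbor index: you add items with their vectors, build the index, then query it. The sketch below swaps that index for a plain brute-force search with Euclidean distance. This is not the library's own code, and for a palette the size of the xkcd file the brute-force search is fast enough. Its rankings may differ a little from an index that uses another distance metric. The function names and the RGB values are hypothetical, for illustration only.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(colors, target, n=3):
    """Return the n color names whose RGB vectors are closest to target."""
    return sorted(colors, key=lambda name: distance(colors[name], target))[:n]

# Made-up RGB values for illustration only.
colors = {
    "red": (255, 0, 0),
    "dark red": (139, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}
print(nearest(colors, (200, 10, 10), n=2))  # ['red', 'dark red']
```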


- load a text file containing 70 summer cocktail recipes

- find the words in this text that name colors

- convert these words to lower case

- collect these color names into a list

- average the RGB vectors of all the color names

- find the colors nearest to that average RGB value


These are some of the color themes found:

I looked up the RGB codes of these colors: they are all very similar and can be summed up as two shades, light brown and light olive green.

Looking at the red, green, and blue components of these two colors, it's easy to see that:

  • Light brown has more red

  • Light olive green has more green
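A comparison like this can also be checked in code with a small helper that reports which component is largest. The function name `strongest_channel` is hypothetical. The RGB values below are placeholders, not the actual codes the search returned.

```python
def strongest_channel(rgb):
    """Return 'red', 'green', or 'blue' for the largest component of an (r, g, b) tuple.

    Ties go to the earlier channel in red, green, blue order.
    """
    names = ("red", "green", "blue")
    return names[rgb.index(max(rgb))]

# Placeholder values for illustration only; substitute the real RGB codes.
examples = {
    "light brown (placeholder)": (180, 130, 80),
    "light olive green (placeholder)": (160, 190, 90),
}
for name, rgb in examples.items():
    print(f"{name}: {rgb} -> most {strongest_channel(rgb)}")
```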

Both are warm shades: shades of summer, indeed. 🍹