Create wordcloud portraits with Python

This is not a data visualisation article: it is about creating graphically interesting art with the wordcloud library in Python.

Jelena Ristic
4 min read · Apr 9, 2022
Wordcloud generated from Darth Vader’s lines in Star Wars scripts (Jelena Ristic, 2022)

Data visualisations are a type of storytelling that leads to insight and, for the best of them, to an understanding that can drive decision-making. Wordclouds are not one of them. They are quite the opposite: what you see depends on the font, and a font size cannot be accurately translated back into a value. In addition, a longer word set in a smaller size can take up more space than a shorter word set slightly larger, which leads the eye to misjudge their relative importance.
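
To make the size problem concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that a rendered word is about as wide as its character count times 0.6 of the font size and as tall as the font size. The function name approx_area, the 0.6 factor, and the two sample words and sizes are my own assumptions, not something the wordcloud library exposes.

```python
# Rough illustration only: approximate the area a word occupies on screen.
# Assumption: each character is about 0.6 * font_size wide. This is a crude
# estimate for a proportional font, not an exact measurement.
CHAR_WIDTH_RATIO = 0.6

def approx_area(word: str, font_size: float) -> float:
    """Return an approximate bounding-box area in square pixels."""
    width = len(word) * CHAR_WIDTH_RATIO * font_size
    height = font_size
    return width * height

# A more frequent short word, drawn larger...
short_area = approx_area("room", 40)
# ...and a less frequent long word, drawn smaller.
long_area = approx_area("extraordinary", 30)

print(f"room at 40px: about {short_area:.0f} square pixels")
print(f"extraordinary at 30px: about {long_area:.0f} square pixels")
print("longer word covers more area:", long_area > short_area)
```

Under these assumptions the smaller, longer word covers the larger area, even though its font size says it matters less.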

However, even if I find them useless for actual data visualisation that should drive decision-making, I find them quite interesting when it comes to graphical art and design. I had the idea of creating portraits of well-known characters or people, using their own words to draw them. To do so, I used Python and the wordcloud library.

The example of Virginia Woolf and A Room of One’s Own

First of all, you have to install the wordcloud library via pip:

pip install wordcloud

I used a Jupyter Notebook to generate it in stages.

Once your Jupyter Notebook is up and running, import the necessary libraries:

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams["figure.dpi"] = 2000
import pandas as pd
import numpy as np
from PIL import Image

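The next step is to read the data, i.e. feed Python the words it will use to generate the cloud. The source can be anything you can turn into a string of text, such as an online HTML page or a .txt file. pandas is only needed for the HTML option; a plain text file can be read with Python's built-in open():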

#if you want to use a URL as the source (note that pd.read_html
#returns a list of DataFrames, one per HTML table on the page):
#data = pd.read_html(input("paste your URL here: "))
#in my case, I had A Room of One's Own in a txt format, scraped
#from the Project Gutenberg website.
with open("woolf_room.txt") as f:
    data = f.read()
#you want to make sure it's all string
data = str(data)
#then you can generate the first wordcloud to check it's all ok
wordcloud = WordCloud().generate(data)
#and use matplotlib to display generated image
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
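
Before customising the cloud, it can help to check which words are likely to dominate it. The sketch below counts word frequencies with the standard library only. The function name top_words, the tiny stopword set, and the sample sentence are illustrative assumptions, and the tokenising rule is simpler than whatever the wordcloud library applies internally, so the counts may not match the cloud exactly.

```python
import re
from collections import Counter

# Hypothetical mini stopword list for this illustration; the wordcloud
# library ships a much longer one as STOPWORDS.
MY_STOPWORDS = {"a", "and", "her", "if", "is", "must", "of", "own", "she", "the", "to"}

def top_words(text, stopwords, n=5):
    """Return the n most common words in text, ignoring case and stopwords."""
    # Lowercase, then keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Short sample standing in for the full book text.
sample = (
    "A woman must have money and a room of her own "
    "if she is to write fiction. Money and a room."
)
result = top_words(sample, MY_STOPWORDS)
for word, count in result:
    print(word, count)
```

On the real text, you could call top_words(data, MY_STOPWORDS, n=20) and add any unwanted words to your own stopword list before generating the cloud.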

At this stage, you will get the generic wordcloud output in the library's default colours and font.

You can now start customising it: choose the mask that will shape the final wordcloud. It is best to choose a picture with a good amount of contrast to get the best out of your wordcloud. The size of the mask picture is also crucial if you want an output that you can blow up significantly without too much pixelation: the bigger the picture, the better the cloud.

For the Virginia Woolf wordcloud, I used her famous portrait photographed by George Charles Beresford in 1902:

George Charles Beresford, Virginia Woolf, 1902 (Wikimedia Commons)

To set up the picture as the cloud mask, use numpy and the following code:

cloud_mask = np.array(Image.open("FILE NAME/PATH OF YOUR IMAGE HERE.JPG/PNG"))
plt.imshow(cloud_mask)
plt.axis("off")
#I like to make sure each stage works before proceeding further
plt.show()
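
The wordcloud documentation states that white entries in the mask are treated as masked out, while all other entries are free to draw on. If the background of your picture is only off-white, words may therefore spill into it. Here is a minimal sketch of forcing light pixels to pure white with numpy. The function name whiten_background, the threshold of 200, and the tiny synthetic array standing in for a grayscale image are assumptions for illustration.

```python
import numpy as np

def whiten_background(gray, threshold=200):
    """Return a copy of a grayscale mask with light pixels set to 255.

    Pixels brighter than `threshold` become pure white (masked out by
    wordcloud); all other pixels keep their original value.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    return np.where(gray > threshold, 255, gray).astype(np.uint8)

# Tiny synthetic "image": a dark subject (values below 100) on an
# off-white background (values between 230 and 245).
fake_gray = np.array(
    [[240, 235, 245],
     [230,  40,  90],
     [238,  60, 242]],
    dtype=np.uint8,
)

mask = whiten_background(fake_gray)
print(mask)
```

With a real photo, you could get the grayscale array with np.array(Image.open(path).convert("L")) before calling the function. This sketch only handles a single-channel image; for a colour mask, my understanding is that every channel of a background pixel needs to be 255 for it to count as white.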

Opt for the font of your liking (make sure it’s in the same folder as your project or that you refer to it with the correct path) — I like using fonts that graphically add meaning to the final output. Here I used a classical serif font given the literary nature of the text.

wordcloud = WordCloud(
    font_path="Sitka.ttc",    #add your font here
    width=2380,               #width and height are ignored when a mask
    height=2879,              #is given (the mask's shape is used);
                              #set here to the mask size
    stopwords=STOPWORDS,      #if the default stopwords the module
                              #provides are not good enough, you can
                              #list your own
    background_color="white",
    mask=cloud_mask,
    contour_width=0,
    repeat=True,
    min_font_size=2,          #if you want a tight fit,
                              #go for a small figure here
    max_words=1000,           #set max words count
).generate(data)
image_colors = ImageColorGenerator(cloud_mask) #use the mask colours
#the cloud is already generated above, so it only needs recolouring
wordcloud.recolor(color_func=image_colors)
plt.imshow(wordcloud)
plt.axis("off")
fig1 = plt.gcf()
#save before plt.show(), which may close the figure
fig1.savefig("Woolf.svg", format="svg", bbox_inches="tight")
plt.show()

I like saving the output in SVG format, as I then use it in Adobe InDesign to compose the final product.
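
If you also want a raster export (a PNG, for example) from a figure whose pixel dimensions match the mask, keep in mind that matplotlib's figure size in inches multiplied by its dpi gives the size in pixels. The helper below is a small sketch of that arithmetic. The name figsize_for_pixels and the dpi of 300 are assumptions, and 2380 x 2879 is the size used in the code above.

```python
def figsize_for_pixels(width_px, height_px, dpi):
    """Return (width, height) in inches so that inches * dpi equals the pixel size."""
    return width_px / dpi, height_px / dpi

dpi = 300  # assumed print resolution; pick whatever your output needs
fig_w, fig_h = figsize_for_pixels(2380, 2879, dpi)
print(f"figsize=({fig_w:.2f}, {fig_h:.2f}) inches at {dpi} dpi")

# With matplotlib imported as plt, the values could be used like this:
# fig = plt.figure(figsize=(fig_w, fig_h), dpi=dpi)
```

Note that bbox_inches="tight" crops the saved figure and that the axes do not fill the whole figure by default, so the exported image can still differ slightly from the mask size. The wordcloud object also has a to_file method that writes the rendered image directly, which avoids matplotlib altogether.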

Here is the output of the code above:

“Woolfcloud” generated by combining her photographic portrait and the most frequent words in her text “A Room of One’s Own” (Jelena Ristic, 2022)

You can download the A3 poster pdf of my “Woolfcloud” here.

I hope you enjoyed this article — feel free to leave a comment if you made one of your own!

Many thanks for reading!

Jelena

Jelena Ristic

Creative t(h)inker, former museum curator, and python enthusiast delving into the wondrous realm of digital humanities. Reach me at hello@jelenaristic.info