Word cloud from a Pandas data frame

Imagine that you have a data frame of tweets and you want to create a word cloud. You can do it with the wordcloud library. In this example, the data variable is a Pandas data frame that has a column named Tweet.

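from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Join all tweets into a single string. Calling str() on the NumPy array
# would pass its printed form instead, which NumPy abbreviates for long arrays.
text = ' '.join(data.Tweet.astype(str))

# Build the word cloud, skipping the library's default English stopwords.
wordcloud = WordCloud(
    width = 3000,
    height = 2000,
    background_color = 'black',
    stopwords = STOPWORDS).generate(text)

# Draw the cloud on a black figure without axes or padding.
fig = plt.figure(
    figsize = (40, 30),
    facecolor = 'k',
    edgecolor = 'k')
plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()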

In real life, you should preprocess the text and remove words that should not appear in the output plot. In the case of tweets, you may need to remove not only the stopwords but also URLs and nicknames.
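One possible way to do that cleanup is sketched below. The `clean_tweet` helper and both regular expressions are my assumptions for illustration, not part of the wordcloud library, and the patterns are deliberately simple: they treat anything starting with `http://`, `https://`, or `www.` as a URL and anything of the form `@name` as a nickname.

```python
import re

# Hypothetical patterns (assumptions, not from the wordcloud library).
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')
MENTION_PATTERN = re.compile(r'@\w+')


def clean_tweet(tweet):
    """Remove URLs and @nicknames from a tweet, then collapse whitespace."""
    without_urls = URL_PATTERN.sub(' ', tweet)
    without_mentions = MENTION_PATTERN.sub(' ', without_urls)
    return ' '.join(without_mentions.split())


sample = 'Thanks @alice for the link https://example.com/post check it out'
print(clean_tweet(sample))  # prints: Thanks for the link check it out
```

With the data frame from the example above, the cleaned text could probably be built with `' '.join(data.Tweet.astype(str).map(clean_tweet))` and passed to `generate`. Because `STOPWORDS` is a plain Python set, extra words to hide can be added with a set union, for example `STOPWORDS | {'rt'}`.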

Bartosz Mikulski

  • Data/MLOps engineer by day
  • DevRel/copywriter by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz