Word cloud from a Pandas data frame

Word cloud from a Pandas data frame

Imagine that you have a data frame of tweets and you want to create a word cloud. You can do it using the wordcloud library. In this example the data variable is a Pandas dataframe which has a columns Tweet.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
text = data.Tweet.values
wordcloud = WordCloud(
    width = 3000,
    height = 2000,
    background_color = 'black',
    stopwords = STOPWORDS).generate(str(text))
fig = plt.figure(
    figsize = (40, 30),
    facecolor = 'k',
    edgecolor = 'k')
plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()

In real life, you should do some preprocessing and remove words which should not appear in the output plot. In case of tweets, you may need to remove not only the stopwords but also URLs and nicknames.


Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you watch programming live streams, check out my YouTube channel.
You can also follow me on Twitter: @mikulskibartosz

If you want to hire me, send me a message on LinkedIn or Twitter.


If this article was helpful, consider donating to WWF or any other charity of your choice.
Bartosz Mikulski
Bartosz Mikulski * data scientist / software engineer * conference speaker * organizer of School of A.I. meetups in Poznań * co-founder of Software Craftsmanship Poznan & Poznan Scala User Group