Word cloud from a Pandas data frame
Imagine that you have a data frame of tweets and you want to create a word cloud. You can do it using the wordcloud library. In this example, the data variable is a Pandas DataFrame with a Tweet column.
```python
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Join all tweets into one string. Calling str() on the NumPy array would
# add brackets and quotes, and would abbreviate long arrays with "...".
text = ' '.join(data.Tweet.astype(str))

wordcloud = WordCloud(
    width = 3000,
    height = 2000,
    background_color = 'black',
    stopwords = STOPWORDS).generate(text)

# Large black canvas without axes or padding around the image
plt.figure(
    figsize = (40, 30),
    facecolor = 'k',
    edgecolor = 'k')
plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()
```
In real life, you should do some preprocessing and remove words that should not appear in the output plot. In the case of tweets, you may need to remove not only the stopwords but also URLs and nicknames.
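One possible way to do that preprocessing is with a couple of regular expressions. The snippet below is only a sketch: the clean_tweet function and both patterns are my assumptions, not part of the wordcloud library, and real tweets may need stricter rules (for example, the nickname pattern would also strip the domain part of an email address).

```python
import re

# Assumed patterns, for illustration only
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def clean_tweet(tweet: str) -> str:
    """Remove URLs and @nicknames, then collapse the leftover whitespace."""
    without_urls = URL_PATTERN.sub(" ", tweet)
    without_mentions = MENTION_PATTERN.sub(" ", without_urls)
    return " ".join(without_mentions.split())


sample = "@alice check this out https://example.com/post great read"
cleaned = clean_tweet(sample)
print(cleaned)
```

With a data frame, you could apply the function to every tweet before building the text, for example with data.Tweet.astype(str).map(clean_tweet), and pass the joined result to generate.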