Python Tutorial: Let's build a word cloud!


---
Welcome back! Chances are high that you have seen a word cloud before.

A word cloud is an image composed of words with different sizes and colors. They can be especially useful in sentiment analysis. Have you ever wondered how such an image is generated? In this video, we will learn how to create a word cloud in Python.

Word clouds (also called tag clouds) are used across different contexts. In the most common type of word clouds - and the one we will be using in this course - the size of the text corresponds to the frequency of the word. The more frequent a word is, the bigger and bolder it will appear on the word cloud.
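To make the link between frequency and size concrete, here is a minimal sketch using only the standard library. The sample text, the size range and the linear scaling rule are all assumptions chosen for illustration; this is not how the wordcloud package computes sizes internally.

```python
from collections import Counter

# Hypothetical sample text, chosen only for illustration.
text = "titanic ship iceberg titanic ocean titanic ship"

# Count how often each word occurs.
counts = Counter(text.split())

# Assumed font-size range in points; not taken from any library.
MIN_SIZE, MAX_SIZE = 10, 60
top_count = counts.most_common(1)[0][1]


def font_size(word):
    """Scale a word's size linearly with its count relative to the most frequent word."""
    return MIN_SIZE + (MAX_SIZE - MIN_SIZE) * counts[word] / top_count


# Print each word with its count and the size it would be drawn at.
for word, count in counts.most_common():
    print(f"{word}: count={count}, size={font_size(word):.1f}")
```

The most frequent word gets the largest size, and every other word is scaled down in proportion to its count.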

Remember how we found the longest movie review?

This word cloud is generated using only the words in one of the longest reviews. Which movie do you guess the review is talking about? I think we can agree it is about the Titanic!

Why are word clouds so popular?

First of all, they can reveal the essential. In our word cloud, the word Titanic really popped out.

Second, unless told otherwise, they will plot all the words in a text, and a quick scan of the image can provide an overall sense of the text.

Last but not least, they are easy to understand and quite fun.

However, they have their drawbacks. First, they do not always work well: the words plotted on the cloud might seem unrelated, and it can be difficult to draw a conclusion from a crowded word cloud.

Secondly, if the text we work with is large, a word cloud might require quite a lot of preprocessing steps before it appears sensible and uncluttered.

Now let's create a word cloud in Python. To do that, we can use the WordCloud class from the wordcloud package.

Let's define a string, called two_cities, which captures the first sentence of Dickens' A Tale of Two Cities. Note how the text carries many emotionally charged words.

After we have imported the package, we build the cloud by calling WordCloud, followed by the generate method, which takes the text as its argument; in our case, that is the two_cities string.

WordCloud accepts many arguments. We will not cover all of them here, but if you want to learn what they are, just type ?WordCloud in the IPython Shell (or help(WordCloud) in plain Python). You can change things such as the background color, the size and font of the words, their scaling and others. One interesting argument you can specify is stopwords, which will remove words such as 'the', 'and', 'to', 'from', and so on. We will cover what stopwords are in detail in a later video.

The result, cloud_two_cities, is a WordCloud object.

To display it, we pass the cloud to matplotlib's imshow() function, specify that we don't want the image to show the x and y axes, and finally call show(). The imshow() function creates the figure, but we need to call show() to display it.

We see the word cloud we generated on this piece of text. Which words pop out the most?

Let's practice building different word clouds in the exercises!
