R Tutorial : Displaying results with ggplot


---

In this chapter, we will review how to display results of topic modeling using ggplot.

As we work with LDA topic modeling, we will want to display two kinds of data. First, there are word counts in documents, which we obtain before we fit a topic model. Then, there are the probabilities of topics in documents and of words in topics, which we obtain after fitting a topic model. ggplot can display both, but it requires the data to be in tidy format. Fortunately, word counts are already in tidy format, and LDA results can be converted into tidy format using the tidy() function from the tidytext package.
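To make "tidy word counts" concrete, here is a minimal sketch of a table with one row per document and word. The two-document corpus and the column names are made up for illustration; they are not the data used in the video.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus (an assumption, not the tutorial's data)
docs <- data.frame(
  document = c("d1", "d2"),
  text = c("the cat sat on the mat", "the dog chased the cat"),
  stringsAsFactors = FALSE
)

# One row per (document, word) pair, with the count in column n
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(document, word, sort = TRUE)

word_counts
```

Because every row is a single observation (a word in a document), a table like this can be passed to ggplot() without further reshaping.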

When we fit a topic model, we call the LDA() function, and it returns a model object. Among many other things, this object contains two matrices: beta and gamma. beta contains the logarithms of the probabilities of words belonging to topics, and gamma contains the probabilities of topics within documents.

Notice how in the output, the dimension of beta is 2 by 34: two topics and 34 words. The dimension of gamma is 5 by 2: five documents and two topics.
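A sketch of how one might inspect these two matrices, assuming the model is fitted with LDA() from the topicmodels package. The three-document corpus, the Gibbs method, and the seed are illustrative choices, so the dimensions will differ from the 2 by 34 and 5 by 2 shown in the video.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical toy corpus (an assumption, not the tutorial's data)
docs <- data.frame(
  document = c("d1", "d2", "d3"),
  text = c("the cat sat on the mat",
           "the dog chased the cat",
           "stocks fell as the market closed"),
  stringsAsFactors = FALSE
)

# Tidy word counts, then cast into a document-term matrix for LDA()
dtm <- docs %>%
  unnest_tokens(word, text) %>%
  count(document, word) %>%
  cast_dtm(document, word, n)

# Fit a two-topic model; the seed only makes the run repeatable
mod <- LDA(dtm, k = 2, method = "Gibbs", control = list(seed = 1234))

dim(mod@beta)   # topics by words; entries are log-probabilities
dim(mod@gamma)  # documents by topics; entries are probabilities
```

Exponentiating a row of beta gives the word probabilities for that topic, which sum to one.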

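The tidy() function takes an LDA model object and returns the matrix we specify, either "beta" or "gamma", as a tidy table. By default, the beta values in that table are probabilities rather than logarithms.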
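A minimal sketch of the conversion, assuming a model fitted with topicmodels on a made-up corpus. The corpus, the object names, and the seed are illustrative, not taken from the video.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical toy corpus (an assumption, not the tutorial's data)
docs <- data.frame(
  document = c("d1", "d2", "d3"),
  text = c("cats and dogs are pets",
           "dogs chase cats",
           "markets and stocks fell"),
  stringsAsFactors = FALSE
)

dtm <- docs %>%
  unnest_tokens(word, text) %>%
  count(document, word) %>%
  cast_dtm(document, word, n)

mod <- LDA(dtm, k = 2, method = "Gibbs", control = list(seed = 42))

# One row per (topic, term) pair with its probability
beta_tbl <- tidy(mod, matrix = "beta")

# One row per (document, topic) pair with its probability
gamma_tbl <- tidy(mod, matrix = "gamma")

head(beta_tbl)
head(gamma_tbl)
```

Both tables are in long format, so each can be passed straight to ggplot().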

The geom_col() layer in ggplot produces a column chart. By default, the columns are stacked. In the call to ggplot(), the aesthetics specify that values for the x axis come from the column "document" and values for the y axis come from the column "gamma".
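A sketch of such a stacked chart follows. The data frame is hand-written to look like a tidy gamma table, so the values are made up. Mapping fill to the topic is an added assumption, used here so the stacked segments can be told apart.

```r
library(ggplot2)

# Made-up values shaped like a tidy gamma table (not real model output)
gammas <- data.frame(
  document = rep(c("d1", "d2", "d3"), each = 2),
  topic = rep(1:2, times = 3),
  gamma = c(0.8, 0.2, 0.4, 0.6, 0.1, 0.9)
)

# x comes from "document", y from "gamma"; columns stack by default
p <- ggplot(gammas, aes(x = document, y = gamma, fill = as.factor(topic))) +
  geom_col()

p
```

Since the topic probabilities for each document sum to one, every stacked column reaches the same height, and the colored segments show how each document splits between topics.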

Dodged, or side-by-side, columns make it easier to tell which column is taller. To make a chart with dodged columns, we pass position_dodge() as the position argument in the call to geom_col(). The example shows probabilities of words. The data comes from the "beta" matrix contained in the LDA model, and the word is in the column "term".
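A sketch of a dodged chart, using a hand-written data frame shaped like a tidy beta table. The terms and probabilities are made up, and fill by topic is again an illustrative choice.

```r
library(ggplot2)

# Made-up values shaped like a tidy beta table (not real model output)
betas <- data.frame(
  topic = rep(1:2, each = 3),
  term = rep(c("cat", "dog", "mat"), times = 2),
  beta = c(0.5, 0.1, 0.4, 0.2, 0.7, 0.1)
)

# position_dodge() places the two topics' columns next to each other
p <- ggplot(betas, aes(x = term, y = beta, fill = as.factor(topic))) +
  geom_col(position = position_dodge())

p
```

With the columns side by side, each one starts at zero, so it is easy to see, for instance, that "dog" is more probable in topic 2 than in topic 1.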

Let's do a few examples.

#DataCamp #RTutorial #TopicModelinginR