Python Tutorial: Sentiment Analysis in Python | Intro


---
Welcome to the course! In this course, we will build upon some of your Python skills and introduce methods for sentiment analysis using movie and product reviews, Twitter data and a lot of literary examples.

Let's start with defining what sentiment analysis is.

Sentiment analysis, also called opinion mining, is the process of understanding the opinion of an author about a subject. In other words, "What is the emotion or opinion of the author of the text about the subject discussed?"

In a sentiment analysis system, depending on the context, we usually have 3 elements:

The first is the opinion or emotion. An opinion (also called "polarity") can be positive, neutral or negative. An emotion could be qualitative (like joy, surprise, or anger) or quantitative (like rating a movie on a scale from 1 to 10).

The second element in a sentiment analysis system is the subject that is being talked about, such as a book, a movie, or a product. Sometimes one opinion could discuss multiple aspects of the same subject. For example: "The camera on this phone is great but its battery life is rather disappointing."

The third element is the opinion holder, or entity, expressing the opinion.

Sentiment analysis has many practical applications. In social media monitoring, we don't just want to know if people are talking about a brand; we want to know how they are talking about it.

Social media isn't our only source of information; we can also find sentiment on forums, blogs, and the news. Most brands analyze all of these sources to enrich their understanding of how customers interact with their brand, what they are happy or unhappy about, and what matters most to consumers.

Sentiment analysis is thus very important in brand monitoring, and in fields such as customer and product analytics and market research and analysis.

Let's look at the first dataset we will use in this course: a sample of IMDB movie reviews. We have two columns: one for the text of the review, and a second one called "label", which expresses the overall sentiment: class 1 means positive and class 0 means negative.

Let's find out how many positive and negative reviews we have in the data.

To do this, we call the .value_counts() method on the "label" column.

The output is the number of negative reviews (the 0 class) and positive reviews (the 1 class).
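As a minimal sketch of the counting step just described, here is how it might look with pandas. The DataFrame name `data`, the column name `review`, and the five toy rows are assumptions made for illustration; only the "label" column name and the .value_counts() call come from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the IMDB sample: the real course data is not reproduced here.
# The names "data" and "review" are assumptions; "label" is the column named in the lesson.
data = pd.DataFrame({
    "review": [
        "A wonderful film.",
        "Dull and far too long.",
        "I loved every minute.",
        "Not worth the ticket.",
        "Weak plot, weak acting.",
    ],
    "label": [1, 0, 1, 0, 0],  # 1 = positive, 0 = negative
})

# Count how many reviews fall into each class.
label_counts = data["label"].value_counts()
print(label_counts)
```

The result is a Series indexed by class, so `label_counts.loc[0]` gives the number of negative reviews and `label_counts.loc[1]` the number of positive ones.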

If we want to see the number of positives and negatives as a proportion of all reviews, we can divide the result by the number of rows, which we obtain with the len() function.

We see that the sample is rather balanced: around half of the reviews are positive and half are negative.
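The proportion calculation can be sketched the same way. Again, `data`, `review`, and the four toy rows are assumptions for illustration, chosen to be perfectly balanced; the real sample is only described as roughly balanced.

```python
import pandas as pd

# Hypothetical toy data; "data" and "review" are illustrative names, not the course's.
data = pd.DataFrame({
    "review": [
        "A wonderful film.",
        "Dull and far too long.",
        "I loved every minute.",
        "Not worth the ticket.",
    ],
    "label": [1, 0, 1, 0],  # 1 = positive, 0 = negative
})

# Divide the class counts by the number of rows to get each class's share.
label_share = data["label"].value_counts() / len(data)
print(label_share)

# pandas can also compute the shares directly with normalize=True.
label_share_alt = data["label"].value_counts(normalize=True)
```

Multiplying `label_share` by 100 turns the shares into percentages.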

How long is the longest review?

To find the length of the longest review, we call the max() function on length_reviews, a Series that holds the length of each review.

To find the length of the shortest review, we call the min() function on the same length_reviews Series instead.
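The narration does not show how length_reviews is built, so the sketch below makes an assumption: it measures each review in characters using the pandas .str.len() accessor. The names `data` and `review` and the three toy rows are also illustrative.

```python
import pandas as pd

# Hypothetical toy data; the real reviews are much longer.
data = pd.DataFrame({
    "review": ["Good", "Not good at all", "Meh"],
    "label": [1, 0, 0],
})

# Assumption: review length is counted in characters, not words.
length_reviews = data["review"].str.len()

# Python's built-in max() and min() accept a Series because it is iterable.
longest = max(length_reviews)
shortest = min(length_reviews)
print(longest, shortest)
```

The Series methods `length_reviews.max()` and `length_reviews.min()` return the same values and are usually preferred on large data, since they avoid iterating in Python.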

Let’s practice what we’ve learned in the exercises!

#PythonTutorial #DataCamp #Sentiment #Analysis #Python