Python Tutorial: Sentiment analysis types and approaches
---
Welcome back! In the previous video, we learned what sentiment analysis is and why it is useful.
But how do we even start with a sentiment analysis task?
Sentiment analysis tasks can be carried out at different levels of granularity.
The first is the document level. This is when we look at a whole text at once, for example the entire review of a product.
Second is the sentence level. This refers to determining whether the opinion expressed in each sentence is positive, negative, or neutral.
The last level of granularity is the aspect level. Here we determine the opinions expressed about individual features (aspects) of a product. Imagine a sentence such as "The camera in this phone is pretty good but the battery life is disappointing." It expresses both a positive and a negative opinion about the same phone, and we might want to be able to say which features of the product customers like and which they don't.
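To make the aspect level a bit more concrete, here is a minimal sketch applied to the phone sentence above. The tiny word list, the aspect names, and the rule of splitting on "but" are all assumptions invented for this illustration; real aspect-level systems are considerably more sophisticated.

```python
import re

# Hypothetical mini-lexicon and aspect list, made up for this illustration.
ASPECT_LEXICON = {"good": 1, "disappointing": -1}
ASPECTS = ["camera", "battery"]

def aspect_sentiment(sentence):
    """Score each aspect by the lexicon words found in the same clause."""
    results = {}
    # Naive assumption: contrasting opinions are separated by the word "but".
    for clause in re.split(r"\bbut\b", sentence.lower()):
        words = re.findall(r"[a-z']+", clause)
        score = sum(ASPECT_LEXICON.get(word, 0) for word in words)
        for aspect in ASPECTS:
            if aspect in words:
                results[aspect] = score
    return results

review = "The camera in this phone is pretty good but the battery life is disappointing."
print(aspect_sentiment(review))  # {'camera': 1, 'battery': -1}
```

A document-level or sentence-level score would mix the two opinions together, while this split keeps the positive camera opinion apart from the negative battery opinion.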
The algorithms used for sentiment analysis can be split into two main categories.
The first is rule-based or lexicon-based. Such methods most commonly have a predefined list of words, each with a valence score. For example, nice could be +2, good +1, terrible -3, and so on.
The algorithm then matches the words from the lexicon to the words in the text and either sums or averages the scores in some way.
As an example, let's take the sentence, 'Today was a good day.'
Each word gets a score, and to get the total valence we sum the scores of the words. In this case, we have a positive sentence.
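The lexicon approach just described can be sketched in a few lines. The lexicon below contains only the three example scores mentioned earlier, and giving every other word a score of 0 is an assumption of this sketch, not a rule from any particular lexicon.

```python
import re

# Toy lexicon with the example scores from the text; real lexicons are far larger.
TOY_LEXICON = {"nice": 2, "good": 1, "terrible": -3}

def total_valence(text):
    """Sum the lexicon scores of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: words missing from the lexicon contribute 0.
    return sum(TOY_LEXICON.get(word, 0) for word in words)

score = total_valence("Today was a good day.")
print(score)  # 1, a positive total, so the sentence counts as positive
```

Averaging instead of summing would only require dividing the total by the number of matched words.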
A second category is automated systems, which are based on machine learning. This is going to be our focus in this course. The task is usually modeled as a classification problem: using historical data with known sentiment, we need to predict the sentiment of a new piece of text.
We can calculate the valence score of a text using Python's textblob library.
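As a rough illustration of sentiment analysis as a classification problem, here is a tiny Naive Bayes classifier written from scratch with the standard library. Naive Bayes is simply a choice made for this sketch, and the four labeled sentences are invented; the course itself may use other models and real data.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_texts):
    """Count words per label from historical data with known sentiment."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_texts:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for a new text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids taking the log of zero.
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training data, for illustration only.
history = [
    ("I love this phone", "positive"),
    ("What a great camera", "positive"),
    ("The battery is terrible", "negative"),
    ("I hate the screen", "negative"),
]
model = train(history)
print(predict(model, "great phone"))       # positive
print(predict(model, "terrible battery"))  # negative
```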
We continue working with our 'Today was a good day' string.
We import the TextBlob class from the textblob package and apply it to our string. A TextBlob object is like a Python string that has obtained some natural language processing skills. We can access different properties of the TextBlob object. We are interested in its sentiment; that's why we access the sentiment property of our TextBlob.
The sentiment property returns a tuple. The first element is the polarity, which is measured on a scale from -1.0 to 1.0, where -1.0 is very negative, 0 is neutral and +1.0 is very positive. Our example 'Today was a good day' carries positive emotion and thus has a positive polarity score: 0.7. The second element in the tuple is the subjectivity, measured from 0.0 to 1.0, where 0.0 is very objective and 1.0 is very subjective. So our example is rather positive and subjective.
Which method should one use? A machine learning approach relies on having labeled historical data, whereas lexicon-based methods rely on having manually created rules or dictionaries.
Lexicon-based methods fail at certain tasks because the polarity of a word might change with the problem at hand, and that change will not be reflected in a predefined dictionary.
However, lexicon-based approaches can be quite fast, whereas machine learning models might take a while to train. At the same time, machine learning models can be quite powerful.
So, the jury is still out on that one. Many people find that a hybrid approach tends to work best, especially in complex scenarios.
Now let's test what we've learned by solving some exercises!
#PythonTutorial #DataCamp #Sentiment #Analysis #Python