NLP - Text Preprocessing and Text Classification (using Python)

Hi! My name is Andre, and this week we will focus on the text classification problem. The methods that we will overview can be applied to text regression as well, but it will be easier to keep the text classification problem in mind. As an example of such a problem, we can take sentiment analysis. That is the problem where you have the text of a review as an input, and as an output you have to produce the class of sentiment. For example, it could be two classes, like positive and negative. It could be more fine-grained, like positive, somewhat positive, neutral, somewhat negative, and negative, and so forth.

An example of a positive review is the following: "The hotel is really beautiful. Very nice and helpful service at the front desk." We read that and we understand that it is a positive review. As for a negative review: "We had problems to get the Wi-Fi working. The pool area was occupied with young party animals, so the area wasn't fun for us." It's easy for us to read this text and to understand whether it has positive or negative sentiment, but for a computer that is much more difficult.

We'll start with text preprocessing. The first thing we have to ask ourselves is: what is text? You can think of text as a sequence, and it can be a sequence of different things. It can be a sequence of characters, which is a very low-level representation of text. You can think of it as a sequence of words, or maybe of more high-level features like phrases ("I don't really like" could be a phrase) or named entities, like the history museum or the museum of history. And it could be bigger chunks, like sentences or paragraphs, and so forth. Let's start with words, and let's define what a word is. It seems natural to think of a text as a sequence of words, and you can think of a word as a meaningful sequence of characters.
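To make the idea of text as a sequence concrete, here is a minimal sketch that views the positive review quoted above at three levels: characters, words, and sentences. It is only an illustration and not code from the lecture; the variable names are hypothetical, the word split is deliberately naive (whitespace only), and the sentence split assumes that sentences end with ".", "!" or "?" followed by a space.

```python
import re

# The positive review quoted in the lecture.
review = ("The hotel is really beautiful. "
          "Very nice and helpful service at the front desk.")

# Lowest level: the text as a sequence of characters.
chars = list(review)

# Word level (naive): split on whitespace only, so punctuation
# stays attached to the neighbouring word.
words = review.split()

# Higher level: sentences, split at whitespace that follows
# sentence-ending punctuation (an assumption that fails on abbreviations).
sentences = re.split(r"(?<=[.!?])\s+", review)

print(chars[:9])
print(words[:5])
print(sentences)
```

Note that the naive word split leaves tokens such as "beautiful." with the full stop attached, which is one reason punctuation has to be handled when looking for word boundaries.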

So, it has some meaning, and if we take the English language, for example, it is usually easy to find the boundaries of words, because in English we can split up a sentence by spaces or punctuation, and all that is left are words. Let's look at an example: "Friends, Romans, Countrymen, lend me your ears;". It has commas, it has a semicolon, and it has spaces. If we split on those, then we get words that are ready for further analysis, like Friends, Romans, Countrymen, and so forth.

It could be more difficult in German, because in German there are compound words which are written without spaces at all. The longest word that is still in use is the following, you can see it on the slide, and it stands for insurance companies which provide legal protection. So for the analysis of this text, it could be beneficial to split that compound word into separate words, because every one of them actually makes sense. They're just written in such a form that they don't have spaces. The Japanese language is a different story.
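Returning to the English example, splitting on spaces and punctuation can be sketched with Python's standard `re` module. This is a minimal sketch rather than the tokenizer used in the course; the function name `tokenize` is hypothetical, and treating every run of word characters as a word is an assumption.

```python
import re


def tokenize(text):
    """Split text on spaces and punctuation, keeping only the words."""
    # \w+ matches runs of letters, digits, and underscores; everything
    # else (spaces, commas, semicolons) acts as a separator and is dropped.
    return re.findall(r"\w+", text)


tokens = tokenize("Friends, Romans, Countrymen, lend me your ears;")
print(tokens)
# ['Friends', 'Romans', 'Countrymen', 'lend', 'me', 'your', 'ears']
```

This simple rule has limits: a contraction such as "don't" would be split at the apostrophe, and a German compound word would come back as a single token, so it does not address the compound-splitting problem described above.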
Comments

I am an NLP practitioner at work. This is great, with plenty of practical examples.

manwaiyeung

This video is so insightful. Thank you so much.

SMPURNIMAWIJENDRA

The methods of text analysis in Korean and English are different. But I can learn very important basics here. Thank you.

jungjoonkil

I was looking for a text classification tutorial. This does not cover any, but it was still very useful for clearing up some basics.

CJMFitnessJourney

How do you do multi-level hierarchical classification?

PANDURANG

Why not lemmatize first and then stem (i.e. use both)?

moravec

Would domain identification (news, sports, etc.) come under text classification? Help me out!

ganj

Is it possible to do text processing for multiple columns in the dataset?

bhanupriyatham

The title says Text Classification, but I didn't find any discussion of text classification. Waste of time.

nileshkhatri

Hi, how can I use NLP to create an application that marks essays?

quintinsa

...it is good, but my English is not good; actually, I can't understand it.

aoshi

Dudes, I'm about to say something counterintuitive: are these lectures available in Russian? I know English, but when you live abroad, extra practice is really unnecessary; in my native language everything is simply absorbed and remembered faster.

zmeyk