NLP - Text Preprocessing and Text Classification (using Python)

Hi! My name is Andre, and this week we will focus on the text classification problem. The methods we will cover can be applied to text regression as well, but it will be easier to keep the text classification problem in mind. As an example of such a problem, we can take sentiment analysis. That is the problem where you have the text of a review as input, and as output you have to produce the class of sentiment. For example, it could be two classes, positive and negative. It could be more fine-grained, like positive, somewhat positive, neutral, somewhat negative, negative, and so forth.

An example of a positive review is the following: "The hotel is really beautiful. Very nice and helpful service at the front desk." We read that and we understand that it is a positive review. As for a negative review: "We had problems to get the Wi-Fi working. The pool area was occupied with young party animals, so the area wasn't fun for us." It's easy for us to read this text and understand whether it has positive or negative sentiment, but for a computer that is much more difficult.

We'll start with text preprocessing, and the first thing we have to ask ourselves is: what is text? You can think of text as a sequence, and it can be a sequence of different things. It can be a sequence of characters, which is a very low-level representation of text. You can think of it as a sequence of words, or maybe of higher-level features like phrases ("I don't really like" could be a phrase) or named entities, like the history of museum or the museum of history. And it could be bigger chunks like sentences or paragraphs and so forth.

Let's start with words and define what a word is. It seems natural to think of a text as a sequence of words, and you can think of a word as a meaningful sequence of characters.
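To make the "text as a sequence" idea concrete, here is a minimal sketch in Python that views the positive review from the lecture at three levels: characters, words, and sentences. The regular expressions are illustrative assumptions, not something given in the lecture, and the sentence split in particular is deliberately naive.

```python
import re

review = ("The hotel is really beautiful. "
          "Very nice and helpful service at the front desk.")

# Character level: the lowest-level view of the text.
chars = list(review)

# Word level: runs of letters, digits, or apostrophes (an assumed, simple rule).
words = re.findall(r"[A-Za-z0-9']+", review)

# Sentence level: naive split on whitespace that follows '.', '!' or '?'.
sentences = re.split(r"(?<=[.!?])\s+", review)

print(len(chars), "characters")
print(len(words), "words, starting with", words[:5])
print(len(sentences), "sentences")
```

The same string gives three different sequences, and which one you feed to a classifier is a modeling choice.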

A word has some meaning, and if we take the English language for example, it is usually easy to find the boundaries of words, because in English we can split up a sentence by spaces or punctuation, and all that is left are words. Let's look at an example: "Friends, Romans, Countrymen, lend me your ears;". It has commas, it has a semicolon, and it has spaces. If we split on those, we get words that are ready for further analysis, like Friends, Romans, Countrymen, and so forth.

It could be more difficult in German, because in German there are compound words which are written without spaces at all. The longest word that is still in use is the following, you can see it on the slide, and it actually stands for insurance companies which provide legal protection. So for the analysis of this text, it could be beneficial to split that compound word into separate words, because every one of them actually makes sense. They're just written in such a form that they don't have spaces. The Japanese language is a different story.
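The English case described above, splitting on spaces and punctuation, can be sketched in a few lines of Python. The function name `tokenize` and the `\w+` pattern are assumptions made for illustration; the lecture does not prescribe a particular implementation.

```python
import re

def tokenize(text):
    """Return the words in text, dropping spaces and punctuation.

    Assumption: a word is a maximal run of word characters (\\w+).
    """
    return re.findall(r"\w+", text)

tokens = tokenize("Friends, Romans, Countrymen, lend me your ears;")
print(tokens)
# ['Friends', 'Romans', 'Countrymen', 'lend', 'me', 'your', 'ears']
```

This simple rule also splits contractions such as "wasn't" into "wasn" and "t", and it does nothing for German compounds or for text written without spaces, which is why those cases need more than splitting on punctuation.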

Comments

I am an NLP practitioner at work. This is great, with plenty of practical examples.

manwaiyeung

This video is so insightful. Thank you so much.

SMPURNIMAWIJENDRA

The methods of text analysis in Korean and English are different. But I can learn very important basics here. Thank you.

jungjoonkil

I was looking for a text classification tutorial. This does not cover any, but it was still very useful for clearing up some basics.

CJMFitnessJourney

How to do Multi Level Hierarchical Classification

PANDURANG

Why not lemmatize first and then stem (i.e. use both)?

moravec

Would domain identification (news, sports, etc.) come under text classification? Help me out!

ganj

is it possible to do text processing for multiple columns in the dataset ?

bhanupriyatham

The title says Text Classification, but I didn't find a single word about text classification.
Waste of time.

nileshkhatri

Hi, how can I use NLP to create an application that marks essays?

quintinsa

...is good, but my English is not good; actually, I can't understand it.

aoshi

Dudes, I'm about to say something counterintuitive: are these lectures available in Russian? I know English, but when you live abroad, extra practice really is superfluous; things are just absorbed and remembered faster in your native language, heh.

zmeyk
