Text Classification - Natural Language Processing With Python and NLTK p.11

Now that we understand some of the basics of natural language processing with the Python NLTK module, we're ready to try out text classification. This is where we attempt to identify a body of text with some sort of label.

To start, we're going to use some sort of binary label. Examples of this could be identifying text as spam or not, or, like what we'll be doing, positive sentiment or negative sentiment.
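Roughly, this is the kind of setup this part builds; a sketch assuming the NLTK movie_reviews corpus has already been downloaded:

import random
from nltk.corpus import movie_reviews

# One (word_list, label) pair per review; the labels here are 'pos' and 'neg'.
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

# Shuffle so the data isn't all negative reviews followed by all positive ones.
random.shuffle(documents)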

Comments

For those who are watching in 2019: scikit-learn has a ready-to-use text feature extractor that includes TF-IDF.
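For reference, a minimal sketch of that extractor (scikit-learn's TfidfVectorizer); the two sample sentences are made up purely for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["this movie was great", "this movie was terrible"]  # toy corpus

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)   # sparse matrix of TF-IDF weights

print(vectorizer.vocabulary_)        # term -> column index mapping
print(X.shape)                       # (2 documents, vocabulary size)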

mateusgoncalvesmachado

I'm a beginner in Python and NLP as well. I find it difficult to keep up with my lectures, but I find this very helpful. Thank you so much.

MsShmg

If anyone gets the error "NameError: name 'fileid' is not defined", then change "fileid" to "field". This worked fine for me.
Great tutorial as always! :)

suharshapw

It was just awesome! Literally, man! Great work!

rushirajparmar

Great video! It has been really helpful.
Just my 2 cents on the list comprehension at ~5:00.
Writing it in one line leads to better performance.
In this case the difference is small: I've timed 3.39s for the one-liner and 3.64s for the multiline version.
I've done multiple tests; the difference can be smaller, but the one-liner is always measurably faster.
But if we scale up our data by ~100x, those fractions of a second become minutes; scale it up further and it becomes hours.
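For anyone who wants to reproduce the comparison, one possible way to time both forms with timeit (a sketch; absolute numbers depend on the machine, and reading the movie_reviews corpus dominates the cost):

import timeit

setup = "from nltk.corpus import movie_reviews"

one_liner = """
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
"""

explicit_loop = """
documents = []
for category in movie_reviews.categories():
    for fileid in movie_reviews.fileids(category):
        documents.append((list(movie_reviews.words(fileid)), category))
"""

print("comprehension:", timeit.timeit(one_liner, setup=setup, number=1))
print("explicit loop:", timeit.timeit(explicit_loop, setup=setup, number=1))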

twalkington

6:45 It's actually naive because it assumes all variables are independent :> <3 Love you Harrison!
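For context, a toy sketch of that classifier in NLTK (the feature names and data below are invented); each feature's probability is estimated separately given the label, which is the independence assumption mentioned above:

import nltk

# Invented featuresets: ({feature: value}, label)
train = [({'contains(great)': True,  'contains(boring)': False}, 'pos'),
         ({'contains(great)': False, 'contains(boring)': True},  'neg'),
         ({'contains(great)': True,  'contains(boring)': True},  'neg')]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify({'contains(great)': True, 'contains(boring)': False}))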

EranM

Hey, how do you compare the meaning of two paragraphs with NLTK, i.e. how similar they are as a percentage?
For example: paragraph A and paragraph B match 60%.

chanakyavolam

What kind of variable types do I need in my for loop if I want to make my own categories and my own file of words?

The double for loop is vague to me because I have no idea what fileid actually does.
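For what it's worth, poking at the corpus reader interactively can make fileid's role clearer; a small sketch using the movie_reviews corpus from the video:

from nltk.corpus import movie_reviews

print(movie_reviews.categories())            # the labels, e.g. ['neg', 'pos']
first_pos = movie_reviews.fileids('pos')[0]  # one review file within 'pos'
print(first_pos)
# words(fileid) returns the tokens of just that single review, so the inner
# loop visits each review file inside the current category.
print(movie_reviews.words(first_pos)[:10])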

elementsofarah

This is an amazing video! Everything is well explained!

luisxd

Thank you for the great video.
What about multi-class classification? When we have more than two classes, what module/classifier can I use?

mokhadra

Can someone please tell me: I am getting a "list object is not callable" error at 4:58. Also, how is the append function accepting two parameters?
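On the second point: the extra pair of parentheses in the video builds a single tuple, so append() still receives only one argument. A tiny sketch with made-up tokens:

documents = []
word_list = ['a', 'great', 'film']        # made-up example tokens
category = 'pos'

documents.append((word_list, category))   # one argument: the tuple (word_list, category)
print(documents[0])                       # (['a', 'great', 'film'], 'pos')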

arkomukherjee

Hey, I have a file which contains tweets that are entirely pre-processed and cleaned up. I want to read that text file line by line and check whether each tweet is positive or negative. How can I do that? Please help.

siddharthjain

Great tutorials!
I am trying to create a classifier similar to this one, but using a labeled pandas data frame of WhatsApp messages in place of the movie_reviews corpus. I am stuck on the step of creating the list you call documents (very important!). All I want to know is whether this type of list is possible from a dataset of labeled messages.
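For what it's worth, that kind of list can be built from a labelled DataFrame. A rough sketch, assuming hypothetical columns named 'message' and 'label' and that NLTK's punkt tokenizer data is available:

import pandas as pd
from nltk.tokenize import word_tokenize

# Stand-in for the labelled WhatsApp export
df = pd.DataFrame({'message': ['love this group', 'stop spamming please'],
                   'label':   ['pos', 'neg']})

# Same shape as the tutorial's documents list: (list of words, label)
documents = [(word_tokenize(text), label)
             for text, label in zip(df['message'], df['label'])]

print(documents[0])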

brandonjanes

Can you please tell us where you learned all this stuff? It would be really helpful.

backgroundnoiselistener

Where do I need to look for a list of NLTK function calls? You called words(), categories(), etc., which are not common functions...

aldosanjoto

Hi!
Very helpful tutorials!
Could you please let me know how to create a custom corpus and train on that corpus using your code?

chetanhireholi

I really love your tutorials. Thanks a lot. Please, what about a corpus that is saved as CSV or ARFF? Can you help with the code to read it as documents? I am still new to Python and NLTK. Thanks.

adewolekayode

Can you compare wordnet.wup_similarity and wordnet.path_similarity?

NaveenKumarSangi

Hi, I have a question that I would like your expertise on.

Firstly, I would like to say that your tutorial is great. It aided me a lot in my previous semester's work. However, I need some specific help.

I am currently doing a project where I am basically collecting questions that are asked and generating various "tags" for each question. These tags allow analysts to understand what the context of each question is.

So far I have thought about using a topic modelling algorithm as well as the pre-processing methods that you shared in your previous tutorials. What got me confused is how we get the context of the question.

For example, the question "What is networking?" means something different when asked by someone majoring in Computer Science and by someone in the business field! Any tips? I would greatly appreciate it hehe

justinsoh

Hi, how do I access a different corpus? I have tried something like this:
import nltk
import random
from nltk.corpus import names

documents = [(list(names.words(fileid)), category)
             for category in names.categories()
             for fileid in names.fileids(category)]

random.shuffle(documents)

print(documents[1])

but it is giving me errors.
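For anyone hitting the same errors: the names corpus is a plain word-list corpus, so it has no categories(); its fileids ('male.txt' and 'female.txt') effectively act as the labels. A sketch of one way the pattern is often adapted (not the video's code):

import random
from nltk.corpus import names

# Label each name with the fileid it came from ('male' or 'female').
documents = [(name, fileid.replace('.txt', ''))
             for fileid in names.fileids()
             for name in names.words(fileid)]

random.shuffle(documents)
print(documents[1])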

sizogolimpi