Simple Deep Neural Networks for Text Classification

Hi. In this video, we will apply neural networks to text. Let's first remember: what is text? You can think of it as a sequence of characters, words, or anything else, and in this video we will continue to think of text as a sequence of words or tokens.

Let's remember how bag of words works. For every distinct word that you have in your dataset, you have a feature column. You are effectively vectorizing each word with a one-hot-encoded vector: a huge vector of zeros that has only one non-zero value, in the column corresponding to that particular word. So in this example, we have "very", "good", and "movie", and all of them are vectorized independently. In this setting, for real-world problems, you end up with hundreds of thousands of columns. And how do we get to the bag of words representation? We can sum up all those one-hot vectors, and we come up with a bag of words vectorization that now corresponds to "very good movie". So it is useful to think of the bag of words representation as a sum of sparse one-hot-encoded vectors, one for each particular word.

Okay, let's move to the neural network way. In contrast to the sparse representation that we've seen in bag of words, in neural networks we usually prefer dense representations. That means we can replace each word with a dense vector that is much shorter. It can have, say, 300 values, and those values can be any real numbers. An example of such vectors is word2vec embeddings, which are pretrained embeddings learned in an unsupervised manner. We will dive into the details of word2vec in the next two weeks. All we need to know right now is that word2vec vectors have a nice property: words that appear in similar contexts, in terms of neighboring words, tend to have vectors that are collinear, that is, they point in roughly the same direction. That is a very nice property that we will use later.

Okay, so now we can replace each word with a dense vector of 300 real values. What do we do next? How can we come up with a feature descriptor for the whole text? We can do it in the same manner as we did for bag of words: we just take the sum of those vectors, and we have a representation based on word2vec embeddings for the whole text, like "very good movie". And that sum of word2vec vectors actually works in practice. It can give you great baseline features for your classifier, and that can work pretty well. Another approach is running a neural network over these embeddings.
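For readers who want to follow along in code, below is a minimal sketch of the two descriptors described above: the bag-of-words vector as a sum of sparse one-hot vectors, and the dense descriptor as a sum of per-word embeddings. The three-word vocabulary and the random 300-dimensional vectors are illustrative assumptions; in practice the embeddings would come from a pretrained word2vec model.

```python
import numpy as np

# Toy vocabulary and word-to-column index (assumption for illustration only).
vocab = ["very", "good", "movie"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Sparse one-hot vector: all zeros except a 1 in the word's column."""
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

def bag_of_words(tokens):
    """Bag of words = sum of the one-hot vectors of the tokens."""
    return sum(one_hot(t) for t in tokens)

# Dense alternative: each word maps to a short real-valued vector.
# Random numbers stand in here for pretrained word2vec embeddings.
embedding_dim = 300
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=embedding_dim) for w in vocab}

def text_descriptor(tokens):
    """Text descriptor = sum of the word2vec-style embeddings of the tokens."""
    return np.sum([embeddings[t] for t in tokens], axis=0)

tokens = ["very", "good", "movie"]
print(bag_of_words(tokens))           # [1. 1. 1.] -- one column per vocabulary word
print(text_descriptor(tokens).shape)  # (300,) -- dense feature vector for the whole text
```

Either vector can be fed to a classifier; the summed-embedding version is the baseline described above, while the neural-network approach mentioned at the end operates on the per-word embeddings directly instead of collapsing them into a single sum first.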
Comments

Hey there! I normally don’t leave comments or likes, but I had to stop here!
You’ve explained a convoluted topic in a clear, digestible and concise way. Thank you!

Posejdonkon

Thank you for the good explanation!
You forgot to link the paper you mentioned (at 12:43). For anyone who is interested, I think it was this paper:
"Convolutional Neural Networks for Sentence Classification" by Yoon Kim

maxlegnar

This is the most comprehensive video I've ever seen on neural networks! Thank you so much! I study and develop AI, but I was using something more like the bag of words representation. The other thing, aside from accuracy, that I noticed to be an issue with the bag of words representation was the amount of resources it required from the machine it was operating on. To give some insight into just how bad it was: while the machine I was using wasn't exactly top of the line, the machine I'm using now is pretty high performance (i5-8400, 16GB RAM, 1TB Samsung Evo 860 SSD), and yet the facial recognition usually dropped the camera feed down to about 3-5 fps when it would detect a face. Even generating a response (using Speech-to-Text, then a custom-tailored version of the Levenshtein Distance algorithm to correct any misinterpretation of speech) was using at least 7GB of RAM, even with a relatively small data set in the vicinity of maybe 50GB, and using 40-60% of my CPU power. Anyhow, my intent in watching this video was to learn about better algorithms, with the goal of actually implementing a neural network on an FPGA. Now I feel well-equipped with enough information to finally conquer that, as I feel I finally understand CNNs well enough. Thanks so much!

danm

This is the best explanation I've seen of a CNN applied to text input.

louisd

One of the best lectures I have ever heard. Seriously, I was so absorbed in your video for 15 minutes that I forgot the external world. Awaiting the next set of topics.

manjuappu

2:05 Freudian slip? Made me crack up, haha.
Excellent video, thanks for sharing!

boooo

Really nice explanations, even if the convolutional network internals are not explained in enough detail.

sylvainbzh

One of the best videos to understand string inputs for Neural Nets.

NandishA

Fantastic! You have explained it very, very well. Please upload more videos on related Machine Learning topics. Thank you so much.

ijeffking

Great work, thanks. Can't wait for the next one. Very well explained.

DanielWeikert

It is the best explanation of word embeddings I have ever seen.

kushshri

Excellent video. It made me watch the whole playlist.

luislptigres

At 11:19 I am confused about why we learn 100 filters for each gram. What is the filter in this case? I thought that by applying the 3-gram kernel with same padding, we would get a (1, n) vector, where n is the number of words, in this case n = 5. Then with 3-, 4-, and 5-grams, shouldn't we just have three (1, n) vectors? And if we take the max value for each gram, shouldn't we just have 3 outputs, one from each x-gram vector (of size (1, n))? Can you explain why you said 300 outputs? Thanks.

hellochii

How does this compare with the attention mechanism in transformers?

tantzer

Excuse my stupidity: at 4:19, how do you get 0.9 from the word embeddings and the convolutional filter? Is it a dot product or something else?

rialtosan

Where did the 0.9 and 0.84 come from? Sorry, I'm new to this...

johntsirigotis

At 1:54, what are the inputs? Are they [very, good, movie] or are they [x1, x2, x3]?

barax

At 4:25, the result of the convolution is not 0.9, it is 0.88. How does a CNN create these filters? For instance, if we define 16 filters to apply, how does the CNN library determine the contents (numbers) of those filters?

arnetmitarnetmit

Please explain the meaning of the final vector obtained after the 1D convolution, which, I guess, is trained in some way.

argentineinformationservic

What about the context of the text? Why would you use this rather than something like a GRU or LSTM?

bismeetsingh