[Classic] Word2Vec: Distributed Representations of Words and Phrases and their Compositionality

#ai #research #word2vec

Word vectors have been one of the most influential techniques in modern NLP to date. This paper describes Word2Vec, which is the most popular technique for obtaining word vectors. The paper introduces the negative sampling technique as an approximation to noise contrastive estimation and shows that this allows the training of word vectors from giant corpora on a single machine in a very short time.
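
As a rough illustration of what negative sampling does (not the authors' reference implementation), here is a minimal numpy sketch of one skip-gram update; the vocabulary size, dimensions, word counts, and learning rate are invented for the example:

import numpy as np

# Toy sketch of one skip-gram / negative-sampling step. All sizes,
# counts and the learning rate below are made up for illustration.
rng = np.random.default_rng(0)
V, d, k, lr = 1000, 100, 5, 0.025            # vocab size, embedding dim, negatives, step size
W_in = rng.normal(scale=0.01, size=(V, d))   # center-word ("input") vectors
W_out = np.zeros((V, d))                     # context-word ("output") vectors

# The paper draws noise words from the unigram distribution raised to the 3/4 power.
counts = rng.integers(1, 1000, size=V).astype(float)   # fake corpus counts
noise = counts ** 0.75
noise /= noise.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(center, context):
    """SGD step for one (center, context) pair plus k sampled negatives."""
    targets = np.concatenate(([context], rng.choice(V, size=k, p=noise)))
    labels = np.concatenate(([1.0], np.zeros(k)))            # 1 = real pair, 0 = noise
    h = W_in[center]
    grad = (sigmoid(W_out[targets] @ h) - labels)[:, None]   # logistic-loss gradient
    W_in[center] -= lr * (grad * W_out[targets]).sum(axis=0)
    W_out[targets] -= lr * grad * h

train_pair(center=3, context=17)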

OUTLINE:
0:00 - Intro & Outline
1:50 - Distributed Word Representations
5:40 - Skip-Gram Model
12:00 - Hierarchical Softmax
14:55 - Negative Sampling
22:30 - Mysterious 3/4 Power
25:50 - Frequent Words Subsampling
28:15 - Empirical Results
29:45 - Conclusion & Comments

Abstract:
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
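
Two of the tricks the abstract mentions reduce to one-line formulas. A minimal sketch (the variable names and example numbers are mine; the formulas follow the paper):

import math

def discard_prob(freq, t=1e-5):
    """Subsampling of frequent words: probability of dropping a word whose
    relative corpus frequency is freq (the paper suggests t around 1e-5)."""
    return max(0.0, 1.0 - math.sqrt(t / freq))

def phrase_score(count_ab, count_a, count_b, delta=5):
    """Phrase detection: bigrams scoring above a chosen threshold get merged
    into one token, e.g. "Air Canada" -> "Air_Canada"; delta discounts rare words."""
    return (count_ab - delta) / (count_a * count_b)

print(discard_prob(0.05))              # a very frequent word is dropped ~98.6% of the time
print(phrase_score(300, 1000, 800))    # made-up counts just to show the call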

Authors: Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean

Links:

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

Classic papers are maybe the best addition to this kind of content. I find it really useful and important to come back to old papers sometimes and look at them from the perspective of the modern state of DL.

sg

Wow. Just wow. This was a fantastic overview of word2vec. Your explanations of the minute details and the vague, harder-to-grasp concepts of their paper were exceptional. Your comments on their unconventional authorship and writing-style issues were also on point. I felt like I learned and re-learned how word2vec really works. Yes, please cover more classic papers, because understanding the foundations is important. Way to go Yannic!

Scranny

There is so much value in the videos from the core content alone. However, anecdotes like how the hierarchical softmax was a distraction in the paper add much more context and hence understanding. Thank you for these videos :)

oostopitre

Thanks Yannic for the [Classic] videos! These videos are more useful than many of the papers that make small incremental improvements.

ShivaramKR

Wow, I have been learning word2vec since yesterday and was struggling to grasp the concept, and here you uploaded a video explaining the paper!

doyourealise

I would definitely be into a playlist of "classical" data science videos like this. There is so much content to absorb; being able to focus on the ones that have been historically proven and vetted would be awesome.

It also gives you a chance to reference how things have improved since then, which is nice to know.

leapdaniel

Welcome to Yannic's paper museum :)
Very nice to look at older papers as well!

florianhonicke

Great explanation of a paper, as usual. And this paper (or the three of them) changed so much. Even if token-based embeddings are usually preferable, for some applications type-based word embeddings are probably still the better choice, for example if you are interested in the history of concepts and want to track their semantic change.

fotisj

Thanks for this classic papers series. For those of us learning deep learning, it is important to cover the classic and main old ideas in the field.

MrjbushM

Thanks for revisiting such an important paper!!! Awesome content!!

DiegoJimenez-icby

Really enjoying watching these videos. You did a great job explaining them!

ironic_bond

Classic papers are a great idea. It's really helpful for those like me who are new to ML. I often try to read papers that are extensions of algorithms introduced in the classic ones, and I struggle to understand them since I don't have the prerequisites.

francoisdupont

Wait, you're supposed to be having a break! This is your second video in two days. 😅

spaceisawesome

Always good to look back at classic papers.

thearianrobben

To provide another argument for the case of classical papers: it is very difficult to anticipate which ideas will stand the test of time at the moment of their creation. But by revisiting 'classical' papers we allow ourselves the benefit of hindsight, examining those ideas that time has proved to be invaluable.

kappadistributive

Thank you. I couldn't understand word2vec from Prof. Andrew Ng's video, but you explained it clearly!

wizardOfRobots

Thank you!!! So much better than the Stanford class.

adriandip

My browser crashed along with my 50,000 tabs. I restored them and suddenly Yannic is telling me about 5 papers simultaneously.

michaelfrost

Please keep going with the amazing content! Love it!

zd

Really loved your explanation. Thank You.

harshpoddar