[Classic] Word2Vec: Distributed Representations of Words and Phrases and their Compositionality

#ai #research #word2vec

Word vectors have been one of the most influential techniques in modern NLP to date. This paper describes Word2Vec, which is the most popular technique for obtaining word vectors. The paper introduces the negative sampling technique as an approximation to noise contrastive estimation and shows that this allows the training of word vectors from giant corpora on a single machine in a very short time.
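
As a rough illustration of what negative sampling does (not the authors' reference implementation), here is a minimal numpy sketch of one skip-gram update; the vocabulary size, dimensions, word counts, and learning rate are invented for the example:

import numpy as np

# Toy sketch of one skip-gram / negative-sampling step. All sizes,
# counts and the learning rate below are made up for illustration.
rng = np.random.default_rng(0)
V, d, k, lr = 1000, 100, 5, 0.025            # vocab size, embedding dim, negatives, step size
W_in = rng.normal(scale=0.01, size=(V, d))   # center-word ("input") vectors
W_out = np.zeros((V, d))                     # context-word ("output") vectors

# The paper draws noise words from the unigram distribution raised to the 3/4 power.
counts = rng.integers(1, 1000, size=V).astype(float)   # fake corpus counts
noise = counts ** 0.75
noise /= noise.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(center, context):
    """SGD step for one (center, context) pair plus k sampled negatives."""
    targets = np.concatenate(([context], rng.choice(V, size=k, p=noise)))
    labels = np.concatenate(([1.0], np.zeros(k)))            # 1 = real pair, 0 = noise
    h = W_in[center]
    grad = (sigmoid(W_out[targets] @ h) - labels)[:, None]   # logistic-loss gradient
    W_in[center] -= lr * (grad * W_out[targets]).sum(axis=0)
    W_out[targets] -= lr * grad * h

train_pair(center=3, context=17)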

OUTLINE:
0:00 - Intro & Outline
1:50 - Distributed Word Representations
5:40 - Skip-Gram Model
12:00 - Hierarchical Softmax
14:55 - Negative Sampling
22:30 - Mysterious 3/4 Power
25:50 - Frequent Words Subsampling
28:15 - Empirical Results
29:45 - Conclusion & Comments

Abstract:
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
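
Two of the tricks the abstract mentions reduce to one-line formulas. A minimal sketch (the variable names and example numbers are mine; the formulas follow the paper):

import math

def discard_prob(freq, t=1e-5):
    """Subsampling of frequent words: probability of dropping a word whose
    relative corpus frequency is freq (the paper suggests t around 1e-5)."""
    return max(0.0, 1.0 - math.sqrt(t / freq))

def phrase_score(count_ab, count_a, count_b, delta=5):
    """Phrase detection: bigrams scoring above a chosen threshold get merged
    into one token, e.g. "Air Canada" -> "Air_Canada"; delta discounts rare words."""
    return (count_ab - delta) / (count_a * count_b)

print(discard_prob(0.05))              # a very frequent word is dropped ~98.6% of the time
print(phrase_score(300, 1000, 800))    # made-up counts just to show the call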

Authors: Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean

Links:

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

Classic papers are maybe the best addition to this kind of content. I find it really useful and important to come back to old papers sometimes and look at them from the perspective of the modern state of DL.

sg

Wow. Just wow. This was a fantastic overview of word2vec. Your explanations of the minute details and the vague, harder-to-grasp concepts of their paper were exceptional. Your comments on their unconventional authorship and writing-style issues were also on point. I felt like I learned and re-learned how word2vec really works. Yes, please cover more classic papers, because understanding the foundations is important. Way to go Yannic!

Scranny

There is so much value in the videos from the core content alone. However, anecdotes like how the hierarchical softmax was a distraction in the paper add much more context and hence understanding. Thank you for these videos :)

oostopitre

Thanks Yannic for the [Classic] videos! These videos are more useful than many of the papers that make small incremental improvements.

ShivaramKR

Wow, I have been learning word2vec since yesterday and was struggling to grasp the concept, and here you uploaded a video explaining the paper!

doyourealise

I would definitely be into a playlist of "classical" data science videos like this. There is so much content to absorb; being able to focus on the ones that have been historically proven and vetted would be awesome.

It also gives you a chance to reference how things have improved since then, which is nice to know.

leapdaniel

Welcome to Yannic's paper museum :)
Very nice to look at older papers as well!

florianhonicke

Great explanation of a paper, as usual. And this paper (or the three of them) changed so much. Even if token-based embeddings are usually preferable, for some applications type-based word embeddings are probably still the better choice, for example if you are interested in the history of concepts and want to track their semantic change.

fotisj

Thanks for this classic papers series. For those of us learning deep learning, it is important to cover the classic and main old ideas in the field.

MrjbushM

Thanks for revisiting such an important paper!!! Awesome content!!

DiegoJimenez-icby

Really enjoying watching these videos. You did a great job explaining them!

ironic_bond

Classic papers are a great idea. It's really helpful for those like me who are new to ML. I often try to read papers that are extensions of algorithms introduced in the classic ones, and I struggle to understand them since I don't have the prerequisites.

francoisdupont

Wait, you're supposed to be having a break! This is your second video in two days. 😅

spaceisawesome

Always good to look back at classic papers.

thearianrobben

To provide another argument for the case of classical papers: it is very difficult to anticipate which ideas will stand the test of time at the moment of their creation. But by revisiting 'classical' papers we allow ourselves the benefit of hindsight, examining those ideas that time has proved to be invaluable.

kappadistributive

Thank you. I couldn't understand word2vec from Prof. Andrew Ng's video, but you explained it clearly!

wizardOfRobots

Thank you!!! So much better than the Stanford class.

adriandip

My browser crashed along with my 50,000 tabs. I restored them and suddenly Yannic is telling me about 5 papers simultaneously.

michaelfrost

Please keep going with the amazing content! Love it!

zd

Really loved your explanation. Thank You.

harshpoddar