Vectoring Words (Word Embeddings) - Computerphile

How do you represent a word in AI? Rob Miles reveals how words can be represented as multi-dimensional vectors - with some unexpected results.

08:06 - Yes, it's a rubber egg :)

Unicorn AI:

This video was filmed and edited by Sean Riley.

Comments

"Not in this data set" is my new favorite comeback oneliner

VladVladislav

“What does the fox say?”
“Don’t they go ‘ring ding ding’?”
“Not in this dataset”

wohdinhel

Okay, that was amazing. "London + Japan - England = Tokyo"

xario

I did this for my final project in my BSc. It's amazing. I found cider - apples + grapes = wine. My project attempted to use these relationships to build simulated societies and stories.

Chayatfreak

Fun points: A lot of the Word2vec concepts come from Tomáš Mikolov, a Czech scientist at Google. The Czech part is kinda important here - Czech, as a Slavic language, is highly inflected - you have a lot of different forms of a single word, depending on its surroundings in a sentence. In an interview I read (it was in Czech and in a paid online newspaper, so I can't give a link), he mentioned that this inspired him a lot - you can see the words clustering by their grammatical properties when running on a Czech dataset, and it's easier to reason about such changes when a significant portion of them is exposed visibly in the language itself (and learned as a child in school, because some basic parts of it are needed in order to write correctly).

Alche_mist

Tomorrow's headline:

"Science proves fox says 'Phoebe'"

kurodashinkei

I like this guy and his long sentences. It's nice to see somebody who can muster a coherent sentence of that length.
So, if you run this (it's absurdly simple, right), but if you run this on a large enough data set and give it enough compute to actually perform really well, it ends up giving you for each word a vector (that's of length however many units you have in your hidden layer), for which the nearby-ness of those vectors expresses something meaningful about how similar the contexts are that those words appear in, and our assumption is that words that appear in similar contexts are similar words.

panda
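For anyone curious about the "nearby-ness" idea in the comment above, here is a toy sketch of it with made-up low-dimensional vectors (real word2vec vectors are typically a few hundred dimensions learned from a large corpus; the numbers below are invented purely to show the calculation):

# Toy example: cosine similarity between word vectors measures "nearby-ness".
# The vectors here are invented; real embeddings come out of training.
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = np.array([0.9, 0.1, 0.3, 0.0])
dog = np.array([0.8, 0.2, 0.4, 0.1])  # would appear in contexts similar to "cat"
car = np.array([0.0, 0.9, 0.1, 0.8])  # appears in quite different contexts

print(cosine_similarity(cat, dog))  # high: similar contexts -> similar words
print(cosine_similarity(cat, car))  # lower: different contexts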

This thing would ace the analogy section of the SAT.
Apple is to tree as grape is to


model.most_similar_cosmul(positive=['tree', 'grape'], negative=['apple'])  # -> "vine"

rich
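For context, a minimal sketch of how that kind of analogy query looks with gensim's downloadable pretrained vectors; the model name "glove-wiki-gigaword-100" and the exact results are illustrative, not necessarily what the video used:

# Analogy queries against a pretrained KeyedVectors model.
# Any word2vec/GloVe model exposes the same interface; results depend on the corpus.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")  # downloads the vectors on first use

# "apple is to tree as grape is to ...?"
print(model.most_similar_cosmul(positive=["tree", "grape"], negative=["apple"], topn=3))

# The video's example: London + Japan - England = ?
print(model.most_similar(positive=["london", "japan"], negative=["england"], topn=3))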

'fox' + 'says' = 'Phoebe' may be from newspapers quoting English actress Phoebe Fox

bluecobra

I am in love with this man's explanation! Makes it so intuitive. I have a special respect for folks who can turn a complex piece of science/math/computer science into an abstract piece of art. RESPECT!

alexisxander

This was weirdly fascinating to me. I'm generally interested in most of the Computerphile videos, but this one really snagged something in my brain. I've got this odd combination of satisfaction and "Wait, really? That works?! Oh, wow!"

wolfbd

Foxes do chitter!





But primarily they say "Phoebe"

veggiet

'What does it mean for two words to be similar?'

That is a philosophy lesson I am not ready for bro

muddi

floats: some of the real numbers
Best description and explanation ever! It encompasses all the problems and everything...

Verrisin

Meanwhile in 2030:
"human" + "oink oink" - "pig" = "pls let me go skynet"

adamsvoboda

This is basically node embedding from graph neural networks. Each sentence you use to train it can be seen as a random walk in a graph that relates each word to the others, and the number of words in the sentence can be seen as how far you walk from the node. Besides word-vector arithmetic, one interesting thing would be to use this data to generate a graph of all the words and how they relate to each other. Then you could do network analysis with it: see, for example, how many clusters of words there are and figure out what their labels are. Or label a few of them and let the graph try to predict the rest.

Another interesting thing would be to try to embed sentences based on the embeddings of their words. For that you would take a sentence and train a function that maps points in the word space to points in a sentence space by aggregating the word points somehow (a rough sketch of one simple aggregation follows below). That way you could compare sentences that are close together, and then do sentence-vector arithmetic.
This actually sounds like a cool project. I think I'm gonna give it a try.

Alkis
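A minimal sketch of the aggregation idea in the comment above, assuming the same pretrained gensim vectors as in the earlier sketch; averaging (mean pooling) is just one simple choice of aggregation, not the only way to embed sentences:

# Sentence vectors by averaging word vectors, compared with cosine similarity.
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")

def sentence_vector(sentence):
    words = [w for w in sentence.lower().split() if w in model]
    if not words:                       # no known words: fall back to the zero vector
        return np.zeros(model.vector_size)
    return np.mean([model[w] for w in words], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_vector("the cat sat on the mat")
s2 = sentence_vector("a dog lay on the rug")
s3 = sentence_vector("the stock market fell sharply today")

print(cosine(s1, s2))  # expected to be comparatively high
print(cosine(s1, s3))  # expected to be lower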

Rather than the biggest city, it seems obvious it would be the most written-about city, which may or may not be the same thing.

kal

I love how this video seems to have time travelled from 1979 - the austere painted brick classroom, the shirt, the hair and beard, even the thoughtful and articulate manner seem to come from another time.

vindolanda

Today, vector databases are revolutionizing AI models. This man was way ahead of his time.

nemanjajerinic

I'm a man of simple tastes. I see Rob Miles, I press the like button.

joshuar