Stanford CS224W: ML with Graphs | 2021 | Lecture 3.2-Random Walk Approaches for Node Embeddings


Jure Leskovec
Computer Science, PhD

In this video we look at a more effective similarity function: the probability of node co-occurrence in random walks on the graph. We introduce the intuition behind random walks, the objective function we will be optimizing, and how we can perform the optimization efficiently. We then introduce node2vec, which combines BFS and DFS to generalize the concept of random walks.
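As a rough illustration of the walk strategies discussed here, below is a minimal Python sketch (not the course's reference implementation; the toy graph, function names, and plain adjacency-dictionary representation are my own assumptions) of an unbiased fixed-length random walk and a node2vec-style biased walk controlled by the return parameter p and the in-out parameter q:

```python
import random

# Toy undirected graph as an adjacency dict (illustrative only): node -> neighbors.
GRAPH = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [1, 2, 4],
    4: [3],
}

def random_walk(graph, start, length):
    """Unbiased random walk: at each step move to a uniformly random neighbor."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:            # dangling node: stop early
            break
        walk.append(random.choice(neighbors))
    return walk

def node2vec_step(graph, prev, cur, p, q):
    """One biased (second-order) step in the spirit of node2vec.

    Unnormalized transition weight for each candidate next node x:
      1/p if x == prev               (return to where we came from)
      1   if x is a neighbor of prev (stay close to prev, BFS-like)
      1/q otherwise                  (move further away, DFS-like)
    """
    candidates = graph[cur]
    weights = []
    for x in candidates:
        if x == prev:
            weights.append(1.0 / p)
        elif x in graph[prev]:
            weights.append(1.0)
        else:
            weights.append(1.0 / q)
    return random.choices(candidates, weights=weights, k=1)[0]

def node2vec_walk(graph, start, length, p=1.0, q=1.0):
    """Biased walk: the first step is uniform, later steps use node2vec_step."""
    walk = [start]
    if graph[start]:
        walk.append(random.choice(graph[start]))
    while len(walk) < length and graph[walk[-1]]:
        walk.append(node2vec_step(graph, walk[-2], walk[-1], p, q))
    return walk

if __name__ == "__main__":
    print(random_walk(GRAPH, start=0, length=5))
    print(node2vec_walk(GRAPH, start=0, length=5, p=0.25, q=4.0))
```

A small p makes the walk likely to backtrack and stay near its starting point (BFS-like, local view), while a small q encourages it to move away from the previous node (DFS-like, global view), which is the interpolation the video describes.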

To follow along with the course schedule and syllabus, visit:

0:00 Introduction
0:12 Notation
3:27 Random-Walk Embeddings
4:34 Why Random Walks?
5:18 Unsupervised Feature Learning
6:07 Feature Learning as Optimization
7:12 Random Walk Optimization
11:07 Negative Sampling
13:37 Stochastic Gradient Descent
15:59 Random Walks: Summary
16:49 How should we randomly walk?
17:29 Overview of node2vec
19:41 BFS vs. DFS
19:57 Interpolating BFS and DFS
20:52 Biased Random Walks
23:47 node2vec algorithm
24:50 Other Random Walk Ideas
25:46 Summary so far
Comments

I do not understand why we optimize this particular expression: the sum over all nodes u of log P(N_R(u) | z_u).

I see that it is the encoder which learns f(u) = z_u, the node embedding for node u.

But what does the probability of the neighborhood of u (obtained by some random walk strategy), given the node embedding of node u, even mean? If this is done by the decoder, then the question is: isn't the decoder meant to predict a node from a node embedding rather than a "neighborhood"?

ananthakrishnank
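For reference, the objective the comment above refers to can be written out as follows (reconstructed from the lecture, so the notation here may differ slightly from the slides). The neighborhood likelihood factorizes over the nodes visited by random walks starting from u, and each factor is parameterized by a softmax over embedding dot products, so the "decoder" is simply the dot product z_u^T z_v:

$$
\max_{f}\;\sum_{u \in V}\log P\big(N_R(u)\mid \mathbf{z}_u\big),
\qquad
P\big(N_R(u)\mid \mathbf{z}_u\big)=\prod_{v \in N_R(u)} P(v\mid \mathbf{z}_u),
\qquad
P(v\mid \mathbf{z}_u)=\frac{\exp(\mathbf{z}_u^{\top}\mathbf{z}_v)}{\sum_{n \in V}\exp(\mathbf{z}_u^{\top}\mathbf{z}_n)}
$$

In words: maximizing this pushes the embedding of u to have a large dot product with the embeddings of nodes that random walks from u actually visit, and a small one with the rest of the graph.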

Why can we use maximum likelihood estimation here? The observed data should be independent for that, and it clearly is not, since the probability of visiting node v depends on the previously visited nodes.

isaacgonzalez

I had a quick question. Do you generate the negative samples once per epoch and then reuse them for each node, or do you generate new negative samples for every single node you sample in every iteration?

Simon-edzc
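Not an official answer to the question above, but a common choice in skip-gram-style implementations is to draw a fresh set of k negatives for every positive (u, v) pair at each SGD step, sampling nodes with probability roughly proportional to their degree. A minimal sketch, with a made-up toy graph and helper name:

```python
import random

# Toy graph (illustrative only): node -> neighbor list.
GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
NODES = list(GRAPH)
DEGREES = [len(GRAPH[n]) for n in NODES]   # negatives drawn ~ proportional to degree

def sample_negatives(k, exclude):
    """Draw k negative nodes with probability proportional to degree,
    skipping the nodes of the positive pair being contrasted against."""
    negatives = []
    while len(negatives) < k:
        n = random.choices(NODES, weights=DEGREES, k=1)[0]
        if n not in exclude:
            negatives.append(n)
    return negatives

# One common choice: for every positive pair (u, v) seen on a walk, draw a
# *fresh* set of k negatives at that SGD step, rather than fixing them per epoch.
print(sample_negatives(k=3, exclude={0, 1}))
```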

I didn't understand the part about the "drawback of node2vec": why do we need to learn each node embedding individually?

shexiaogui

Why should the angle θ between embeddings be directly proportional to the co-occurrence of the two nodes? Shouldn't the embeddings be close together if the nodes co-occur often, implying similarity, not further apart? Shouldn't it be inversely proportional, i.e., the more often they occur on the same walk, the closer they are in angle? I must be missing something obvious here.

arthurpenndragon

What do you mean by the negative examples in negative sampling?

sarangak.mahanta