Stanford CS224W: ML with Graphs | 2021 | Lecture 5.2 - Relational and Iterative Classification


Jure Leskovec
Computer Science, PhD

In this video we introduce the relational classifier and iterative classification for node classification. Starting from the relational classifier, we show how to iteratively update the probabilities of node labels based on the labels of neighboring nodes. We then discuss iterative classification, which improves collective classification by predicting a node's label based on both the labels of its neighbors and its own features.
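
A minimal sketch of the update rule described above, assuming an unweighted, undirected graph given as a plain adjacency list; the function name, toy graph, and probabilities are illustrative, not taken from the lecture. Unlabeled nodes start at P = 0.5 and are repeatedly set to the average of their neighbors' current probabilities.

```python
# Sketch of the probabilistic relational classifier (binary labels).
def relational_classifier(neighbors, labels, num_iters=10):
    # Initialize P(Y_v = 1): ground truth for labeled nodes, 0.5 for the rest.
    p = {v: (labels[v] if labels[v] is not None else 0.5) for v in neighbors}
    for _ in range(num_iters):
        for v in neighbors:
            if labels[v] is not None:      # labeled nodes stay fixed
                continue
            # Unweighted update: average of the neighbors' current probabilities.
            p[v] = sum(p[u] for u in neighbors[v]) / len(neighbors[v])
    return p

# Toy path graph 1-2-3-4 with node 1 labeled 0 and node 4 labeled 1.
neighbors = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = {1: 0, 2: None, 3: None, 4: 1}
print(relational_classifier(neighbors, labels))  # node 2 drifts toward 0, node 3 toward 1
```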

To follow along with the course schedule and syllabus, visit:

0:00 Introduction
0:28 Probabilistic Relational Classifier (2)
3:56 Example: Initialization
5:05 Example: 1st Iteration, Update Node 3
6:32 Example: After 1st Iteration
8:35 Example: Convergence
9:41 Collective Classification Models
10:29 Iterative Classification
12:27 Computing the Summary z
14:13 Architecture of Iterative Classifiers
16:55 Example: Web Page Classification (3)
23:57 Iterative Classifier - Step 1
26:12 Iterative Classifier - Iterate
27:00 Iterative Classifier - Final Prediction
Comments

I was wondering how you manage to pack so much into your lectures. I think it is by firmly focusing on the algorithms and the intuitions behind them. This allows you to be light on the theory without distorting anything. Beautiful.

alexanderkurz

graph theory, machine learning, algorithms, economics, decision theory, networks, linear algebra, probability theory ... amazing series of lectures ... I guess we have to partly thank Covid for this (?)

alexanderkurz

Summary - relational classification relies only on network information (the labels of neighboring nodes), while iterative classification relies on both node features and network information (done with the help of two classifiers); iterative classification is generally more accurate.

kshitijdesai

Thank you for the lecture!

I haven't come across methods such as iterative classification before. The idea of using a separately trained classifier inside an iterative, non-trainable process sounds a bit unnatural to me. My intuition tells me it's better to train the classifier inside each iteration, thus making the iterative process trainable. I suppose that's what is going to happen in GNNs.

A couple of comments about the training set.

We don't need any graph structure to train the phi1 classifier, so it can be trained with traditional methods.

In order to train the phi2 classifier, we need labeled nodes whose neighbours are all labeled as well; that's necessary to set z_v. We don't necessarily need to split all graphs into training and test sets. Instead, for phi2 training we can use every node with that property, since such nodes are not evaluated during the iterative stage anyway (their labels are known, and thus already 'converged').
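
To make this concrete, here is a rough sketch of that training setup, assuming binary labels and a simplified z_v that is just a histogram of neighbour labels. The variable names, toy data, and the choice of scikit-learn's LogisticRegression are illustrative assumptions, not prescribed by the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (hypothetical): 2-d node features, binary labels (None = unknown),
# and an adjacency list.
X = {1: [0.1, 0.9], 2: [0.2, 0.8], 3: [0.9, 0.1], 4: [0.8, 0.2], 5: [0.5, 0.5]}
y = {1: 1, 2: 1, 3: 0, 4: 0, 5: None}
neighbors = {1: [2, 5], 2: [1], 3: [4, 5], 4: [3], 5: [1, 3]}

def label_summary(v):
    """z_v: histogram of the neighbours' labels (simplified summary vector)."""
    counts = np.zeros(2)
    for u in neighbors[v]:
        counts[y[u]] += 1
    return counts

labeled = [v for v in X if y[v] is not None]

# phi_1 uses node features only, so no graph structure is needed to train it.
phi1 = LogisticRegression().fit([X[v] for v in labeled], [y[v] for v in labeled])

# phi_2 is trained on labeled nodes whose neighbours are all labeled,
# so z_v can be computed from ground truth, as argued above.
fully_labeled = [v for v in labeled if all(y[u] is not None for u in neighbors[v])]
phi2 = LogisticRegression().fit(
    [np.concatenate([X[v], label_summary(v)]) for v in fully_labeled],
    [y[v] for v in fully_labeled])
```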

BorisVasilevskiy

Thanks for the great lecture. I have a question.
For the iterative classifier (around 16:05), in Phase 1 (i.e. the training phase), we train two classifiers, \phi_1 and \phi_2.
Is the label summary vector z_v required by the classifier \phi_2 derived from the output produced by \phi_1?

Or are the classifiers trained separately, as is suggested by @BorisVasilevskiy's earlier comment?
Because if we train them separately, the error from \phi_1 will be added to the error from \phi_2,
which means that \phi_2 was not trained to be robust to the possible noise in z_v (absent in training but present at test time due to the error from \phi_1).
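
For what it's worth, one possible wiring of the test-time phase, continuing the toy sketch in the earlier comment (an assumption about the recipe, not a statement of what the lecture prescribes): \phi_1 is applied once to bootstrap the unlabeled nodes, then z_v is recomputed from the current predictions and \phi_2 alone updates them until convergence. Under this scheme, z_v during training comes from ground truth rather than from \phi_1's outputs, which is why the train/test mismatch described above can appear.

```python
# Continuing the toy setup from the sketch above (phi1, phi2, X, y, neighbors).
pred = dict(y)
for v in X:
    if pred[v] is None:
        pred[v] = int(phi1.predict([X[v]])[0])   # Step 1: bootstrap with phi_1

for _ in range(10):                              # Iterate: re-apply phi_2
    new_pred = dict(pred)
    for v in X:
        if y[v] is not None:                     # ground-truth labels stay fixed
            continue
        z_v = np.zeros(2)
        for u in neighbors[v]:
            z_v[pred[u]] += 1                    # recompute z_v from current predictions
        new_pred[v] = int(phi2.predict([np.concatenate([X[v], z_v])])[0])
    if new_pred == pred:                         # stop when predictions converge
        break
    pred = new_pred
```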

ujjawalpanchal

Can we train the first and second classifiers with the same training data? My intuition is that we can.

halilibrahimakgun

I also found the example of the iterative classifier a bit difficult to understand. I thought that by training on the dataset we change the weights/coefficients multiplied with the feature vectors, rather than the feature vectors themselves, as shown in the example.

Yi-xiangDeng

What is the point of the feature-only classifier phi_1? The professor never explained this.
Just train one model, phi_2, which uses features and neighbors, and use regularization. One model is enough. Use XGBoost.
You don't even need to iterate. Train on the known labels and predict on the unknown ones.

I believe a data scientist would never use this method, because it is unnecessary extra programming work for no benefit.

tag_of_frank