Stanford CS229: Machine Learning | Summer 2019 | Lecture 8 - Kernel Methods & Support Vector Machine


Anand Avati
Computer Science, PhD

To follow along with the course schedule and syllabus, visit:
Comments

Thanks for making such a detailed lecture available online, Dr. Avati!

anirudhthatipelli

Lecture 8 completed: understood the general principle of kernelization and got a great understanding of SVMs. Prof. Anand Avati really explains these concepts with clarity.

DevanshChaudhary-duuz

At 59:34, the lecturer says that the reason kernel methods do not need a theta vector at prediction time is that "we give up the phi(x) representation."

I see what he's trying to say, but here's (what I believe to be) a clearer explanation:

We could, theoretically, first calculate theta by summing up beta_i * phi(x_i) over all n training examples, and then (at prediction time) multiply that theta by the phi(x) vector for our test example x. However, each of those two steps would require multiplying by a massive (potentially infinite-dimensional) phi vector. (In fact, n times just for calculating theta.)

The trick, then, is to end training with our parameter still expressed in terms of the phi(x_i), knowing that the parameter will later (at prediction time) be multiplied by the phi(x) of our test example x -- a multiplication we can perform very cheaply, because we can kernelize the two massive feature vectors with each other. This is only possible if we keep the parameter expressed in terms of the phi(x_i) from training (i.e., if we store the x_i and their coefficients beta_i).

That leaves us with no explicit massive vector multiplications, instead of many of them (one per training example during training and one per test example at prediction time).

waelq
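
To illustrate the comment above, here is a minimal sketch of kernelized prediction. The function names and the choice of a Gaussian (RBF) kernel are assumptions for illustration, not taken from the lecture:

import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Computes k(x, z) = phi(x) . phi(z) for an (implicitly) very
    # high-dimensional phi, without ever forming phi explicitly.
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

def predict(x_test, X_train, beta, kernel=rbf_kernel):
    # theta = sum_i beta_i * phi(x_i) is never materialized; instead
    # theta . phi(x_test) = sum_i beta_i * k(x_i, x_test).
    return sum(b * kernel(x_i, x_test) for b, x_i in zip(beta, X_train))

Usage would look like predict(x_new, X_train, beta), where beta and the training examples X_train are whatever was stored at training time.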

If I understand correctly, we are no longer estimating parameters like theta; instead we are storing the values of beta and all the training examples. Since this setup would have a tendency to overfit, how would we regularize it? Can we apply ridge/lasso on beta instead of theta?

AdaGradschool
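
One common answer, stated here as a sketch rather than as the lecture's own approach: the usual L2 penalty on theta carries over to beta, because with theta = sum_i beta_i * phi(x_i) we get ||theta||^2 = beta^T K beta, and the regularized (kernel ridge) solution is beta = (K + lambda*I)^{-1} y. A minimal sketch with an assumed RBF kernel and hypothetical names:

import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

def fit_kernel_ridge(X_train, y, lam=0.1):
    # Gram matrix K[i, j] = k(x_i, x_j) over the training set.
    n = len(X_train)
    K = np.array([[rbf_kernel(X_train[i], X_train[j]) for j in range(n)]
                  for i in range(n)])
    # The L2 penalty on theta equals beta^T K beta, and the regularized
    # (kernel ridge) solution is beta = (K + lam * I)^{-1} y.
    return np.linalg.solve(K + lam * np.eye(n), np.asarray(y, dtype=float))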

At 56:41 it should have been beta_i instead of beta_j.

durgeshmishra-fnkx

Why did you take x1, x2, x3, ... at 15:32? The input vector is just in terms of x, right?

sravanthkurmala
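
For what it may clarify: x is a single input vector and x1, x2, x3 are its components; the feature map phi is built from those components. A small sketch (the degree-2 monomial feature map here is an assumption for illustration, not necessarily the one used at 15:32):

import numpy as np

def phi(x):
    # x = (x1, x2, x3, ...) is one input vector; phi(x) stacks the raw
    # components together with all degree-2 monomials x_i * x_j.
    x = np.asarray(x, dtype=float)
    quadratic = [x[i] * x[j] for i in range(len(x)) for j in range(len(x))]
    return np.concatenate([x, quadratic])

print(phi([1.0, 2.0, 3.0]))  # 3 linear terms followed by 9 quadratic terms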