Understanding Coordinate Descent


Let's just have a little aside on the coordinate descent algorithm, and then we're going to describe how to apply coordinate descent to solving our lasso objective. So, our goal here is to minimize some function g. This is the same objective that we have whether we're talking about our closed-form solution, gradient descent, or this coordinate descent algorithm. But let me be very explicit about what we mean. We're saying we want to minimize over all possible w some g(w), where we're assuming g(w) is a function of multiple variables; let's call it g(w0, w1, ..., wD), and this w we're writing in bold font here. Often, minimizing over a large set of variables jointly can be a very challenging problem. But in contrast, it's often easy to optimize just a single coordinate while keeping all of the other coordinates fixed, because that turns into just a 1D optimization problem. And that's the motivation behind coordinate descent, where the coordinate descent algorithm is really intuitive: cycle through the coordinates, minimizing over one at a time while the others stay fixed.
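To make the idea concrete, here is a minimal sketch of cyclic coordinate descent in Python. This is not the lecture's lasso solver; the example objective g, the use of scipy.optimize.minimize_scalar for the 1D step, and the tolerance and pass limits are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_descent(g, w0, max_passes=100, tol=1e-8):
    """Cyclic coordinate descent: repeatedly minimize g along one
    coordinate at a time, holding the other coordinates fixed."""
    w = np.array(w0, dtype=float)
    for _ in range(max_passes):
        w_old = w.copy()
        for j in range(len(w)):
            # 1D subproblem: vary only w[j], keep all other coordinates fixed.
            def g_1d(wj, j=j):
                w_trial = w.copy()
                w_trial[j] = wj
                return g(w_trial)
            w[j] = minimize_scalar(g_1d).x
        # Stop once a full pass barely moves any coordinate.
        if np.max(np.abs(w - w_old)) < tol:
            break
    return w

# Hypothetical smooth, convex objective, used only to exercise the sketch.
g = lambda w: (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 3.0) ** 2 + w[0] * w[1]
print(coordinate_descent(g, w0=[0.0, 0.0]))
```

Here a generic scalar minimizer stands in for the per-coordinate update; for specific objectives such as lasso, that 1D step typically has a cheaper closed-form solution.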
Comments

Will coordinate descent always converge using LASSO, even if the ratio of the number of features to the number of observations/samples is large?

homeycheese

Shouldn't that be argmin w instead of just min, since we want to return one of the arguments?

NayeliGC

This works OK on nice functions like g(x, y) = x^2 + y^2, but real data often looks more like the Grand Canyon, where the path is very narrow and winding.

pnachtwey