Optimization Techniques - W2023 - Lecture 10 (Distributed Optimization & Non-Smooth Optimization)

The course "Optimization Techniques" (ENGG*6140, section 2) at the School of Engineering at the University of Guelph. Instructor: Benyamin Ghojogh
Lecture 10 continues with distributed optimization methods by reviewing ADMM, introduced in the previous session. Then, we discuss how to make univariate and multivariate optimization problems distributed. An example of ADMM in image processing is also introduced. Then, we switch to non-smooth optimization. We start with lasso (l1-norm) optimization and then review various methods for solving non-smooth optimization problems. These methods include the convex conjugate, the Huber and pseudo-Huber functions, soft-thresholding and proximal methods, coordinate descent, and subgradient methods.
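
As a quick illustration of the soft-thresholding operator that the lecture connects to lasso and proximal methods, here is a minimal sketch (the function name prox_l1 and the example values are my own, not taken from the slides):

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of lam * ||.||_1, i.e., elementwise soft-thresholding:
    prox(v)_i = sign(v_i) * max(|v_i| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Entries with magnitude below lam are set exactly to zero;
# larger entries are shrunk toward zero by lam.
print(prox_l1(np.array([-2.0, -0.3, 0.0, 0.5, 3.0]), lam=1.0))
```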

Chapters:

0:00 - Talking about project presentation
4:30 - Correcting a typo in Wolfe conditions for line-search
7:13 - Brief review of distributed optimization from previous session
12:33 - Use cases of distributed algorithms
14:34 - Making univariate optimization problem distributed
20:12 - Making multivariate optimization problem distributed
32:42 - Example of ADMM in image processing
44:22 - Important references in distributed optimization
48:00 - Non-smooth optimization (introduction)
50:27 - Lasso (l1 norm) regularization
1:01:33 - Convex conjugate
1:09:26 - Convex conjugate of l1 norm
1:15:01 - Gradient in terms of convex conjugate
1:17:33 - Proximity function and Huber function
1:25:23 - Huber and pseudo-Huber functions
1:33:21 - Soft-thresholding and proximal methods
1:38:31 - Coordinate descent
1:47:19 - Coordinate descent for l1 norm optimization
1:58:14 - Subgradient
2:04:15 - Subdifferential
2:06:47 - Subdifferential for l1 norm
2:11:33 - First-order optimality condition for non-smooth function
2:15:09 - Subgradient in example functions
2:18:23 - Subgradient method
2:22:00 - Stochastic subgradient method
2:23:23 - Projected subgradient method
2:24:32 - Important references in non-smooth optimization
2:29:13 - Talking about the next sessions
Comments

Thank you for the video. I would like to ask you a couple of questions about coordinate descent. First, why did we decompose the objective function during soft-thresholding? Specifically, why did we separate x_j^T(y - X_i β_i)/||x_j||_2^2 and λ/||x_j||_2^2? What is the exact purpose of this decomposition? Second, regarding the 'repeat ... until converged' loop over the coordinate steps: should we iterate as many times as the number of independent variables? Or should I initialize the regression coefficients to zero, sequentially update all of them (first iteration), then in the second iteration use the coefficients obtained from the first iteration to find new values, and continue until the squared-error values stop changing? Up to which point should I apply the iterations? I would greatly appreciate your assistance.

kylmaz
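
A possible minimal sketch of cyclic coordinate descent for the lasso objective (1/2)||y - Xβ||_2^2 + λ||β||_1, which may help with the iteration question above; the code and names (lasso_coordinate_descent, soft_threshold) are my own, not from the lecture:

```python
import numpy as np

def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, max_sweeps=100, tol=1e-6):
    """Cyclic coordinate descent for (1/2)||y - X beta||^2 + lam * ||beta||_1.
    Assumes X has no all-zero columns."""
    n, d = X.shape
    beta = np.zeros(d)               # initialize all coefficients to zero
    col_sq = (X ** 2).sum(axis=0)    # ||x_j||_2^2 for each column
    for _ in range(max_sweeps):      # outer "repeat ... until converged" loop
        beta_old = beta.copy()
        for j in range(d):           # one full pass over all coordinates
            # residual with coordinate j excluded
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho_j = X[:, j] @ r_j    # x_j^T (y - X_{-j} beta_{-j})
            beta[j] = soft_threshold(rho_j, lam) / col_sq[j]
        if np.max(np.abs(beta - beta_old)) < tol:   # stop when a sweep barely changes beta
            break
    return beta
```

In this organization, each outer sweep reuses the coefficients produced by the previous sweep, and the loop stops once a full sweep changes the coefficients by less than a tolerance, rather than running a fixed number of times equal to the number of variables.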

The equivalent formula for the l1 norm (slide 12) can also be interpreted in other words: the absolute value of a real variable is either the variable itself or minus one times the variable. In both cases, it is the maximum of the variable multiplied by a value in [-1, 1].

AliArjomandi-fxev
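
A quick numerical check of that identity, |x| = max over t in [-1, 1] of t·x (the grid and test values below are my own):

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 2001)             # grid over [-1, 1], endpoints included
for x in [-3.2, -0.5, 0.0, 0.7, 4.1]:
    # the maximum of t * x over the grid is attained at t = sign(x) and equals |x|
    assert abs(np.max(t * x) - abs(x)) < 1e-12
```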