Optimization, part 2 - Francis Bach - MLSS 2020, Tübingen
0:00:00 Optimization for Large Scale Machine Learning
0:01:14 Stochastic vs. deterministic methods
0:03:24 Stochastic average gradient (Le Roux, Schmidt, and Bach, 2012)
0:04:49 Running-time comparisons (strongly-convex)
0:07:09 Running-time comparisons (non-strongly-convex)
0:08:58 Stochastic average gradient
0:11:24 Experimental results (logistic regression)
0:14:43 Before non-uniform sampling
0:15:23 After non-uniform sampling
0:15:50 From training to testing errors
0:18:47 Linearly convergent stochastic gradient algorithms
0:20:36 Acceleration
0:21:46 Q&A
0:25:06 SGD minimizes the testing cost!
0:27:45 Robust averaged stochastic gradient (Bach and Moulines, 2013)
0:29:15 Markov chain interpretation of constant step sizes
0:32:27 Simulations - synthetic examples
0:33:19 Simulations - benchmarks
0:35:26 Perspectives
0:47:27 Beyond convex problems
0:48:01 Parametric supervised machine learning
0:48:27 Convex optimization problems
0:50:47 Exponentially convergent SGD for smooth finite sums
0:51:43 Exponentially convergent SGD for finite sums: from theory to practice and vice versa
0:53:05 Convex optimization for machine learning: from theory to practice and vice versa
0:53:59 Theoretical analysis of deep learning
0:55:53 Optimization for multi-layer neural networks
1:01:13 Gradient descent for a single hidden layer
1:05:40 Optimization on measures
1:08:15 Many particle limit and global convergence (Chizat and Bach, 2018a)
1:14:38 Simple simulations with neural networks
1:17:08 From qualitative to quantitative results?
1:20:00 Lazy training (Chizat and Bach, 2018b)
1:24:24 From lazy training to neural tangent kernel
1:26:22 Healthy interactions between theory, applications, and hype?
1:30:48 Conclusions - Optimization for machine learning
1:32:04 Q&A
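
The chapters around 0:03:24 and 0:08:58 cover the stochastic average gradient (SAG) method. As a quick reference while watching, here is a minimal NumPy sketch of the SAG idea for logistic regression; the function name `sag_logistic`, the uniform sampling, the fixed step size, and all constants are illustrative assumptions, not the lecture's exact algorithm or recommended settings.

```python
import numpy as np

def sag_logistic(X, y, step_size=0.01, n_iters=10000, seed=0):
    """Minimal SAG sketch for logistic regression (no regularization).

    X: (n, d) features, y: (n,) labels in {-1, +1}.
    Keeps a table with the last gradient seen for each sample and
    steps along the average of the stored gradients.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    grad_table = np.zeros((n, d))   # last gradient computed for each sample
    grad_sum = np.zeros(d)          # running sum of the table's rows

    for _ in range(n_iters):
        i = rng.integers(n)                             # uniform sampling
        margin = y[i] * X[i].dot(w)
        g_new = -y[i] * X[i] / (1.0 + np.exp(margin))   # grad of log(1 + exp(-y_i x_i.w))
        grad_sum += g_new - grad_table[i]               # refresh sample i's slot
        grad_table[i] = g_new
        w -= step_size * grad_sum / n                   # step along the averaged gradient
    return w

if __name__ == "__main__":
    # Tiny synthetic usage example (illustrative only).
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5))
    w_true = rng.standard_normal(5)
    y = np.sign(X @ w_true + 0.1 * rng.standard_normal(200))
    w_hat = sag_logistic(X, y, step_size=0.05, n_iters=20000)
    print(w_hat)
```

Maintaining the running sum `grad_sum` is what makes each iteration as cheap as a plain SGD step while still using a full-gradient estimate; non-uniform sampling (discussed at 0:14:43 and 0:15:23) would replace the uniform draw above.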