DeepMind x UCL RL Lecture Series - Multi-step & Off Policy [11/13]

Research Scientist Hado van Hasselt discusses multi-step and off-policy algorithms, including various techniques for variance reduction.

Comments

Sorry, but isn't V-trace just "clip the ratio"? Isn't that a common thing in DL, like gradient clipping to avoid exploding gradients, or weight clipping in WGAN? Or am I missing something?

bertobertoberto
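
On the V-trace question: the core operation is indeed a clip, but it is a one-sided *truncation* of the importance ratio (clipped from above only), applied in two places with different roles, rather than a symmetric bound on gradient magnitude. A minimal NumPy sketch of the V-trace recursion from the IMPALA paper, assuming a single trajectory; the function name `v_trace_targets` is hypothetical, and `rho_bar` / `c_bar` are the truncation levels ρ̄ and c̄:

```python
import numpy as np

def v_trace_targets(values, rewards, rho, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets for one trajectory.

    values:  V(s_0..s_T) under the current value estimate, shape [T+1]
    rewards: r_1..r_T, shape [T]
    rho:     importance ratios pi(a_t|s_t) / mu(a_t|s_t), shape [T]
    """
    clipped_rho = np.minimum(rho_bar, rho)  # truncation: shapes the fixed point
    clipped_c = np.minimum(c_bar, rho)      # truncation: controls contraction
    T = len(rewards)
    targets = np.array(values, dtype=np.float64)
    acc = 0.0
    # Backward recursion: A_t = delta_t + gamma * c_t * A_{t+1}
    for t in reversed(range(T)):
        delta = clipped_rho[t] * (rewards[t] + gamma * values[t + 1] - values[t])
        acc = delta + gamma * clipped_c[t] * acc
        targets[t] = values[t] + acc
    return targets[:-1]
```

The asymmetry is the point: truncating only from above biases the fixed point towards a policy between the behavior policy μ and the target policy π, which plain symmetric clipping would not do.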

How do we use per-decision importance weighting and the control variates technique in practice? For example, in actor-critic off-policy learning with a replay buffer, or in learning from demonstrations? We don't know the target policy in practice, so how can we get the value of $\rho$?

haliteabudureyimu
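
A note on this one: in the actor-critic setting the target policy π is usually the policy being learned, so π(a|s) comes from the current network; what must be logged is the behavior probability μ(a|s) at the moment the action was sampled. A minimal sketch under that assumption (the `Transition` container and `importance_ratio` helper are hypothetical names, not from the lecture):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Transition:
    state: np.ndarray
    action: int
    reward: float
    behaviour_prob: float  # mu(a|s), stored when the action was sampled

def importance_ratio(pi_probs: np.ndarray, t: Transition, eps: float = 1e-8) -> float:
    """Per-decision importance ratio rho_t = pi(a_t|s_t) / mu(a_t|s_t).

    pi_probs: action probabilities of the *current* target policy at t.state,
              e.g. the softmax output of the policy network.
    Because mu(a|s) was logged at acting time, the old policy network is not
    needed at learning time.
    """
    return pi_probs[t.action] / max(t.behaviour_prob, eps)
```

For demonstration data the same ratio needs an estimate of μ(a|s), e.g. from behavior cloning, since the demonstrator's probabilities are typically not logged.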

How did you calculate the variance at 58:00? Is it $E[x^2] - (E[x])^2$?

Saurabhsingh-clpx
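
Yes, that is the standard identity, applied there to the importance-weighted return. A one-line derivation for reference:

```latex
\mathbb{V}[X]
  = \mathbb{E}\!\left[(X - \mathbb{E}[X])^2\right]
  = \mathbb{E}[X^2] - 2\,\mathbb{E}[X]\,\mathbb{E}[X] + (\mathbb{E}[X])^2
  = \mathbb{E}[X^2] - (\mathbb{E}[X])^2
```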

What software/hardware is used for the drawing at 26:04?

EngIlya