Actor Critic Algorithms

Reinforcement learning is hot right now! Policy gradients and deep Q-learning can only get us so far, but what if we used two networks to help train an AI instead of one? That's the idea behind actor-critic algorithms. I'll explain how they work in this video, using the "Doom" shooting game as an example.
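
For a concrete picture of the two-network idea, here is a minimal one-step actor-critic loop. It is only a sketch, not the code linked below: it assumes PyTorch and swaps in a made-up toy environment for Doom so that it runs on its own.

    import torch
    import torch.nn as nn

    # Toy stand-in environment: random 4-dim states; action 1 pays off more.
    def env_step(action):
        reward = 1.0 if action == 1 else 0.0
        return torch.randn(4), reward

    actor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))   # policy logits
    critic = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))  # state value V(s)
    opt = torch.optim.Adam([*actor.parameters(), *critic.parameters()], lr=1e-3)
    gamma = 0.99

    state = torch.randn(4)
    for _ in range(1000):
        dist = torch.distributions.Categorical(logits=actor(state))
        action = dist.sample()
        next_state, reward = env_step(action.item())

        # TD error: did the outcome beat the critic's expectation?
        with torch.no_grad():
            target = reward + gamma * critic(next_state)
        td_error = (target - critic(state)).squeeze()

        # Critic regresses toward the target; actor ascends log-prob * TD error.
        loss = td_error.pow(2) - dist.log_prob(action) * td_error.detach()
        opt.zero_grad()
        loss.backward()
        opt.step()
        state = next_state

The actor learns which action to take; the critic learns how good each outcome was relative to its own expectation, and that difference is exactly the training signal the actor needs.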

Code for this video:

i-Nickk's winning code:

Vignesh's runner up code:

Taryn's Twitter:

More learning resources:

Please Subscribe! And like. And comment. That's what keeps me going.

Want more inspiration & education? Connect with me:

Join us in the Wizards Slack channel:

And please support me on Patreon:
Signup for my newsletter for exciting updates in the field of AI:
Comments

Siraj is definitely very important for the dissemination of AI knowledge. I myself owe Siraj many thanks for this incredible channel!!

Ronnypetson

Wow, this is seriously a fantastic introduction motivating actor-critic methods.

robertotomas

You saved my life with this video. Thanks! I have to write a paper that includes this topic, and I struggled for so long to understand it, but now it seems so easy.

sophieg.

Apparently code wasn't the only thing he plagiarised.

Imagine this as a playground with a kid (the "actor") and her parent (the "critic"). The kid looks around, exploring all the possible options in this environment, such as sliding down a slide, swinging on a swing, and pulling grass from the ground. The parent watches the kid and either criticizes or compliments her based on what she did.

theapocalypse
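
The analogy maps directly onto a tiny tabular actor-critic. The sketch below is hypothetical: the activities and reward numbers are invented, the kid's preference table plays the actor, and the parent's running expectation plays the critic.

    import math
    import random

    activities = ["slide", "swing", "pull_grass"]
    true_fun = {"slide": 1.0, "swing": 0.6, "pull_grass": -0.5}  # hidden rewards
    prefs = {a: 0.0 for a in activities}  # actor: the kid's preferences
    expectation = 0.0                     # critic: the parent's running estimate

    for visit in range(2000):
        # The kid explores: softmax over preferences favors praised activities.
        weights = [math.exp(prefs[a]) for a in activities]
        choice = random.choices(activities, weights=weights)[0]
        reward = true_fun[choice] + random.gauss(0, 0.1)

        advantage = reward - expectation              # praise beyond expectation
        prefs[choice] += 0.05 * advantage             # actor update
        expectation += 0.05 * (reward - expectation)  # critic update

    print(max(prefs, key=prefs.get))  # should settle on "slide"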

He goes so fast. It's like he's talking to someone who already understands it.

tomw

Out of all the channels I'm subbed to, this is the only one I have notifs on,
'cause it's good.

chicken

Thanks for the recognition, @Siraj. Looking forward to your upcoming work on the channel. A Halite 2 AI bot, perhaps?

VigneshKumar-xdxi

Very interesting video as usual, thank you! :-)

davidm.johnston

Did anybody actually try to run the source code? I've seen the same code snippet in two different places and neither of them worked. Frankly, not only does it not work, it also has a lot of redundancy (many unused variables and errors) and typos that make the code behave incorrectly; these go unnoticed because the update methods are dead code that is never called. The whole example is doomed by the fact that it's just a single run through the environment, which usually ends with the pendulum simply hanging down. Even after fixing that, it still doesn't work, because the update function is never called. If you do call the update function at the end of the train method, it throws runtime errors because of typos and wrong model use (it tries to assign the critic's weights to the actor). And to be honest, even the neural nets are wrong: both have ReLUs as output layers, but the values they need to produce can be negative (impossible with a ReLU), and the Q-values should be mostly negative, since most of the rewards are negative.

adrianjaoszewski
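
A fix for the output-activation problem this comment describes might look like the following Keras-style sketch; the builder function and layer sizes are illustrative, not taken from the linked code. The point is that a ReLU head clips estimates at zero, so a critic for Pendulum, where rewards are mostly negative, needs a linear final layer.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    def build_critic(state_dim):
        model = Sequential([
            Dense(64, activation="relu", input_shape=(state_dim,)),
            Dense(64, activation="relu"),
            Dense(1, activation="linear"),  # linear head: Q-values may be negative
        ])
        model.compile(optimizer="adam", loss="mse")
        return model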

Hi Siraj! Can you please make a video on program synthesis? Please please please, I beg you!
To me it seems like the most direct route to a Skynet-level AI, yet it is so underhyped that I didn't even know the term until I googled the idea behind it. I have no idea why nobody talks about the topic, or why they don't use neural networks for it. AlphaGo seems to suit the task almost perfectly (it is also a search over a tree), but I haven't heard about any revolution in that area.

luck

Suggestion: in videos where you're trying to explain an idea or a method in general terms, simplify it as much as possible and don't go into too much detail. Also, definitely use examples and simple analogies as much as you can, because as we all know, learning works best with more examples.

SLR_

How does the critic know what the action score is?

jeffpeng

Most interactive and most unclear/inaccurate video on actor-critic. Thank you!

underlecht

So is the actor's predicted best choice then optimized with gradient ascent based on the critic's Q-values?

matthewdaly
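
On the question above: in the deterministic-policy (DDPG-style) setup that the linked Pendulum code appears to follow, the answer is essentially yes: the actor is trained by gradient ascent on the critic's Q-values. A minimal sketch, assuming PyTorch, with invented shapes and names; the ascent is implemented as minimizing -Q.

    import torch
    import torch.nn as nn

    state_dim, action_dim = 3, 1
    actor = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                          nn.Linear(32, action_dim), nn.Tanh())
    critic = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(),
                           nn.Linear(32, 1))  # Q(s, a), linear head
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

    states = torch.randn(16, state_dim)  # a dummy minibatch of states
    q_values = critic(torch.cat([states, actor(states)], dim=1))
    actor_loss = -q_values.mean()        # ascend Q by descending -Q
    actor_opt.zero_grad()
    actor_loss.backward()                # gradients flow through the critic into the actor
    actor_opt.step()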

The source code is not working; the target weights are never updated!

ionmosnoi

The linked source code is for playing a pendulum game, not Doom, which is much more complex. Honestly, I don't think you ever wrote a bot that plays Doom; that's why you only show five seconds of Doom being played.
To prove me wrong, link the source code for the Doom bot.

davidmoser

Finally you've controlled your speed. Love you bro :)

deepaks.m.

I watched a demo from NVIDIA this week in which they played a John Williams-style music score.

It was unbelievably good. It'll be interesting to see what people come up with. A new Christmas carol?

tonycatman

Hey Siraj! I got a chance to implement one of the NIPS 2017 papers, and I have chosen the reinforcement learning field. How hard will it be, and what is the procedure for implementing a paper?

unicornAGI

The correct term for finding a derivative is to "differentiate", not "derive".

lordphu