Actor Critic Algorithms

Reinforcement learning is hot right now! Policy gradients and deep Q-learning can only get us so far, but what if we used two networks to help train an AI instead of one? That's the idea behind actor-critic algorithms. I'll explain how they work in this video, using the "Doom" shooting game as an example.
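
For a concrete picture of the two-network idea, here is a minimal one-step actor-critic loop. It is only a sketch, not the code linked below: it assumes PyTorch and swaps in a made-up toy environment for Doom so that it runs on its own.

    import torch
    import torch.nn as nn

    # Toy stand-in environment: random 4-dim states; action 1 pays off more.
    def env_step(action):
        reward = 1.0 if action == 1 else 0.0
        return torch.randn(4), reward

    actor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))   # policy logits
    critic = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))  # state value V(s)
    opt = torch.optim.Adam([*actor.parameters(), *critic.parameters()], lr=1e-3)
    gamma = 0.99

    state = torch.randn(4)
    for _ in range(1000):
        dist = torch.distributions.Categorical(logits=actor(state))
        action = dist.sample()
        next_state, reward = env_step(action.item())

        # TD error: did the outcome beat the critic's expectation?
        with torch.no_grad():
            target = reward + gamma * critic(next_state)
        td_error = (target - critic(state)).squeeze()

        # Critic regresses toward the target; actor ascends log-prob * TD error.
        loss = td_error.pow(2) - dist.log_prob(action) * td_error.detach()
        opt.zero_grad()
        loss.backward()
        opt.step()
        state = next_state

The actor learns which action to take; the critic learns how good each outcome was relative to its own expectation, and that difference is exactly the training signal the actor needs.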

Code for this video:

i-Nickk's winning code:

Vignesh's runner up code:

Taryn's Twitter:

More learning resources:

Please Subscribe! And like. And comment. That's what keeps me going.

Want more inspiration & education? Connect with me:

Join us in the Wizards Slack channel:

And please support me on Patreon:
Signup for my newsletter for exciting updates in the field of AI:
Comments

Siraj is definitely very important for the dissemination of AI knowledge. I myself owe Siraj many thanks for this incredible channel!!

Ronnypetson

Wow, this is seriously a fantastic introduction motivating actor-critic methods.

robertotomas

You saved my life with this video. Thanks! I have to write a paper that includes this topic, and I struggled for so long to understand it, but now it seems so easy.

sophieg.

Apparently code wasn't the only thing he plagiarised.

Imagine this as a playground with a kid (the "actor") and her parent (the "critic"). The kid looks around, exploring all the possible options in this environment, such as sliding down a slide, swinging on a swing, and pulling grass from the ground. The parent watches the kid and either criticizes or compliments her based on what she did.

theapocalypse
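
The analogy maps directly onto a tiny tabular actor-critic. The sketch below is hypothetical: the activities and reward numbers are invented, the kid's preference table plays the actor, and the parent's running expectation plays the critic.

    import math
    import random

    activities = ["slide", "swing", "pull_grass"]
    true_fun = {"slide": 1.0, "swing": 0.6, "pull_grass": -0.5}  # hidden rewards
    prefs = {a: 0.0 for a in activities}  # actor: the kid's preferences
    expectation = 0.0                     # critic: the parent's running estimate

    for visit in range(2000):
        # The kid explores: softmax over preferences favors praised activities.
        weights = [math.exp(prefs[a]) for a in activities]
        choice = random.choices(activities, weights=weights)[0]
        reward = true_fun[choice] + random.gauss(0, 0.1)

        advantage = reward - expectation              # praise beyond expectation
        prefs[choice] += 0.05 * advantage             # actor update
        expectation += 0.05 * (reward - expectation)  # critic update

    print(max(prefs, key=prefs.get))  # should settle on "slide"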

He goes so fast. It's like he's talking to someone who already understands it.

tomw

Out of all the channels I'm subbed to, this is the only one I have notifs on,
'cause it's good.

chicken

Thanks for the recognition, @Siraj. Looking forward to your upcoming work on the channel. A Halite 2 AI bot, perhaps?

VigneshKumar-xdxi

Very interesting video as usual, thank you! :-)

davidm.johnston

Did anybody actually try to run the source code? I've seen the same code snippet in two different places and neither of them worked. Frankly, not only does it not work, it also has a lot of redundancy (many unused variables and errors) and typos that make the code behave incorrectly; these go unnoticed because the update methods are dead code that is never called. The whole example is doomed by the fact that it's just a single run through the environment, which usually ends with the pendulum simply hanging down. Even after fixing that, it still doesn't work, because the update function is never called. If you do call the update function at the end of the train method, it throws runtime errors because of typos and wrong model use (it tries to assign the critic's weights to the actor). And to be honest, even the neural nets are wrong: both have ReLUs as output layers, but the values they need to produce can be negative (impossible with a ReLU), and the Q-values should be mostly negative, since most of the rewards are negative.

adrianjaoszewski
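
A fix for the output-activation problem this comment describes might look like the following Keras-style sketch; the builder function and layer sizes are illustrative, not taken from the linked code. The point is that a ReLU head clips estimates at zero, so a critic for Pendulum, where rewards are mostly negative, needs a linear final layer.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    def build_critic(state_dim):
        model = Sequential([
            Dense(64, activation="relu", input_shape=(state_dim,)),
            Dense(64, activation="relu"),
            Dense(1, activation="linear"),  # linear head: Q-values may be negative
        ])
        model.compile(optimizer="adam", loss="mse")
        return model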

Hi Siraj! Can you please make a video on program synthesis? Please please please, I beg you!
To me it seems like the most direct route to a Skynet-level AI, yet it is so underhyped that I didn't even know the term until I googled the idea behind it. I have no idea why nobody talks about the topic, or why they don't use neural networks for it. AlphaGo seems to suit the task almost perfectly (it is also a search over a tree), but I haven't heard about any revolution in that area.

luck

Suggestion: in videos where you're trying to explain an idea or a method in general terms, simplify it as much as possible and don't go into too much detail. Also, definitely use examples and simple analogies as much as you can, because as we all know, learning works best with more examples.

SLR_

How does the critic know what the action score is?

jeffpeng

Most interactive and most unclear/inaccurate video on actor-critic. Thank you!

underlecht

So is the actor's predicted best choice then optimized with gradient ascent based on the critic's Q-values?

matthewdaly
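
On the question above: in the deterministic-policy (DDPG-style) setup that the linked Pendulum code appears to follow, the answer is essentially yes: the actor is trained by gradient ascent on the critic's Q-values. A minimal sketch, assuming PyTorch, with invented shapes and names; the ascent is implemented as minimizing -Q.

    import torch
    import torch.nn as nn

    state_dim, action_dim = 3, 1
    actor = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                          nn.Linear(32, action_dim), nn.Tanh())
    critic = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(),
                           nn.Linear(32, 1))  # Q(s, a), linear head
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

    states = torch.randn(16, state_dim)  # a dummy minibatch of states
    q_values = critic(torch.cat([states, actor(states)], dim=1))
    actor_loss = -q_values.mean()        # ascend Q by descending -Q
    actor_opt.zero_grad()
    actor_loss.backward()                # gradients flow through the critic into the actor
    actor_opt.step()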

The source code is not working; the target weights are never updated!

ionmosnoi

The linked source code is for playing a pendulum game, not Doom, which is much more complex. Honestly, I don't think you ever wrote a bot that plays Doom; that's why you only show five seconds of Doom being played.
To prove me wrong, link the source code for the Doom bot.

davidmoser

Finally you've controlled your speed. Love you bro :)

deepaks.m.

I watched a demo from NVIDIA this week in which they played a John Williams-style music score.

It was unbelievably good. It'll be interesting to see what people come up with. A new Christmas carol?

tonycatman

Hey Siraj! I got a chance to implement one of the NIPS 2017 papers, and I have chosen the reinforcement learning field. How hard will it be, and what is the procedure for implementing a paper?

unicornAGI

The correct term for finding a derivative is to "differentiate", not "derive".

lordphu