AlphaGo - How AI mastered the hardest board game in history

In this episode I dive into the technical details of the AlphaGo Zero paper by Google DeepMind.

This AI system uses Reinforcement Learning to beat the world's Go champion using only self-play, a remarkable display of clever engineering on the path to stronger AI systems.

If you want to support this channel, here is my Patreon link:

Comments

7:24 Your explanation of MCTS is not correct. In a single simulation, it picks the top move recommended by the network (greedy) most of the time and a random move some of the time (epsilon). Then it plays that move and repeats the same procedure until the game is complete. Then it backs up and keeps track of the win-to-visit ratio for each state, as shown in the picture. It repeats this whole process 1600 times. As it performs these walkthroughs, it trains the networks and updates the values, so the more often a state is seen, the closer its statistics converge to the optimal value. MCTS runs to completion; it's not a depth-pruning algorithm. Temporal difference learning stops somewhere in the middle, and it was not used in AGZ. The MCTS algorithm is discussed by David Silver toward the end of his lecture #8.

kkkkjjjj
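
The loop described in the comment above can be sketched roughly as follows. This follows the commenter's description (epsilon-greedy playouts run to the end of the game, then a win/visit backup), which is not necessarily what the AlphaGo Zero paper does. The toy Nim game, the `recommend` function standing in for the network, and all parameter values are assumptions made for illustration; the training step mentioned in the comment is left out.

```python
import random
from collections import defaultdict


def legal_moves(stones):
    """Toy Nim (assumed game): remove 1-3 stones; taking the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]


def recommend(stones):
    """Hypothetical stand-in for the network's top move (always take one stone)."""
    return 1


def simulate(stones, stats, epsilon, rng):
    """Play one game to completion, then back up wins and visits along the path."""
    path = []    # (stones before the move, move, player who moved)
    player = 0
    while stones > 0:
        if rng.random() < epsilon:
            move = rng.choice(legal_moves(stones))   # exploratory random move
        else:
            move = recommend(stones)                 # greedy recommended move
        path.append((stones, move, player))
        stones -= move
        player = 1 - player
    winner = 1 - player  # the player who just moved took the last stone
    for state, move, mover in path:
        entry = stats[(state, move)]
        entry[0] += 1                                # visits
        entry[1] += 1 if mover == winner else 0      # wins for the mover
    return winner


def run(stones=5, n_simulations=1600, epsilon=0.3, seed=0):
    """Repeat the playout n_simulations times and return the statistics table."""
    rng = random.Random(seed)
    stats = defaultdict(lambda: [0, 0])  # (stones, move) -> [visits, wins]
    for _ in range(n_simulations):
        simulate(stones, stats, epsilon, rng)
    return stats
```

Calling `run()` returns a table keyed by `(stones, move)`; dividing wins by visits for an entry gives the ratio the comment refers to.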

I've been programming board game engines for 25 years and I've followed the development of CNNs to play Go quite closely. This video is a really good description of the AlphaGo Zero paper, with very clear explanations. Well, the explanation of MCTS was completely wrong, but other than that this video was great. I'll make sure to check out more from this channel.

alonamaloh

This is a valuable explanation, this channel is a great discovery

noranta

We humans run simulations in our heads all the time because sometimes simple intuitions are not enough. So, I guess, it isn't surprising that including Monte Carlo Tree Search would always drastically improve performance no matter how good the value function estimates are, even with the help of deep learning. The question is how to search more efficiently and also how to build an efficient model.

shamimhussain

Clearest and most informative video I've seen on AlphaGo. Thanks!

antonystringfellow

Awesome explanation! (And your green screen work looks great!)

dankelly

Your explanation skills are fantastic! I like how he has an outline at the beginning of his video. It's a very simple thing, yet very effective when it comes to teaching a subject, and so few educational videos do that.

If I were to figure out the paper by myself, it would have taken me personally ~2x longer.

Subscribed.

SantoshGupta-jnwn

Thank you! This is one of the clearest and most concise explanations of any paper I've found thus far.

elishaishaal

You explained technical stuff very clearly. Thanks Arxiv Insights

clrajapaksha

Best explanation I found about AlphaGo Zero

augustopertence

It's very clear, thank you! I can't wait to discover the other videos :)

Slab

This is the best video regarding the AlphaGo paper. Just amazing!

arijit

Thank you for taking the time to explain it so well. It's still difficult for me because I'm not familiar with the subject yet, but you did a really good job of presenting it clearly!

AlessandroOrlandi

Excellent explanation, thanks! I'm going to make my own 9x9 version of AlphaGo Zero.

Hyrtsi

Fantastic explanation! Few people balance simplicity with thoroughness as well as you do.

chinadragon

The part I don't understand is how they dispense with rollouts in MCTS. It seems like a rollout is the only way to get a ground-truth value (by reaching a terminal state), which can then be propagated back up the chain. If you reach a non-terminal state, you're backpropagating a value from the policy network, which won't have useful values until it can be trained on useful values from the tree search. It seems like it's pulling itself up by its bootstraps. Is it the case that the true values come from the odd time that a simulation reaches a terminal state? Or am I missing something fundamental?

generichuman_
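
For readers following the question above, here is a minimal sketch of a rollout-free simulation in the style the AlphaGo Zero paper describes: moves are selected with a PUCT rule, a leaf is scored by the network's value output instead of a playout, and that value is backed up with alternating sign. As far as the paper states, the value output is trained toward the final result of each self-play game, so ground truth enters through finished games as well as through simulations that happen to reach a terminal state. The toy Nim game, the `evaluate` stand-in for an untrained network, and the constant `C_PUCT` are assumptions for illustration only.

```python
import math

C_PUCT = 1.5  # exploration constant; this value is an arbitrary assumption


class Node:
    def __init__(self, prior):
        self.prior = prior       # P(s, a) from the stand-in policy
        self.visits = 0          # N(s, a)
        self.value_sum = 0.0     # W(s, a), from the view of the player who moved here
        self.children = {}       # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def evaluate(stones):
    """Hypothetical untrained network: uniform priors, value 0 for the player to move."""
    moves = [m for m in (1, 2, 3) if m <= stones]
    return {m: 1.0 / len(moves) for m in moves}, 0.0


def select_child(node):
    """PUCT: pick the child maximising Q + c * P * sqrt(sum of visits) / (1 + N)."""
    total = sum(child.visits for child in node.children.values())

    def score(item):
        child = item[1]
        u = C_PUCT * child.prior * math.sqrt(total) / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=score)


def simulate(root, stones):
    """One simulation: select down to a leaf, evaluate it (no rollout), back up."""
    path, node = [root], root
    while node.children:
        move, node = select_child(node)
        stones -= move
        path.append(node)
    if stones == 0:
        value = -1.0   # terminal: the player to move lost (opponent took the last stone)
    else:
        priors, value = evaluate(stones)   # leaf scored by the value output
        node.children = {m: Node(p) for m, p in priors.items()}
    # `value` is from the view of the player to move at the leaf; each node
    # stores statistics for the player who moved into it, so flip the sign per ply.
    for visited in reversed(path):
        value = -value
        visited.visits += 1
        visited.value_sum += value


def search(stones, n_simulations=1600):
    """Run the simulations and return visit counts for each move at the root."""
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        simulate(root, stones)
    return {move: child.visits for move, child in root.children.items()}
```

Calling `search(5)` returns the root visit counts per move. In this sketch the only informative values come from terminal states, because `evaluate` is untrained; a trained value output would supply estimates at non-terminal leaves as well.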

the transition 'dhkk' hits hard

siddarthc

Brilliant - thanks for this! Really enjoyed watching and I think it takes away all the right information from the paper.

Just a quick point: is there any chance you could quieten down the background music for your next video? It was slightly distracting and I think it detracted a bit from your great explanation!

Merry Christmas!

curiousalchemist

This is cool, but after the third random jumpscare sound I couldn't pay attention to what you were saying. All I could think about was when the next one would be. I gave up halfway through since it was stressing me out.

zzewt

Thank you, finally I found a good video on this paper.

SiavashFahimi