AlphaGo - How AI mastered the hardest boardgame in history

In this episode I dive into the technical details of the AlphaGo Zero paper by Google DeepMind.

This AI system uses Reinforcement Learning to beat the world's Go champion using only self-play, a remarkable display of clever engineering on the path to stronger AI systems.

If you want to support this channel, here is my patreon link:

Comments

7:24 Your explanation of MCTS is not correct. For one instance of simulation: it picks the top move recommended by the network (greedy) most of the time, with random moves some of the time (epsilon). Then it walks into that move and repeats the same, all the way to completion. Then it backs up and keeps track of the win vs. visit ratio for each state, as shown in the picture. It repeats this whole process 1600 times. As it is performing these walkthroughs, it trains the networks and updates the values. So eventually, the more often you see a state, the closer it will statistically converge to the optimal value. MCTS runs to completion; it's not a depth-pruning algorithm. Temporal Difference stops somewhere in the middle, and this was not used in AGZ. The MCTS algorithm is discussed by David Silver towards the end of his lecture #8.

kkkkjjjj
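The loop described in the comment above can be sketched on a toy game. This is only an illustration of that description (epsilon-greedy move choice, play to the end, back up win and visit counts, repeat 1600 times), not the method from the paper. The game (a Nim pile where each player takes 1 to 3 stones and taking the last stone wins), the `recommend` stub standing in for the network, and all function names are assumptions made for this example.

```python
import random
from collections import defaultdict

def legal_moves(stones):
    # Toy Nim rules (assumption): take 1, 2 or 3 stones per turn.
    return [m for m in (1, 2, 3) if m <= stones]

def recommend(stones, visits, wins):
    # Hypothetical stand-in for the network: pick the move with the best
    # win/visit ratio so far; unvisited moves count as 0.5.
    def ratio(move):
        n = visits[(stones, move)]
        return wins[(stones, move)] / n if n else 0.5
    return max(legal_moves(stones), key=ratio)

def simulate(stones, visits, wins, epsilon, rng):
    # One walkthrough from the start position to the end of the game.
    path, player, winner = [], 0, None
    while stones > 0:
        if rng.random() < epsilon:
            move = rng.choice(legal_moves(stones))   # exploratory random move
        else:
            move = recommend(stones, visits, wins)   # greedy move
        path.append((stones, move, player))
        stones -= move
        winner = player          # whoever moved last took the last stone
        player = 1 - player
    # Back up: every (state, move) on the path gets a visit, plus a win
    # if the player who made that move ended up winning.
    for state, move, mover in path:
        visits[(state, move)] += 1
        wins[(state, move)] += int(mover == winner)
    return winner

def run_search(stones, n_simulations=1600, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    visits, wins = defaultdict(int), defaultdict(int)
    for _ in range(n_simulations):
        simulate(stones, visits, wins, epsilon, rng)
    return visits, wins
```

Calling `run_search(5)` returns the two count tables; `wins[(5, m)] / visits[(5, m)]` is then the win ratio recorded for taking `m` stones from a pile of 5.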

I've been programming board game engines for 25 years and I've followed the development of CNNs to play go quite closely. This video is a really good description of the AlphaGo Zero paper, with very clear explanations. Well, the explanation of MCTS was completely wrong, but other than that this video was great. I'll make sure to check out more from this channel.

alonamaloh

This is a valuable explanation, this channel is a great discovery

noranta

We, humans, run simulations in our heads all the time because sometimes simple intuitions are not enough... So, I guess, it isn't surprising that inclusion of Monte Carlo Tree Search would always drastically improve performance no matter how good the value function estimates are, even with the help of deep learning... The question is how to search more efficiently and also how to build an efficient model...

shamimhussain

Clearest and most informative video I've seen on AlphaGo. Thanks!

antonystringfellow

Awesome explanation! (And your greenscreen work looks great!)

dankelly

Your explanation skills are fantastic! I like how you have an outline at the beginning of your video; it's a very simple thing, yet very effective when it comes to teaching a subject, and so few educational videos do that.

If I had tried to figure out the paper by myself, it would have taken me about twice as long.

Subscribed.

SantoshGupta-jnwn

Thank you! This is one of the clearest and most concise explanations of any paper I've found thus far.

elishaishaal

You explained technical stuff very clearly. Thanks Arxiv Insights

clrajapaksha

Best explanation I found about AlphaGo Zero

augustopertence

It's very clear, thank you! I can't wait to discover the other videos :)

Slab

This is the best video on the AlphaGo paper. Just amazing!

arijit

Thank you for taking the time to explain it so well. It's still difficult for me since I'm not familiar with the subject yet, but you did a really good job of presenting it clearly!

AlessandroOrlandi

Excellent explanation, thanks! I'm going to make my own 9x9 AlphaGo Zero version.

Hyrtsi

Fantastic explanation! Few people balance simplicity with thoroughness as well as you do.

chinadragon

The part I don't understand is how they dispense with rollouts in MCTS. It seems like this is the only way to get a ground-truth value (by reaching a terminal state), which can then be propagated back up the chain. If you reach a non-terminal state, you're backpropagating a value from the policy network, which won't have useful values until it can be trained on useful values from the tree search. It seems like it's pulling itself up by its bootstraps. Is it the case that the true values come from the odd time that a simulation reaches a terminal state? Or am I missing something fundamental?

generichuman_
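One way to picture the question above is a search that never rolls out: each simulation stops at a leaf, which is scored either by the true game result (if the leaf is terminal) or by a network's value estimate (otherwise). As far as I understand the AlphaGo Zero paper, its search works along these lines, with the estimate coming from the value output of the network, but the code below is only a toy sketch. The Nim game (take 1 to 3 stones, taking the last stone wins), the untrained `evaluate_stub`, the `c_puct` constant and every name here are assumptions for illustration.

```python
import math

def legal_moves(stones):
    # Toy Nim rules (assumption): take 1, 2 or 3 stones per turn.
    return [m for m in (1, 2, 3) if m <= stones]

def evaluate_stub(stones):
    # Hypothetical untrained network: uniform move priors and a value of
    # 0.0 (no opinion) for the player to move.
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}, 0.0

class Node:
    def __init__(self, stones, prior):
        self.stones = stones
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0  # from the view of the player who moved into this node
        self.children = {}

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct):
    # PUCT-style rule: mean value plus a prior-weighted exploration bonus.
    def score(child):
        bonus = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + bonus
    return max(node.children.values(), key=score)

def simulate(root, c_puct=1.5):
    node, path = root, [root]
    while node.children:                 # walk down to a leaf
        node = select_child(node, c_puct)
        path.append(node)
    if node.stones == 0:
        value = -1.0                     # terminal: the player to move has lost
    else:
        priors, value = evaluate_stub(node.stones)   # estimate, no rollout
        for move, prior in priors.items():
            node.children[move] = Node(node.stones - move, prior)
    # Back up, flipping the sign at each level because players alternate.
    for n in reversed(path):
        n.visits += 1
        n.value_sum += -value
        value = -value

def search(stones, n_simulations=1600):
    root = Node(stones, prior=1.0)
    for _ in range(n_simulations):
        simulate(root)
    return root
```

In this sketch the estimate is 0.0 everywhere, so every non-zero number in the tree originates from a terminal position. After `root = search(5)`, comparing `child.visits` and `child.q()` across `root.children` shows how those terminal results shape the statistics at the root even though the stub itself knows nothing.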

the transition 'dhkk' hits hard

siddarthc

Brilliant - thanks for this! Really enjoyed watching and I think it takes away all the right information from the paper.

Just a quick point: is there any chance you could quieten down the background music for your next video? It was slightly distracting and I think it detracted a bit from your great explanation!

Merry Christmas!

curiousalchemist

This is cool, but after the third random jumpscare sound I couldn't pay attention to what you were saying--all I could think about was when the next one would be. Gave up halfway through since it was stressing me out

zzewt

Thank you, finally I found a good video on this paper.

SiavashFahimi
