AI Learns to Escape (deep reinforcement learning)

AI Teaches Itself How to Escape!

In this video, an AI Warehouse agent named Albert learns how to escape 5 rooms I've designed. The AI was trained using Deep Reinforcement Learning, a Machine Learning method that involves rewarding the agent for doing something correctly and punishing it for doing anything incorrectly. Albert's actions are controlled by a Neural Network that's updated after each attempt to give Albert more rewards and fewer punishments over time.

Everything in this video (except for the music) was created entirely by me using Unity. Check the pinned comment for more information on how the AI was trained!

Current Subscribers: 0
Comments

This 8-minute video took over 100 hours to make! Everything in the video (except for the music) was created entirely by me using Unity, so please like and subscribe! :D



Now, back to Albert:

Time it Took to Train:
Room 1: 10 minutes
Room 2: 20 minutes
Room 3: 29 minutes
Room 4: 48 minutes
Room 5: 5 hours 42 minutes

Total Training Time: 7 hours 29 minutes

*NOTE* You only see one Albert in the video, but there are actually around 50-100 copies of Albert and the room he's in training simultaneously behind the camera. This means that instead of going through 500 hours of footage to edit the video, I only need to go through 7.
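The per-room times add up to the stated total, and the note about parallel copies can be checked the same way. The sketch below is plain arithmetic on the numbers given here. `ROOM_MINUTES`, `format_hm` and `experience_hours` are hypothetical helper names. `experience_hours` assumes every copy runs at real-time speed, which is what the 500-hour estimate seems to imply.

```python
# Per-room training times from the list above, in minutes.
ROOM_MINUTES = {
    "Room 1": 10,
    "Room 2": 20,
    "Room 3": 29,
    "Room 4": 48,
    "Room 5": 5 * 60 + 42,
}


def total_minutes(times):
    """Sum the wall-clock training time of every room."""
    return sum(times.values())


def format_hm(minutes):
    """Render a minute count as 'H hours M minutes'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} hours {mins} minutes"


def experience_hours(wall_clock_minutes, copies):
    """Hours of simulated play when `copies` rooms train in parallel.

    Assumption (hypothetical): each copy runs at real-time speed.
    """
    return wall_clock_minutes * copies / 60


total = total_minutes(ROOM_MINUTES)
print(format_hm(total))  # 7 hours 29 minutes
for copies in (50, 100):
    print(copies, round(experience_hours(total, copies), 1))
```

With 50 copies this comes to roughly 374 hours of footage, and with 100 copies roughly 748 hours. The 500-hour figure therefore corresponds to about 67 copies running for the full 7 hours 29 minutes. If the simulation runs faster than real time, the amount of experience would be higher still.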

Albert was trained using reinforcement learning, meaning he was rewarded for doing things correctly (like hitting a pressure plate) and punished for doing them incorrectly (like falling off the platform or hitting a wall/obstacle). After Albert finishes each attempt, the actions he took are analyzed and the weights in the neural network (Albert's brain) are adjusted using PPO (Proximal Policy Optimization) to prioritize the actions that lead to a positive outcome and avoid the actions that lead to a negative outcome.
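Here is a minimal sketch of the two ideas in this paragraph, written in plain Python rather than taken from the ML-Agents trainer. The first is turning rewards and punishments into discounted returns. The second is PPO's clipped surrogate objective, which limits how far one update can move the policy away from the policy that collected the data. The reward values in `REWARDS`, the discount `gamma` and the helper names are hypothetical, since the source does not give its reward magnitudes.

```python
import math

# Hypothetical reward values: the source only says that pressure plates are
# rewarded and that falling or hitting obstacles is punished.
REWARDS = {"pressure_plate": 1.0, "fell_off": -1.0, "hit_obstacle": -0.1, "nothing": 0.0}


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one attempt."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


def ppo_clip_loss(new_logp, old_logp, advantages, epsilon=0.2):
    """Negative mean of PPO's clipped surrogate objective (lower is better).

    new_logp / old_logp: log-probabilities of the actions that were taken,
    under the updated policy and under the policy that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


events = ["nothing", "nothing", "pressure_plate"]
print(discounted_returns([REWARDS[e] for e in events]))
```

Once the ratio goes above `1 + epsilon`, a good action earns no extra credit. For a bad action, the `min` keeps the full penalty for making it more likely. Real PPO implementations typically compute advantages with a learned value function instead of raw returns, and they minimize this loss with gradient descent on the network weights.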

All of Albert's inputs come from his 'vision', which is made of raycasts. There are a total of 21 raycasts: 7 looking down, 7 looking straight ahead and 7 above his head, all with a maximum FOV of 70 degrees to try to mimic our own vision. Each of these raycasts provides 2 inputs to Albert's neural network: the distance to an object (if any) and the type of object it is (pressure plate, obstacle, ground). I also stacked Albert's vision 6 times to give him a sort of short-term memory, so he can spot a pressure plate in the room and keep moving towards it even after it's no longer visible (for a little bit).
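With 21 rays and 2 inputs each, one frame of vision is 42 numbers. Stacking 6 frames gives the network 252 inputs. The sketch below shows that layout under several assumptions the source does not state:

- the rays in each group are spread evenly across the 70-degree FOV;
- distances are normalized by a hypothetical `max_distance` of 20 units;
- each object type gets a hypothetical numeric code.

ML-Agents has its own ray perception sensor with its own encoding, so treat this only as an illustration of the shapes involved.

```python
from collections import deque

RAYS_PER_GROUP = 7
GROUPS = ("down", "ahead", "up")  # 3 groups x 7 rays = 21 rays
NUM_RAYS = RAYS_PER_GROUP * len(GROUPS)
FOV_DEGREES = 70.0
STACK_SIZE = 6
# Hypothetical encoding of the object types named in the source.
OBJECT_CODES = {"ground": 1.0, "obstacle": 2.0, "pressure_plate": 3.0}


def ray_angles(n=RAYS_PER_GROUP, fov=FOV_DEGREES):
    """Evenly spaced angles (degrees) across the field of view (an assumption)."""
    step = fov / (n - 1)
    return [-fov / 2 + i * step for i in range(n)]


def encode_frame(hits, max_distance=20.0):
    """Turn 21 (distance, object_type) hits into 42 inputs.

    object_type is None when the ray hit nothing within range.
    """
    if len(hits) != NUM_RAYS:
        raise ValueError(f"expected {NUM_RAYS} rays, got {len(hits)}")
    obs = []
    for distance, kind in hits:
        if kind is None:
            obs.extend([1.0, 0.0])  # nothing hit: max distance, no type
        else:
            obs.extend([min(distance, max_distance) / max_distance, OBJECT_CODES[kind]])
    return obs


class StackedVision:
    """Keep the last STACK_SIZE frames and flatten them into one input vector."""

    def __init__(self, frame_size=2 * NUM_RAYS, stack_size=STACK_SIZE):
        empty = [[0.0] * frame_size for _ in range(stack_size)]
        self.frames = deque(empty, maxlen=stack_size)

    def push(self, frame):
        self.frames.append(frame)  # the oldest frame falls off the left end
        return [x for f in self.frames for x in f]


vision = StackedVision()
frame = encode_frame([(5.0, "ground")] * NUM_RAYS)
print(len(frame), len(vision.push(frame)))  # 42 252
```

The deque drops its oldest frame on every push. A pressure plate that leaves the field of view therefore stays in the input for 5 more frames, which fits the "for a little bit" memory described above.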


In the first room, Albert starts off making random moves until he accidentally hits the pressure plate that opens the door, giving him a reward. This reward made the neural network controlling his actions update to try to replicate that outcome. This continued with each pressure plate until Albert opened the door and was able to walk into the next room (onto an invisible pressure plate). Once Albert got into the next room, the same process was repeated, continuing with the same neural network that let Albert escape the previous room.
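The loop of "accidentally hit the plate, then do it more often" can be seen in miniature with a toy gradient bandit. This is a REINFORCE-style update on a three-action softmax policy, swapped in for illustration. It is not PPO and not how ML-Agents is implemented. The action meanings, learning rate and step count are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_room(prefs, rewarded_action, steps=500, lr=0.5, seed=0):
    """Toy REINFORCE loop: only `rewarded_action` earns a reward of 1."""
    rng = random.Random(seed)
    actions = range(len(prefs))
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(actions, weights=probs)[0]  # explore by sampling
        reward = 1.0 if action == rewarded_action else 0.0
        # d log pi(action) / d pref[a] = 1[a == action] - pi(a)
        for a in actions:
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * reward * grad
    return prefs


# Hypothetical actions: 0 = turn, 1 = move forward, 2 = jump.
prefs = train_room([0.0, 0.0, 0.0], rewarded_action=2)
print([round(p, 3) for p in softmax(prefs)])
```

Only rewarded actions change the preferences here. The agent starts out choosing uniformly at random and shifts toward the rewarded action after a handful of lucky samples. Passing the returned preferences into another call would reuse what was learned, much like moving on to the next room with the same network. However, a policy this peaked may rarely try a newly rewarded action.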

For people who think I faked this:
It would probably take me longer to fake this than to just do it for real. I used Unity's ML-Agents toolkit to make it easy, though I have experimented before with doing everything from scratch (poorly). This AI is a lot smoother than others you've probably seen because I only allow it to make a decision every 10 academy steps (game ticks). So when it starts to turn, for example, it's forced to keep turning that way for 10 ticks. I did this because I don't like how jittery the AI looks when you let it make decisions every tick. Also, you only see one AI in the video, but behind the camera there are roughly 50-100 copies of Albert and the room he's in, all training simultaneously to speed up the training process. I chose this over having them all train in the same room because I wanted to be able to follow a single character for the sake of the video. Albert uses the same neural network the entire time. All you need to do is add "--resume" to the end of the command when training in Unity to have it continue using the same brain. Unity's ML-Agents makes AI quite easy!
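Making a decision every 10 academy steps amounts to repeating the chosen action until the next decision tick. The loop below sketches that idea with hypothetical `policy` and `env_step` callables. In ML-Agents, this behaviour is likely configured through the `DecisionRequester` component's decision period with its "Take Actions Between Decisions" option enabled. The sketch is not that component's code.

```python
DECISION_PERIOD = 10  # one decision every 10 academy steps, per the source


def run_episode(policy, env_step, obs, max_ticks=100, decision_period=DECISION_PERIOD):
    """Query `policy` only on decision ticks and repeat its action in between.

    Returns the action applied on every tick.
    """
    applied = []
    action = None
    for tick in range(max_ticks):
        if tick % decision_period == 0:
            action = policy(obs)  # the neural network runs only here
        applied.append(action)  # between decisions, keep the last action
        obs, done = env_step(action)
        if done:
            break
    return applied


# Hypothetical stand-ins: a policy that numbers its decisions and an endless env.
decision_ids = iter(range(1000))
actions = run_episode(lambda obs: next(decision_ids), lambda action: (None, False), obs=None)
print(actions[:12])  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

Querying the network ten times less often also makes each episode cheaper to run. The cost is slower reactions within each 10-tick window.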

Thank you so much for watching! These short videos take literally hundreds of hours to make. If you want to help us make them faster, please consider becoming a channel member! As a member, your name can appear in future videos, you can see behind-the-scenes content that doesn’t fit in the regular videos, and you can use stickers of Albert, Kai and some other characters our team made in comments (more coming) :D

If you have other ideas for what I should make Albert learn, please let me know! :D

aiwarehouse

Sometimes he simply looks like he's celebrating without knowing what else he needs to do and I love it

anactualfork

Albert just casually jumping on the same spot after succeeding just shows how natural this is. Made me happy for him

aladdinde

I like how in Room 3, as soon as Albert figured out he needed to jump two times to clear the room, he decided that jumping as much as possible was the best strategy for everything.

mirandatagliamonte

Albert confidently jumping off the platform to what I can only assume is the sweet release of death really speaks to me.

giggen

The fact that he would do 180s and 360s whenever possible makes me happy

divadwangwang

I love how Albert learned little mannerisms throughout the video, like doing 360 spin jumps or his little shimmy before jumping on platforms.

musical_trash_your_inform

I love how when he jumps off a platform, he spins like he's doing a 360 noscope. I know he's probably just trying to turn midair so he can keep going straight when he lands without wasting time turning, or just trying to check the entire room while he's high up, but I just find it so humanizing lmao

MonkeyGng

This seemed like a beautiful mix of Stanley Parable and Portal.
The narrator is trying to help Albert to do what he's intended for, but gets mad when he can't. In the end, the narrator reveals that he has more plans for Albert than just a "cake".
Incredible work. See you soon, Albert!

rafaelmikoskirosa

At 4:50, Albert taught us a valuable lesson in problem solving. Thanks Albert!

name

I love how occasionally Albert will just start jumping and doing 360s in place when he does something good, I know it’s just a quirk of the AI but it feels like he is celebrating his victory

The_Letter_Q

your editing and design are *fantastic*. every little joke and characterisation has me smiling, laughing and cheering for albert. i am feeling lots of good emotions about a simulated cube, and it's all thanks to the exit arrows, the lil eyes, the music cuts..

turingtestingmypatience

7:45 Can we appreciate that Albert learned to use the kind of setup you’d see in a speedrun to line up that jump? Jumping twice while spinning and then jumping backwards is a lot less to keep track of than manually lining it up.

JamUsagi

I love how the ai develops some "superstitions" such as that shimmy before the first jump on the final room, or the 2 standing jumps before going for the last pillar

akakda

Those little victory jumps after pulling off a difficult leap, mid-air spins and even the way he "gets fed up" and jumps off the ledge are hilarious. My sides hurt but my day has been made. Thank you.

TheDanielCobra

4:09 Albert: FINALY
Time: *out*
Albert: *SHI-*

arcaderab

I was surprised that I felt genuinely bad for him at the end. When the floor was getting smaller I actually was like 'oh no, he must be so scared' in my head lol.

djkhemix

Best AI video I’ve watched in a while. Can’t wait to see Albert jumping out of a plane to try to hit a button

stupididiot

I love how in Room 4 Albert would sometimes pace back and forth to prepare himself for the jump

yqmmviu

i love how at 4:48 he gets so angry that he decides he's had enough, and jumps off

breylonly