AI Learns to Use Stairs (deep reinforcement learning)

AI Teaches itself to Climb Up and Down Stairs!

If you want to learn more about AI and deep reinforcement learning (how Albert is trained), there are amazing courses teaching those exact concepts on Brilliant! You can use my link to get a free 30-day trial with 20% off! I've personally gone through the course "Introduction to Neural Networks", and it's one of the best courses on neural networks I've ever seen. They're paying us to promote them, but they're genuinely a great service. I've had a Brilliant account for over 5 years and can't recommend it enough :)

In this video an AI Warehouse agent named Albert learns how to walk up and down stairs and get through other obstacles to escape. The AI was trained using deep reinforcement learning, a machine learning method that rewards the agent for doing something correctly and punishes it for doing something incorrectly. Albert's actions are controlled by a neural network that's updated after each attempt to try to give Albert more rewards and fewer punishments over time. Check the pinned comment for more information on how the AI was trained!

Current Subscribers: 305,109
Comments

Everyone seemed to really love the video of Albert learning to walk, so I thought I would make a video improving that walk! In this video Albert starts with the brain he developed in the walking video, and he uses that knowledge to learn how to climb up and down stairs, walk on uneven terrain, drop off ledges and even climb up a backwards escalator! Albert is getting better and better:D

If you’re interested in learning more about exactly how Albert works, I’ll do my best to explain it here in a simple (but correct) way, but this explanation is nothing compared to what you would learn going through the “Introduction to Neural Networks” course on Brilliant. Despite them sponsoring the video, they’re not paying me to write this comment. I’m writing this because the course is genuinely the best introduction to neural networks (like Albert) I’ve seen, and access to all their courses is completely free for 30 days if you use my link Brilliant.org/AIWarehouse to sign up for a free trial. The 30-day free trial would give you plenty of time to get through the relevant courses “Introduction to Neural Networks”, “Reinforcement Learning” and “Artificial Neural Networks” completely for free :) It's also a great way to support the channel. As you know, it takes months to make these videos, and just starting a free trial using Brilliant.org/AIWarehouse would help me out so much! Anyway, onto how Albert works!


NOTE: It may look like there’s only one Albert training, but there are actually an additional 280 Alberts training simultaneously behind the camera to speed up the training time (and also reduce the amount of footage we need to sift through).


HOW THE BRAIN WORKS
I’ve explained this in previous videos, but Albert is controlled by an artificial brain called a neural network. His brain has 5 layers. The first layer consists of the inputs (the information Albert is given before taking action, like his limb positions and velocities), the last layer tells him what actions to take, and the middle 3 layers, called hidden layers, are where the calculations are performed to convert the inputs into actions.
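To make the layer description concrete, here is a minimal sketch of a 5-layer network (inputs, 3 hidden layers, outputs) turning an observation into actions. The layer sizes, the tanh activation, and the random weights are all assumptions for illustration; Albert's real sizes and weights aren't given here.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, observation):
    """Pass an observation through every layer; tanh keeps values in [-1, 1]."""
    activations = observation
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

# Hypothetical sizes: 5 layers of neurons = input, 3 hidden, output.
LAYER_SIZES = [12, 32, 32, 32, 6]

rng = random.Random(0)
# 5 layers of neurons are joined by 4 sets of weights.
brain = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]

observation = [0.0] * LAYER_SIZES[0]   # placeholder limb positions/velocities
actions = forward(brain, observation)  # one value per (assumed) controllable joint
```

Training never changes this structure. It only nudges the numbers stored in the weights and biases.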

Albert starts off using the brain we trained in the walking video, and we use reinforcement learning to further train that brain for this new task of climbing stairs. The two things that matter most when it comes to deep reinforcement learning (reinforcement learning with a multi-layered neural network) are figuring out what information to give the agent (the observations/inputs), and when and how much to reward and punish it (the reward function). Almost all of Albert’s inputs are exactly the same as the inputs we gave him for the walking video (like the position, rotation, velocity, angular velocity, strength and contacts for each limb, plus some sensor observations). In this video, however, we also have a sensor sticking out in front of each foot which tells Albert the distance between his feet and the stair in front of him, if there is one. These sensors are the only way Albert can ‘see’, and they only detect stairs, so Albert is essentially blind!

Just like in the last videos, Albert was trained using reinforcement learning. For each of Albert's attempts, we calculate a score for how well he performed and use an algorithm called proximal policy optimization (PPO) to make small, calculated adjustments to his brain to try to encourage the behaviors that led to a higher score and avoid those that led to a lower score. You can think of increasing Albert’s score as rewarding him and decreasing his score as punishing him, or you can think about it like natural selection, where the best-performing Alberts are most likely to reproduce, so the genes that lead to better behavior get more prevalent with each generation. As long as the reward function (how we reward/punish Albert) is set up robustly, these small adjustments to Albert’s brain over (sometimes) hundreds of millions of attempts lead to Albert getting very good at completing the goal defined by the reward function (climbing up and down stairs).
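For anyone curious what "small, calculated adjustments" means in PPO, here is a rough sketch of its standard clipped objective, the quantity the algorithm tries to increase on each update. This is the textbook form, not Albert's actual training code. The function name and the clip value of 0.2 (a commonly used default) are assumptions.

```python
import math

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of actions.

    new_log_probs / old_log_probs: log-probability of each taken action under
    the updated brain and under the brain that collected the data.
    advantages: how much better (+) or worse (-) each action turned out than expected.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # how much more likely the action has become
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any benefit from moving the ratio
        # further than clip_eps away from 1 in a single update.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# An action that became twice as likely only gets credit up to the clip limit.
capped = ppo_clipped_objective([math.log(2.0)], [0.0], [1.0])
```

The clipping is what keeps each adjustment small: once an action's probability has shifted by more than the clip range, pushing it further earns nothing extra.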


REWARD FUNCTION
Climbing up and down stairs is very similar to walking, so most of the heavy lifting for the reward function was done in the video where Albert learns to walk. In that video, I had to work out exactly how and when to reward/punish Albert to get him to take steps, so for this video I used that same reward function to encourage Albert to continue taking steps, with one additional punishment: punishing Albert when he stands too still. I’ll explain why I had to add this punishment below. The levels increase in difficulty because if we were to throw Albert onto the escalator with only his knowledge of walking, he would have no hope of coming even close to getting through the course, since he would need to learn many very complex motions to make even a little bit of progress.


The first level is as simple as we can make it: just a few single steps. These steps are designed specifically to make Albert trip, so that he’ll eventually learn to associate detecting a step with a new environment that needs a different behavior. Albert (and reinforcement learning agents in general) tends to learn to do whatever is easiest while still improving his score. In this case, Albert originally realized it’s easy to just stop walking and stand still when he detects a stair. When he does that he no longer trips and falls over the stairs, so standing still is the easiest way for him to get a higher score. That’s why we added the additional punishment for standing still: it pushes him out of that local maximum of the reward so he stops getting stuck there. After that he was much more willing to try to step over the stairs and, sure enough, after a bit of time he was able to do it consistently.
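Here is a toy sketch of why the standing-still punishment matters. The real reward function comes from the walking video and isn't spelled out here, so every name and number below (the progress reward, the fall penalty, the speed threshold) is a made-up assumption. Only the idea of an extra penalty for standing still comes from the explanation above.

```python
def step_reward(forward_progress, speed, fell_over,
                still_speed=0.05, still_penalty=0.1, fall_penalty=1.0):
    """Hypothetical per-step reward: progress minus penalties."""
    reward = forward_progress      # reward for moving toward the goal
    if fell_over:
        reward -= fall_penalty     # punishment for tripping or falling
    if speed < still_speed:
        reward -= still_penalty    # the additional punishment for standing still
    return reward

# Freezing in front of a stair is no longer "free".
standing_still = step_reward(forward_progress=0.0, speed=0.0, fell_over=False)
careful_step = step_reward(forward_progress=0.3, speed=0.5, fell_over=False)
```

With `still_penalty` set to 0, standing still scores exactly 0 on every step, which beats any attempt that ends in a fall. With the penalty, standing still keeps losing points, so trying to step over the stair becomes worth the risk.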


OBSERVATIONS
As mentioned above, we give Albert the same information we gave him in the walking video (limb positions, rotations, velocities etc.), except this time we give him 2 additional observations in the form of sensors coming out of his feet to detect stairs in front of him. Giving him these 2 new observations meant I had to retrain Albert’s walking from scratch with 2 additional null inputs, then continue the training on this new brain, since otherwise there wouldn’t be any room on the input layer to give him the sensor data. Thankfully, since the walking behavior had a very carefully modeled reward function and I didn't need to record the training, it didn't take long for my computer to train it again from scratch (about 45 minutes). Adding new observations to this stair-climbing brain would be much more laborious, since I would need to train one room at a time. While the walking AI didn’t actually need all the rooms to train properly, this one did.
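The "null inputs" trick can be sketched like this: the input vector always has room for the 2 stair sensors, and during walking training those slots are simply filled with a constant. The function name, the use of 0.0 as the null value, and the example numbers are assumptions for illustration.

```python
NUM_STAIR_SENSORS = 2  # one sensor in front of each foot

def build_observation(limb_features, stair_distances=None):
    """Return a fixed-length input vector: limb features plus 2 sensor slots.

    During walking training no sensor data exists, so the slots are filled
    with a null value (0.0 here, an assumed choice).
    """
    if stair_distances is None:
        stair_distances = [0.0] * NUM_STAIR_SENSORS
    if len(stair_distances) != NUM_STAIR_SENSORS:
        raise ValueError("expected one distance per foot sensor")
    return list(limb_features) + list(stair_distances)

limb_features = [0.1, -0.4, 0.7]  # placeholder values, not real limb data

walking_obs = build_observation(limb_features)               # sensor slots are null
stairs_obs = build_observation(limb_features, [0.35, 0.5])   # sensor slots are live
```

Both vectors have the same length, so the brain trained on `walking_obs` can keep training on `stairs_obs` without changing the size of its input layer.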


FINAL CHALLENGE
The final room was very hard for Albert, and unfortunately I ran into a big issue I didn't anticipate. I realized too late that since Albert is only given the distance to the ground under his feet and the distance from his feet to the stairs in front of him (if there are any), he isn't able to tell normal stairs apart from the escalator. The motion required is very different, so it's impossible for him to handle both with only one brain. I didn't plan on making him go up an escalator; that was an idea I had later on in development. Because of that, in the final run he actually uses 2 different brains: he uses his main stair brain until he hits the escalator, then the escalator brain, and when he reaches the top of the escalator I swap him back to his main brain. There is definitely a way to have one brain work for the entirety of the final room. All that's needed is an extra observation telling him whether or not the stair in front of him is moving, which would have let the escalator behavior be trained into the main brain. Unfortunately I just didn't have time to re-train everything to make this work, since it already took 3 months to make the video lol.
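The two options described above can be sketched side by side: swapping brains by hand, versus giving one brain an extra "is the stair moving" input. The function names, the string stand-ins for the brains, and the course layout are hypothetical. This isn't the actual project code.

```python
def choose_brain(on_escalator, stair_brain, escalator_brain):
    """What the final run did: pick which brain controls Albert on this segment."""
    return escalator_brain if on_escalator else stair_brain

def with_moving_flag(observation, stair_is_moving):
    """The proposed fix: append a 0/1 input so one brain can tell the cases apart."""
    return list(observation) + [1.0 if stair_is_moving else 0.0]

# Hypothetical layout of the final room.
segments = ["stairs", "escalator", "stairs"]
brains_used = [
    choose_brain(segment == "escalator", "stair_brain", "escalator_brain")
    for segment in segments
]

# Without the flag, identical sensor readings look the same on stairs and escalator.
sensor_obs = [0.4, 0.4]
stairs_input = with_moving_flag(sensor_obs, stair_is_moving=False)
escalator_input = with_moving_flag(sensor_obs, stair_is_moving=True)
```

With the flag, the two situations produce different inputs, which is what a single brain would need in order to learn a different motion for each.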



Thanks so much for reading and watching!

aiwarehouse

I cannot wait til we get to the point where Albert is put in an open world and is given a list of quests and adventures

wackjallace

Us: Robots won't take over because we haven't given them a reason to
Also us: *Albert's in a lot of pain right now*

Phroghetti

I'm fascinated by the learned helplessness that Albert displays. In the previous video, when Albert fell, he'd start flailing around, trying to get back up. But now, he just freezes up entirely, because, after 1000s of runs, he's learned that there's nothing he can do to recover. At this point, even if you altered how his joints work to give him a way to stand back up, I doubt he'd be able to figure it out. He just wouldn't try.

beepboprobotsnot

Fun fact, the binary code graffiti on the wall of the second stage spells out "help"
Very humane AI living conditions, I see

dr.criston

I'm so invested in Albert's development


Also I like the idea of a person just casually doing a backflip to not fall on the stairs

WerstoftheWorst

The fact "help" is written on the wall by the stairs is hysterical to me. Someone is sending Albert a message!

daxitron

I'm really impressed with the fact that I could *see* the different ways he was walking. First, on the uneven terrain, I noticed his steps getting much more forceful. Then, on the escalator, his steps got a lot faster! It shows how he can tell what kind of terrain he's working with, and how to best tackle each of them.

FelixEvers

Can we get a "Albert learns to backflip"?
That is clearly his dream, we would love to see it as (virtual) reality.

Franwow

"If you beat everything, I'll let you out" DUDE YOU’RE GLADOS

vixboi

The ad integration in these videos is next level. It's relevant to the theme of the channel, and a lot of the viewers must want to know how to do something as cool as this themselves.

annojance

7:21 I love how the escalator is going up when it's going down

stickmandosedoanimation

I’m a simple man. I see AI warehouse posted a new video of Albert learning, I click like.

TangoCharlieWhiskey

This man did what no one else could

He made a sponsored segment actually enjoyable to sit through

blazryvlogs

livestreaming this would be really cool! just 24/7 streams with albert learning skills like jumping over gaps/obstacles, actually getting up if fallen, running, crawling, even picking up things and eventually evolving to making tasks in a little world and questing like someone said.

Znetsixe

I’m so proud of Albert!
Figuratively, he’s gone very far. But in a much more literal sense he’s still that same little back-flipping orange box with vacant eyes. :)

HelloTurbo

I love how Albert discovered his love for backflips in the process... also this is the best ad integration I've seen in my twelve years of being addicted to YouTube

noatmealcookie

That could be the most well-put together sponsor section i have ever witnessed. I was engaged the whole time. Well done

havenkeeper

We should be more supportive of this fella (Albert) he’s been through a lot and learned so much.

cheesecake

Is no one going to talk about the simplicity and beauty of that sponsor placement? From the way it starts with the map of Albert's brain to make you think "What am I looking at?" right into the placement (with the lesson examples) of "You can learn this and more with Brilliant. Here's my code." Well done

heartisles