Can an AI Escape 5 Rooms of Increasing Difficulty?

An AI teaches itself how to escape 5 rooms of progressively increasing difficulty.

In this video, an AI named Zoe is locked in a room whose only exit is a locked door; the key is hidden somewhere in the room. The AI's task is to navigate the room and find the key, using a combination of reinforcement learning and a genetic algorithm. Over successive generations the AI learns and improves its decision-making until it escapes each of the five rooms of increasing difficulty and achieves its goal of freedom.

---
Contents:
0:00 - Intro
0:30 - Room 1
1:28 - Room 2
2:41 - How Zoe is trained
3:21 - Room 3
4:47 - Room 4
6:43 - Room 5
8:15 - Zoe messes up

----
Follow me on Twitter for updates,

----
🧠 Technical Details
- Machine Specs: Intel i5 8th gen with 16GB RAM and integrated GPU

----
🎵 Music
All music used in this video is by an artist named "Lukerembo"
Comments

This deserves more views, it's really well put together.

calebernst

Everything you see in the video was built from scratch using Rust (Macroquad).

Below is the time it took to train this AI in each room:
Room 1: About 3 seconds (35-step limit)
Room 2: About 40 seconds (200-step limit)
Room 3: About 2 minutes (300-step limit)
Room 4: About 1.5 minutes (300-step limit)
Room 5: About 5-8 minutes (1500-step limit)

The above simulation was run mostly at 5x speed, and at 1x speed when recording.
2000 instances of the AI were run in parallel in the background (this is what helps achieve the results so quickly); only 1 instance is rendered to the screen.
I used the Rust programming language for all of the computing and rendering, with Bevy and Macroquad as the game engines.
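
The video does not say how the 2000 instances were run side by side, so here is only a rough sketch of one way to do it in plain Rust: split the population into chunks and score each chunk on its own thread with `std::thread::scope`. The names `simulate` and `evaluate_all` are hypothetical, and `simulate` is just a placeholder that returns a made-up score instead of stepping an agent through a room.

```rust
use std::thread;

// Hypothetical stand-in for running one agent through a room.
// It returns a made-up but deterministic score so the sketch can run.
fn simulate(agent_id: usize, step_limit: usize) -> i64 {
    ((agent_id * 31 + step_limit) % 97) as i64
}

// Scores every agent using `workers` threads (assumed to be at least 1).
// Returns the index of the best agent (the one to render) and all scores.
fn evaluate_all(agents: usize, step_limit: usize, workers: usize) -> (usize, Vec<i64>) {
    let mut scores = vec![0i64; agents];
    // Round up so every agent lands in some chunk.
    let chunk = ((agents + workers - 1) / workers).max(1);
    thread::scope(|s| {
        for (c, slice) in scores.chunks_mut(chunk).enumerate() {
            // Each thread owns a disjoint slice of the score buffer.
            s.spawn(move || {
                for (i, out) in slice.iter_mut().enumerate() {
                    *out = simulate(c * chunk + i, step_limit);
                }
            });
        }
    }); // all threads are joined when the scope ends
    let best = (0..agents).max_by_key(|&i| scores[i]).unwrap_or(0);
    (best, scores)
}
```

Calling `evaluate_all(2000, 35, 4)` would score 2000 agents with a 35-step limit on 4 threads; only the agent at the returned index would need to be drawn.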

Under the hood, Zoe uses a combination of a genetic algorithm and some ideas from reinforcement learning to solve a particular room. Zoe learns from the mistakes he made in previous generations and slowly corrects himself with every generation.

Here's a simple step-by-step breakdown of how that works:
- At the beginning of the simulation, 2000 Zoe instances are created (with only one displayed on the screen).
- Zoe is allowed to make random movements (in any of the 4 directions). Each move results in Zoe hitting a wall, hitting a spike/bug (death), or moving to an empty space. Each of these actions is then evaluated for its effectiveness, i.e. if Zoe is walking away from the key, his fitness is reduced; otherwise it increases. (Fitness is a measure of how well an AI agent performed in a given generation.)
- At the end of a generation (when the room's step limit is reached), the current population of 2000 Zoes is sorted by fitness, and the run with the best fitness is displayed on the screen.
- This process repeats until Zoe finds the key and successfully navigates to the door, at which point the simulation objective is reached.
- Zoe still makes a lot of unnecessary movements on the way to the key/door, i.e. the path he takes isn't the shortest. Hence the number of steps Zoe takes to reach the key/door is also factored into the fitness calculation (the fewer the steps, the higher the fitness).
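
The steps above can be sketched roughly as follows. This is not the code from the video: the reward numbers, the `Room` layout, the "keep the top quarter and mutate copies" rule, and every name here (`Tile`, `Room`, `Rng`, `fitness`, `next_generation`, `random_population`) are assumptions made for illustration. The description above only mentions sorting by fitness, so the selection and mutation scheme in particular is a guess at a common genetic-algorithm setup.

```rust
// Hypothetical sketch; rewards and the mutation scheme are assumptions.
#[derive(Clone, Copy)]
enum Tile { Empty, Wall, Spike }

struct Room {
    tiles: Vec<Vec<Tile>>, // indexed as tiles[y][x]
    start: (usize, usize), // (x, y)
    key: (usize, usize),
    door: (usize, usize),
    step_limit: usize,
}

// Tiny xorshift generator (seed must be non-zero) to avoid external crates.
struct Rng(u64);
impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn below(&mut self, n: usize) -> usize { (self.next_u64() % n as u64) as usize }
}

fn dist(a: (usize, usize), b: (usize, usize)) -> i64 {
    (a.0 as i64 - b.0 as i64).abs() + (a.1 as i64 - b.1 as i64).abs()
}

// Replays one genome (moves 0..4 = up, down, left, right) and scores it.
fn fitness(room: &Room, genome: &[u8]) -> i64 {
    const DELTAS: [(i64, i64); 4] = [(0, -1), (0, 1), (-1, 0), (1, 0)];
    let (mut pos, mut has_key, mut score) = (room.start, false, 0i64);
    for (step, &mv) in genome.iter().take(room.step_limit).enumerate() {
        let target = if has_key { room.door } else { room.key };
        let (dx, dy) = DELTAS[mv as usize % 4];
        let (nx, ny) = (pos.0 as i64 + dx, pos.1 as i64 + dy);
        // Anything outside the grid is treated as a wall.
        let tile = if nx < 0 || ny < 0 {
            Tile::Wall
        } else {
            room.tiles
                .get(ny as usize)
                .and_then(|row| row.get(nx as usize))
                .copied()
                .unwrap_or(Tile::Wall)
        };
        match tile {
            Tile::Wall => score -= 1,         // bumped a wall, stays in place
            Tile::Spike => return score - 50, // death ends the run
            Tile::Empty => {
                let next = (nx as usize, ny as usize);
                // Reward moving toward the current target, penalise moving away.
                score += if dist(next, target) < dist(pos, target) { 1 } else { -1 };
                pos = next;
            }
        }
        if !has_key && pos == room.key {
            has_key = true;
            score += 100;
        } else if has_key && pos == room.door {
            // Unused steps are added as a bonus: fewer steps, higher fitness.
            return score + 200 + (room.step_limit - step) as i64;
        }
    }
    score
}

fn random_population(n: usize, len: usize, rng: &mut Rng) -> Vec<Vec<u8>> {
    (0..n).map(|_| (0..len).map(|_| rng.below(4) as u8).collect()).collect()
}

// One generation: sort best-first, keep the top quarter unchanged, and
// overwrite the rest with mutated copies of survivors. Returns the best score.
fn next_generation(room: &Room, pop: &mut Vec<Vec<u8>>, rng: &mut Rng) -> i64 {
    pop.sort_by_cached_key(|g| std::cmp::Reverse(fitness(room, g)));
    let best = fitness(room, &pop[0]);
    let elite = (pop.len() / 4).max(1);
    for i in elite..pop.len() {
        let mut child = pop[rng.below(elite)].clone();
        for _ in 0..(1 + child.len() / 20) {
            let at = rng.below(child.len());
            child[at] = rng.below(4) as u8;
        }
        pop[i] = child;
    }
    best
}
```

A run would create a population with `random_population(2000, room.step_limit, &mut rng)` and call `next_generation` in a loop until the best score shows that the door was reached. Because the top quarter is carried over unchanged, the best score in this sketch can never drop from one generation to the next.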

Let me know what you think about the video. If you have any ideas or questions, feel free to leave me a comment below :)

bonesai-dev

Would be cooler if anything carried over level to level tho. It's not really training an AI as much as it's deploying a new AI to solve each puzzle

evamotto