Reinforcement Learning - My Algorithm vs State of the Art

In this third video about inverted pendulum balancing with AI, I compare the results of my own algorithm against a state-of-the-art reinforcement learning algorithm using Isaac Lab, which is part of the NVIDIA Omniverse platform.

PC Specs:
i7-12700K
RTX 4090
32GB memory
Comments

I think you would be interested in network pruning. This is something that's typically done periodically during training to thin out networks. If you examine the weights in your PPO-optimized network, you'll find that many are very small, while others are larger. If the near-zero weights are set to zero, networks will often become more stable after fine-tuning. You'll find that the connections in the network begin to look sparse and very similar to networks generated via evolutionary methods. PPO is just an optimizer and will work with whatever network configuration you want. The evolutionary networks shown in the video are all differentiable, so PPO would be able to optimize them. That would be an interesting comparison if you'd want to pursue it!
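
A minimal sketch of the magnitude-based pruning described above, assuming a PyTorch policy network (the layer sizes and the 30% pruning amount are illustrative placeholders; `torch.nn.utils.prune` is PyTorch's built-in pruning module):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical stand-in for a PPO-optimized policy network.
policy = nn.Sequential(
    nn.Linear(4, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)

# Zero out the 30% smallest-magnitude weights in each linear layer;
# the network would then normally be fine-tuned for a while.
for module in policy.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the mask into the weights

# Inspect sparsity: many connections are now exactly zero, similar to
# the sparse topologies produced by evolutionary methods.
linears = [m for m in policy.modules() if isinstance(m, nn.Linear)]
total = sum(m.weight.numel() for m in linears)
zeros = sum(int((m.weight == 0).sum()) for m in linears)
print(f"sparsity: {zeros / total:.0%}")
```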

chris-graham

Thank you so much for this demonstration and for adding the links! I didn't know about Isaac Lab and was wondering how it was possible to control the mechanics. Great video!

fluffsquirrel

The quality and educational value of these videos are unmatched. Please keep making stuff like this!

imanuelbaca

Getting that sort of aid from NVIDIA is super nice. Super cool. My school just got an AI accelerator, the AGX Orin, a very cool piece of computing and fantastic for AI training and research. Also, as someone who is more hardware oriented, it has a super fascinating architecture (shared CPU and GPU global memory!).

Waffle_

I've been looking for a subject for my engineering degree and this video might be exactly it! Thank you for the inspiration, your videos are always a blast!

kubstoff

This is just brilliant. I audibly gasped at those numbers. I am so grateful to be living in a world with this sort of stuff; it's truly amazing!

briandeanullery

Have you considered adding physical parameters such as motor torque and motor weight? This would give you a much more realistic sim and difficulty level, along with realistic response times (based on inference speed + connection latency). Also, you could either have one motor at the base and one at the middle joint, or both at the base.

You might also consider adding a battery's weight, sized to power those two motors for some period (say 5 minutes). This would be an awesome challenge and would connect the simulation to reality much more closely, which sounds super exciting. Looking forward to seeing if you end up working on it!
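
One way these constraints could be sketched, independent of any particular simulator (all numbers below are illustrative placeholders, not real motor specs):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class MotorModel:
    """Toy actuator model with a torque limit and a fixed control latency."""
    max_torque: float = 2.0   # N*m, saturation limit (placeholder value)
    latency_steps: int = 3    # inference + connection delay, in sim steps

    def __post_init__(self):
        # Buffer of pending commands, modeling the response delay.
        self._queue = deque([0.0] * self.latency_steps)

    def step(self, commanded_torque: float) -> float:
        """Delay the command by `latency_steps` and clamp it to the limit."""
        self._queue.append(commanded_torque)
        delayed = self._queue.popleft()
        return max(-self.max_torque, min(self.max_torque, delayed))

# The policy's output would be routed through this model each sim step,
# so the agent has to learn to cope with saturation and delay.
motor = MotorModel()
for command in (0.5, 3.0, -1.0, 0.0):
    print(motor.step(command))
```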

hcy

It's amazing to see it temporarily give up on balancing when it gets too close to the edge of the rail, so it can try again later from a more favorable position.

waity

We got the Pezzza's Work x NVIDIA collab before GTA VI 😭

max_me_is

17:06 That's so similar to the timescales of evolution in nature and of a human learning a skill. That's kinda crazy. It really makes it look like the algorithms successfully mimic their real-world counterparts.

_nemo

PPO, and gradient-based policy learning in general, is amazing. I will still say that your struggle to get an evolutionary algorithm to learn this problem led to some really creative and impressive curriculum learning ideas, which also apply to PPO :)

poketopa

Do note that evolutionary algorithms are usually better than pure RL agents for problems with very sparse rewards (which is not the case here). For those problems, a hybrid approach might work best.

sutsuj

Oh my god, a video from Pezzza!! I'm so excited!!

PatrickHoodDaniel

Thanks for this high-quality video and comparison of those algorithms, very nice. Keep it up!

requestfx

We've come a long way in simulation technology.

MicahBratt

I'm solving a similar task: I'm trying to teach an AI car to drive with realistic physics. I was struggling with training just like you were in the previous video, so, inspired by your solution, I tried another approach: I started from simple physics (no inertia, no wheels, just rotations + offsets), then gradually interpolated between this simple physics and the hard physics. My NN was able to learn how to drive perfectly. But then I tried an energy-based model: basically, it's an NN that receives the current state and a candidate action, and outputs just a single number, the energy. You then need to find the action that yields the minimum energy. I iterated over 9 possible actions, and that NN was able to learn how to drive with the complex physics without any hacks, and very fast.
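
A minimal sketch of the energy-based action selection described above, assuming a PyTorch model; the state/action dimensions and the 3x3 action grid are made-up placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical energy network: maps (state, action) -> a single scalar.
class EnergyNet(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

# The 9 candidate actions the commenter mentions, here imagined as a
# 3x3 grid of (steering, throttle) combinations.
actions = torch.tensor(
    [[s, t] for s in (-1.0, 0.0, 1.0) for t in (-1.0, 0.0, 1.0)]
)

def select_action(model: EnergyNet, state: torch.Tensor) -> torch.Tensor:
    """Evaluate every candidate action and return the one with minimum energy."""
    with torch.no_grad():
        batch = state.expand(len(actions), -1)        # repeat state per action
        energies = model(batch, actions).squeeze(-1)  # one scalar per action
    return actions[energies.argmin()]

model = EnergyNet(state_dim=6, action_dim=2)
print(select_action(model, torch.zeros(1, 6)))
```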

So, here is what I think: first try CMA-ES, as a superior zero-order optimization method. I think NEAT is trash, and one day I will test that myself. Then you should try an energy-based model. Then it would be a somewhat fair comparison; right now it's not fair at all, and I'm slightly disappointed with this video.
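
For reference, a minimal sketch of running CMA-ES over flat network weights with the pycma package (`pip install cma`); the parameter count and the cost function are dummy placeholders:

```python
import numpy as np
import cma  # pycma package

# Placeholder: in a real setup this would unpack the weight vector into
# the policy network, roll out an episode in the simulator, and return
# the negative episode reward (CMA-ES minimizes).
def episode_cost(weights: np.ndarray) -> float:
    return float(np.sum(weights ** 2))  # dummy stand-in objective

n_params = 100  # total number of network weights (placeholder)
es = cma.CMAEvolutionStrategy(n_params * [0.0], 0.5)  # mean 0, sigma 0.5
while not es.stop():
    candidates = es.ask()  # sample a population of weight vectors
    es.tell(candidates, [episode_cost(x) for x in candidates])
    es.disp()
best_weights = es.result.xbest
```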

optozorax

Your channel is growing awesomely and it's well deserved. I love your work, thank you.
Very nice that NVIDIA sponsored this video too!

VivienLEGER

Now you have to add flex to the materials and a small gap between the rollers and the beam. Then add some slack in the bearings...

nexttonic

Another great video! I'll try Isaac Sim a bit too... it seems pretty cool to play with.

jairoandre-

Instant like and sub. I could watch these all day. Great work!

kiaranr