Reinforcement Learning Course - Full Machine Learning Tutorial

Reinforcement learning is an area of machine learning concerned with taking the right actions to maximize reward in a particular situation. In this full tutorial course, you will get a solid foundation in the core topics of reinforcement learning.

The course covers Q learning, SARSA, double Q learning, deep Q learning, and policy gradient methods. These algorithms are employed in a number of environments from the OpenAI Gym, including Space Invaders, Breakout, and others. The deep learning portion uses TensorFlow and PyTorch.

The course begins with more modern algorithms, such as deep Q learning and policy gradient methods, and demonstrates the power of reinforcement learning.

Then the course teaches some of the fundamental concepts that power all reinforcement learning algorithms. These are illustrated by coding up some algorithms that predate deep learning but are still foundational to the cutting edge, and are studied in some of the more traditional environments from the OpenAI Gym, like the cart pole problem.
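For a taste of the tabular methods covered in the later part of the course, here is a minimal Q learning sketch with epsilon-greedy action selection. It is a generic illustration rather than code from the course; the state/action counts and hyperparameters below are made up.

import numpy as np

n_states, n_actions = 16, 4          # made-up sizes for illustration
alpha, gamma, epsilon = 0.1, 0.99, 0.1

Q = np.zeros((n_states, n_actions))  # tabular action-value estimates

def choose_action(state):
    # explore with probability epsilon, otherwise exploit current estimates
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def q_learning_update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])

# tiny usage example with a made-up transition
s = 0
a = choose_action(s)
q_learning_update(s, a, reward=1.0, next_state=1)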

⭐️ Course Contents ⭐️
⌨️ (00:00:00) Intro
⌨️ (00:01:30) Intro to Deep Q Learning
⌨️ (00:08:56) How to Code Deep Q Learning in TensorFlow
⌨️ (00:52:03) Deep Q Learning with PyTorch Part 1: The Q Network
⌨️ (01:06:21) Deep Q Learning with PyTorch Part 2: Coding the Agent
⌨️ (01:28:54) Deep Q Learning with PyTorch Part 3: Coding the Main Loop
⌨️ (01:46:39) Intro to Policy Gradients
⌨️ (01:55:01) How to Beat Lunar Lander with Policy Gradients
⌨️ (02:21:32) How to Beat Space Invaders with Policy Gradients
⌨️ (02:34:41) How to Create Your Own Reinforcement Learning Environment Part 1
⌨️ (02:55:39) How to Create Your Own Reinforcement Learning Environment Part 2
⌨️ (03:08:20) Fundamentals of Reinforcement Learning
⌨️ (03:17:09) Markov Decision Processes
⌨️ (03:23:02) The Explore Exploit Dilemma
⌨️ (03:29:19) Reinforcement Learning in the OpenAI Gym: SARSA
⌨️ (03:39:56) Reinforcement Learning in the OpenAI Gym: Double Q Learning
⌨️ (03:54:07) Conclusion

--

Comments

Here are some timestamps, folks!

Intro 00:00:00
Intro to Deep Q Learning 00:01:30
How to Code Deep Q Learning in TensorFlow 00:08:56
Deep Q Learning with PyTorch Part 1: The Q Network 00:52:03
Deep Q Learning with PyTorch Part 2: Coding the Agent 01:06:21
Deep Q Learning with PyTorch Part 3: Coding the Main Loop 01:28:54
Intro to Policy Gradients 01:46:39
How to Beat Lunar Lander with Policy Gradients 01:55:01
How to Beat Space Invaders with Policy Gradients 02:21:32
How to Create Your Own Reinforcement Learning Environment Part 1 02:34:41
How to Create Your Own Reinforcement Learning Environment Part 2 02:55:39
Fundamentals of Reinforcement Learning 03:08:20
Markov Decision Processes 03:17:09
The Explore Exploit Dilemma 03:23:02
Reinforcement Learning in the OpenAI Gym: SARSA 03:29:19
Reinforcement Learning in the OpenAI Gym: Double Q Learning 03:39:56
Conclusion 03:54:07

MachineLearningwithPhil

Anytime I fall asleep to anything I watch, these videos haunt my YouTube. I never intentionally watch this channel. What the flip, guys?

real-chipmunk

Deep Reinforced Sleeping shall be the title. Please rename it. Woke up to this, 3.5 hours in!

kiransilwal

Bro, I fell asleep watching a different video and woke up this morning to this one playing, an hour deep in it 😭

A.h

This is a great video if you already understand the topic and the code and just want a guy saying out loud what he's typing, kinda explaining bits and pieces here and there.

InturnetHaetMachine

Extremely well explained. Kudos to the tutor. Going from a simple explanation to working code in less than an hour is amazing, and it's all very clearly laid out. Thanks for this upload.

amitbuch

The nerd talk and keyboard typing is very ASMR and it helps me sleep. Put a mic at the keyboard itself and edit it in; this is wonderful and I might actually wake up smarter in the morning.

TBPony

One minor correction for those watching at 1:19:12 and trying to follow along (like myself): on line 77 after the "else", the memStart expression ending in "- batch_size - 1)))" should use self.memSize rather than self.memCntr.



The self.memSize is needed here instead of self.memCntr because at this point the self.memory list is full (the "else" branch), but the self.memCntr value keeps growing and is now larger than the maximum self.memory size. If self.memCntr is used, memStart can end up larger than the length of the self.memory list while being used as the index for grabbing the miniBatch from that same list -- no good. Line 78 then gives miniBatch an empty list, [ ], and the memory array built from it is empty, which ultimately leads to an exception, "too many indices for array", on line 81 (since we are trying to forward an empty 1D numpy array and index it with 2D indices that don't exist). With self.memSize on line 77, that no longer happens, memStart stays within the bounds of the self.memory length/size, everything works, and you can watch the agent play :)

SmartassEyebrows
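For anyone following the correction above, here is a minimal sketch of the sampling logic it describes. The names (memory for self.memory, mem_cntr for self.memCntr, mem_size for self.memSize) are taken from the comment; the surrounding structure is a reconstruction for illustration, not the video's actual code.

import numpy as np

def sample_batch(memory, mem_cntr, mem_size, batch_size):
    # memory: list of stored transitions, holding at most mem_size entries
    # mem_cntr: total number of transitions ever stored (keeps growing past mem_size)
    if mem_cntr + batch_size < mem_size:
        # buffer not yet full: pick a start index among the filled slots
        mem_start = int(np.random.choice(range(mem_cntr)))
    else:
        # buffer full: mem_cntr now exceeds len(memory), so the start index
        # must be bounded by mem_size, otherwise the slice below comes back empty
        mem_start = int(np.random.choice(range(mem_size - batch_size - 1)))
    return memory[mem_start:mem_start + batch_size]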

Anyone interested in learning the terminology of what he is talking about should go check out the video lectures Stanford did on MDPs (Markov decision processes) and RL; they're each about an hour long and go in depth into the math behind a lot of this stuff. Cheers!!!

aaryasankhe

The length of the flattened output layer can actually be calculated by tracing the data through the network starting from the first conv layer. Just use the formula:

floor((dimension length - kernel size for the dimension + 2*padding)/stride) + 1 = output length for the dimension


do this for each dimension of each conv layer and multiply by the number of output channels at the end to find the length of the flat dimension, as such:


1st conv layer: ((185 - 8 + 2*1)/4) + 1 = 45 (the division gives 44.75; round down to 44 before adding 1, since there are no 0.75 pixels)
((95 - 8 + 2*1)/4) + 1 = 23 (22.25 rounded down to 22, plus 1)


2nd conv: ((45 - 4 + 2*0)/2) + 1 = 21 (20.5 rounded down)
((23 - 4 + 2*0)/2) + 1 = 10 (9.5 rounded down)


3rd conv: ((21 - 3 + 2*0)/1) + 1 = 19
((10 - 3 + 2*0)/1) + 1 = 8


this means the 3rd layer outputs 128 feature maps, each of dimensions 19*8, so if you flatten them you get a single dimension of length 128*19*8.
Just a neat little trick for those who want it

barbellbender
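To double-check the arithmetic in the comment above, here is a short sketch that traces the feature-map dimensions through the three conv layers. The parameters (185x95 input, kernels 8/4/3, strides 4/2/1, padding 1/0/0, 128 output channels) are the ones quoted in the comment.

def conv_out(length, kernel, stride, padding):
    # floor the division before adding 1: fractional pixels are dropped
    return (length - kernel + 2 * padding) // stride + 1

h, w = 185, 95
for kernel, stride, padding in [(8, 4, 1), (4, 2, 0), (3, 1, 0)]:
    h = conv_out(h, kernel, stride, padding)
    w = conv_out(w, kernel, stride, padding)
    print(h, w)            # 45 23, then 21 10, then 19 8

print(128 * h * w)         # 128 * 19 * 8 = 19456 flattened inputs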

man I just fell asleep on YouTube and now I've been watching this for 2 hours 19 minutes

evanstark

8 minutes in right now and I'm cracking up, what is this?!

bzknorthstar

This is the one issue I often see with "basic tutorial" videos: there's no explanation of the terminology during the intros.

dummypg

This is one of the best free RL videos available. Please make some more.

claude.detchambila

I'm immersed in this. I read a book with a similar theme, "The Art of Saying No: Mastering Boundaries for a Fulfilling Life" by Samuel Dawn, and I was completely immersed in that too.

Bill

This video gave me clues that ultimately helped me understand machine learning. Thanks, blue Steve.

petermcfarland

Heads up: this one isn't for beginners.

sunmustbedestroyed

I'm a beginner and the background loop seems more interesting than what he's talking about. I hope I understand what he's saying someday.

lsagar

I just woke up 3 hours in. Not a normal listen for me 🤔 but the dream I was having 🤯

johnnychapman

Thanks a million, I just recently started following your course, very well done.

fabrizioantonazzo