Value Iteration Algorithm - Dynamic Programming Algorithms in Python (Part 9)

In this video, we show how to code the value iteration algorithm in Python.

This video series is a Dynamic Programming Algorithms tutorial for beginners. It covers several dynamic programming problems and shows how to code them in Python. It also serves as a Python tutorial for beginners learning algorithms.

❔Answers to
↪︎ What is a Markov decision process (MDP)?
↪︎ How can I solve a Markov decision process problem using the value iteration algorithm?
↪︎ How can I code the value iteration algorithm?

⌚ Content:
↪︎ 0:00 - Intro
↪︎ 0:24 - Explanation of Markov decision processes
↪︎ 1:09 - The value iteration algorithm for solving an MDP problem
↪︎ 2:16 - How to code the value iteration algorithm (see the sketch below)
↪︎ 4:17 - An example of an MDP: the Gambler's problem
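
For reference, here is a minimal, self-contained sketch of value iteration on the Gambler's problem. It is an editor's sketch, not the video's exact code; the parameter values (N = 100, p = 0.4, the tolerance theta) are illustrative assumptions.

# Value iteration for the Gambler's problem (a sketch). V[s] estimates
# the probability of reaching the goal N from capital s when each bet
# is won with probability p.
N = 100          # goal capital (assumed)
p = 0.4          # probability of winning a bet (assumed)
theta = 1e-9     # convergence tolerance (assumed)

V = [0.0] * (N + 1)
V[N] = 1.0       # terminal states: 0 (ruin, value 0) and N (goal, value 1)

while True:
    delta = 0.0
    for s in range(1, N):
        # A stake a is limited by the capital s and the distance N - s.
        best = max(p * V[s + a] + (1 - p) * V[s - a]
                   for a in range(1, min(s, N - s) + 1))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break

# Greedy policy: for each state, pick a stake attaining the maximum.
policy = [0] * (N + 1)
for s in range(1, N):
    policy[s] = max(range(1, min(s, N - s) + 1),
                    key=lambda a: p * V[s + a] + (1 - p) * V[s - a])

print(V[50], policy[50])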

🌎 Follow Coding Perspective:

🎥 Video series:
Comments

Correction: There are typos in the 'P' function between 5:23 and 5:51. It should be as follows.

def P(s_next, s, a):
    # Transition probability for the Gambler's problem: from capital s
    # (0 < s < N), staking a (1 <= a <= min(s, N - s)) moves to s + a
    # with probability p (win) and to s - a with probability 1 - p (lose).
    if s + a == s_next and a <= min(s, N - s) and 0 < s < N and a >= 1:
        return p
    elif s - a == s_next and a <= min(s, N - s) and 0 < s < N and a >= 1:
        return 1 - p
    else:
        return 0

CodingPerspective
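
A quick way to sanity-check the corrected P above (an editor's sketch, assuming the same globals N and p the video defines): for every legal state and stake, the probabilities over all next states should sum to 1.

N, p = 10, 0.4   # example values; the video may use different ones

for s in range(1, N):
    for a in range(1, min(s, N - s) + 1):
        total = sum(P(s_next, s, a) for s_next in range(N + 1))
        assert abs(total - 1.0) < 1e-12, (s, a, total)
print("every transition distribution sums to 1")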

I really want to thank you for helping me see the forest for the trees. I was reading Artificial Intelligence: A Modern Approach, and it was hard to grasp the concepts. Thank you so much.

wilianc.b.

I still need to watch it a second time to understand everything, but it was really cool to find your video about it!

viniciusfriasaleite

My friend, the explanation is great; from your accent I think you're Turkish. Nice work.

ahmetcanaytekin

I was wondering why we don't use the minimum function instead of the sum function in the definition of P. When I implemented the code, there was a syntax error in the sum part as well.

dalmatoth-lakits
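
On the sum-versus-minimum question above: the sum implements an expectation over next states, while a minimum would compute a worst-case value, which is a different quantity. A tiny self-contained illustration with made-up numbers:

# Hypothetical one-step lookahead from some state-action pair:
# two possible next states with probabilities 0.4 and 0.6.
probs  = {3: 0.4, 1: 0.6}    # P(s_next | s, a), made up for illustration
values = {3: 1.0, 1: 0.0}    # current V[s_next], made up

expected = sum(probs[s2] * values[s2] for s2 in probs)   # 0.4 (Bellman backup)
worst    = min(values[s2] for s2 in probs)               # 0.0 (not the same)
print(expected, worst)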

What if I have numerical data with 24 features? How do I get the states and actions?
Some blogs say you should take some features as actions and some as states. How would I implement an MDP using such data?

sarankoundinya
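
One common approach to the question above (a hedged sketch, not something from the video): discretize each continuous feature into bins and use the tuple of bin indices as the state. All names and numbers here are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 24))   # stand-in for the 24-feature data

n_bins = 4
# Per-feature bin edges from quantiles, so bins are evenly populated.
edges = [np.quantile(data[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
         for j in range(data.shape[1])]

def to_state(row):
    # Tuple of per-feature bin indices; tuples are hashable, so they
    # can be used as dictionary keys for V[state].
    return tuple(int(np.digitize(row[j], edges[j])) for j in range(len(row)))

print(to_state(data[0]))

Note that 4 bins over 24 features gives 4**24 possible states, so in practice you would keep only a few informative features or switch to function approximation instead of a tabular method.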

When I set p (the probability of winning) to 1.0, the optimal policy is 0 for every state. I worked around this by changing the rewards for winning and losing to 10 and -1 instead of 1 and 0; with that change both the V array and the policy array look correct. Is this a valid way to solve the problem, or is there something I'm missing? Thanks.

Edit: When I run the code with N = 10 and p = 0.4 and the changed reward function (10, -1), the optimal policy is not in the domain (e.g., for state s = 2, the optimal stake is 3, which exceeds min(s, N - s) = 2).
How do I solve this?

milepivo

Where is gamma in the coded formula? And shouldn't the probability be multiplied by the reward + gamma * old_V?

Mmdalaj
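
On the gamma question above: in the standard backup the transition probability multiplies the whole term (reward + gamma * old value). A hedged, self-contained sketch with made-up names and numbers; in the video's Gambler's problem the task is undiscounted, so gamma = 1 and the factor can simply be dropped.

gamma = 0.9
transitions = {'s2': 0.7, 's1': 0.3}   # P(s_next | s1, a), hypothetical
R = {'s1': 0.0, 's2': 1.0}             # hypothetical rewards on arrival
V_old = {'s1': 0.2, 's2': 0.5}         # values from the previous sweep

# One-step lookahead value of action a in state s1:
Q = sum(prob * (R[s_next] + gamma * V_old[s_next])
        for s_next, prob in transitions.items())
print(Q)   # 0.7*(1.0 + 0.9*0.5) + 0.3*(0.0 + 0.9*0.2) = 1.069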

Hello, I get the error "TypeError: unhashable type: 'list'" on the line "V[S] = max(Q.values())".
I couldn't find a solution; does anyone know how to solve this?

corentinleger
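
On the TypeError above: Python dictionaries require hashable keys, and lists are mutable and therefore unhashable, so the error typically means the state variable used to index V is a list. Converting it to a tuple (or an int, as in the Gambler's problem, where a state is just a capital amount) fixes it. A minimal reproduction:

V = {}
s = [1, 2]           # a list used as a state
# V[s] = 0.0         # TypeError: unhashable type: 'list'
V[tuple(s)] = 0.0    # a tuple works, because tuples are hashable
print(V)             # {(1, 2): 0.0}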

Wouldn't it be better if you also prepared a Turkish version?

yokdilyds