Value Iteration Algorithm - Dynamic Programming Algorithms in Python (Part 9)

In this video, we show how to code the value iteration algorithm in Python.

This video series is a Dynamic Programming Algorithms tutorial for beginners. It covers several dynamic programming problems and shows how to code them in Python. It also serves as a Python tutorial for beginners learning algorithms.

❔Answers to
↪︎ What is a Markov decision process (MDP)?
↪︎ How can I solve a Markov decision process problem using the value iteration algorithm?
↪︎ How can I code the value iteration algorithm?

⌚ Content:
↪︎ 0:00 - Intro
↪︎ 0:24 - Explanation of Markov decision processes
↪︎ 1:09 - The value iteration algorithm for solving an MDP problem
↪︎ 2:16 - How to code the value iteration algorithm (see the sketch below)
↪︎ 4:17 - An example of an MDP: the Gambler's problem
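
For reference, here is a minimal, self-contained sketch of value iteration on the Gambler's problem. It is an editor's sketch, not the video's exact code; the parameter values (N = 100, p = 0.4, the tolerance theta) are illustrative assumptions.

# Value iteration for the Gambler's problem (a sketch). V[s] estimates
# the probability of reaching the goal N from capital s when each bet
# is won with probability p.
N = 100          # goal capital (assumed)
p = 0.4          # probability of winning a bet (assumed)
theta = 1e-9     # convergence tolerance (assumed)

V = [0.0] * (N + 1)
V[N] = 1.0       # terminal states: 0 (ruin, value 0) and N (goal, value 1)

while True:
    delta = 0.0
    for s in range(1, N):
        # A stake a is limited by the capital s and the distance N - s.
        best = max(p * V[s + a] + (1 - p) * V[s - a]
                   for a in range(1, min(s, N - s) + 1))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break

# Greedy policy: for each state, pick a stake attaining the maximum.
policy = [0] * (N + 1)
for s in range(1, N):
    policy[s] = max(range(1, min(s, N - s) + 1),
                    key=lambda a: p * V[s + a] + (1 - p) * V[s - a])

print(V[50], policy[50])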

🌎 Follow Coding Perspective:

🎥 Video series:
Comments

Correction: There are typos in the 'P' function between 5:23 and 5:51. It should be as follows.

def P(s_next, s, a):
    # Transition probability for the Gambler's problem: from capital s
    # (0 < s < N), staking a (1 <= a <= min(s, N - s)) moves to s + a
    # with probability p (win) and to s - a with probability 1 - p (lose).
    if s + a == s_next and a <= min(s, N - s) and 0 < s < N and a >= 1:
        return p
    elif s - a == s_next and a <= min(s, N - s) and 0 < s < N and a >= 1:
        return 1 - p
    else:
        return 0

CodingPerspective
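
A quick way to sanity-check the corrected P above (an editor's sketch, assuming the same globals N and p the video defines): for every legal state and stake, the probabilities over all next states should sum to 1.

N, p = 10, 0.4   # example values; the video may use different ones

for s in range(1, N):
    for a in range(1, min(s, N - s) + 1):
        total = sum(P(s_next, s, a) for s_next in range(N + 1))
        assert abs(total - 1.0) < 1e-12, (s, a, total)
print("every transition distribution sums to 1")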

I really want to thank you for helping me see the forest for the trees. I was reading Artificial Intelligence: A Modern Approach, and it was hard to grasp the concepts. Thank you so much.

wilianc.b.

I still need to watch it a second time to understand everything, but it was really cool to find your video about it!

viniciusfriasaleite

My friend, the explanation is great; from your accent I think you're Turkish. Nice work.

ahmetcanaytekin

I was wondering why we don't use the minimum function instead of the sum function in the definition of P. When I implemented the code, there was a syntax error in the sum part as well.

dalmatoth-lakits
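
On the sum-versus-minimum question above: the sum implements an expectation over next states, while a minimum would compute a worst-case value, which is a different quantity. A tiny self-contained illustration with made-up numbers:

# Hypothetical one-step lookahead from some state-action pair:
# two possible next states with probabilities 0.4 and 0.6.
probs  = {3: 0.4, 1: 0.6}    # P(s_next | s, a), made up for illustration
values = {3: 1.0, 1: 0.0}    # current V[s_next], made up

expected = sum(probs[s2] * values[s2] for s2 in probs)   # 0.4 (Bellman backup)
worst    = min(values[s2] for s2 in probs)               # 0.0 (not the same)
print(expected, worst)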

What if I have numerical data with 24 features? How do I get the states and actions?
Some blogs say you should take some features as actions and some as states. How would I implement an MDP using such data?

sarankoundinya
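
One common approach to the question above (a hedged sketch, not something from the video): discretize each continuous feature into bins and use the tuple of bin indices as the state. All names and numbers here are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 24))   # stand-in for the 24-feature data

n_bins = 4
# Per-feature bin edges from quantiles, so bins are evenly populated.
edges = [np.quantile(data[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
         for j in range(data.shape[1])]

def to_state(row):
    # Tuple of per-feature bin indices; tuples are hashable, so they
    # can be used as dictionary keys for V[state].
    return tuple(int(np.digitize(row[j], edges[j])) for j in range(len(row)))

print(to_state(data[0]))

Note that 4 bins over 24 features gives 4**24 possible states, so in practice you would keep only a few informative features or switch to function approximation instead of a tabular method.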

When I set p (the probability of winning) to 1.0, the optimal policy is 0 for every state. I worked around this by changing the rewards for winning and losing to 10 and -1 instead of 1 and 0; with that change both the V array and the policy array look correct. Is this a valid way to solve the problem, or is there something I'm missing? Thanks.

Edit: When I run the code with N = 10 and p = 0.4 and the changed reward function (10, -1), the optimal policy is not in the domain (e.g., for state s = 2, the optimal stake is 3, which exceeds min(s, N - s) = 2).
How do I solve this?

milepivo

Where is gamma in the coded formula? And shouldn't the probability be multiplied by the reward + gamma * old_V?

Mmdalaj
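
On the gamma question above: in the standard backup the transition probability multiplies the whole term (reward + gamma * old value). A hedged, self-contained sketch with made-up names and numbers; in the video's Gambler's problem the task is undiscounted, so gamma = 1 and the factor can simply be dropped.

gamma = 0.9
transitions = {'s2': 0.7, 's1': 0.3}   # P(s_next | s1, a), hypothetical
R = {'s1': 0.0, 's2': 1.0}             # hypothetical rewards on arrival
V_old = {'s1': 0.2, 's2': 0.5}         # values from the previous sweep

# One-step lookahead value of action a in state s1:
Q = sum(prob * (R[s_next] + gamma * V_old[s_next])
        for s_next, prob in transitions.items())
print(Q)   # 0.7*(1.0 + 0.9*0.5) + 0.3*(0.0 + 0.9*0.2) = 1.069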

Hello, I get the error "TypeError: unhashable type: 'list'" on the line "V[S] = max(Q.values())".
I couldn't find a solution; does anyone know how to solve this?

corentinleger
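
On the TypeError above: Python dictionaries require hashable keys, and lists are mutable and therefore unhashable, so the error typically means the state variable used to index V is a list. Converting it to a tuple (or an int, as in the Gambler's problem, where a state is just a capital amount) fixes it. A minimal reproduction:

V = {}
s = [1, 2]           # a list used as a state
# V[s] = 0.0         # TypeError: unhashable type: 'list'
V[tuple(s)] = 0.0    # a tuple works, because tuples are hashable
print(V)             # {(1, 2): 0.0}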

Wouldn't it be better if you also prepared a Turkish version?

yokdilyds