Finding Policies Three - Georgia Tech - Machine Learning


Comments

If every U(S) at time t is defined in terms of U(S') at time t, then how do you even begin to solve this equation, unless there is a state S with no next state defined? Is that how this equation is solved? By finding U(S) at the goal / absorbing states and propagating that across the board?

IndrajitRajtilak
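
One way to see how the circular definition gets resolved: value iteration starts from arbitrary utilities (zeros here) and applies the Bellman update over and over, and with a discount below 1 the values settle to a fixed point. The sketch below is only an illustration, not the lecture's code. The 3-state chain, the rewards, the discount of 0.9, and the names `value_iteration`, `P`, and `R` are all assumptions made up for this example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Repeat the Bellman update until the utilities stop changing.

    P: array of shape (A, S, S), P[a, s, s2] = probability of s -> s2 under a.
    R: array of shape (S,), reward received in each state.
    """
    U = np.zeros(R.shape[0])  # arbitrary starting guess
    for _ in range(max_iters):
        # U(s) = R(s) + gamma * max_a sum_s' P(s' | s, a) * U(s')
        U_new = R + gamma * np.max(P @ U, axis=0)
        if np.max(np.abs(U_new - U)) < tol:
            return U_new
        U = U_new
    return U

# Hypothetical toy MDP: states 0, 1, 2 in a row; state 2 is absorbing.
left = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
right = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
P = np.array([left, right], dtype=float)
R = np.array([0.0, 0.0, 1.0])

U = value_iteration(P, R)
print(U)  # approximately [8.1, 9.0, 10.0]
```

In this toy case the value of the absorbing state does spread backwards through the chain, which matches the "propagating across the board" picture. The iteration itself does not require an absorbing state, though: with a discount below 1 it converges from any starting guess.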

But isn't there still an argmax in the improve step that makes it non-linear? The only difference I see from the last approach is that you don't use an argmax to determine your initial guess. I think I'm missing something.

BrettClimb
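
A possible way to read this: the argmax is still there, but it sits outside the equations being solved. Once a policy is fixed, the max disappears from the evaluation step, leaving S linear equations in S unknowns that a linear solver can handle; the argmax is then applied afterwards to already-computed utilities. The sketch below illustrates that split under assumptions. The toy chain MDP, the discount of 0.9, and the function names are hypothetical, not taken from the lecture.

```python
import numpy as np

def evaluate_policy(P, R, policy, gamma=0.9):
    """Evaluation step: solve the linear system (I - gamma * P_pi) U = R."""
    S = R.shape[0]
    # Row s of P_pi is the transition distribution of the action policy[s].
    P_pi = P[policy, np.arange(S)]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R)

def improve_policy(P, U):
    """Improvement step: the argmax, applied to fixed utilities U."""
    return np.argmax(P @ U, axis=0)

def policy_iteration(P, R, gamma=0.9, max_iters=1000):
    policy = np.zeros(R.shape[0], dtype=int)  # arbitrary initial policy
    for _ in range(max_iters):
        U = evaluate_policy(P, R, policy, gamma)
        new_policy = improve_policy(P, U)
        if np.array_equal(new_policy, policy):
            break  # policy is stable, so U is its utility
        policy = new_policy
    return policy, U

# Hypothetical toy MDP: states 0, 1, 2 in a row; state 2 is absorbing.
left = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
right = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
P = np.array([left, right], dtype=float)  # action 0 = left, 1 = right
R = np.array([0.0, 0.0, 1.0])

policy, U = policy_iteration(P, R)
print(policy, U)
```

Note that `evaluate_policy` contains no max at all, which is the sense in which that step is linear.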
