SARSA(λ) on Acrobot-v1 with linear function approximation

The goal of the under-actuated bot is to swing around until it crosses the black line. It receives a negative reward for every step in which it hasn't reached the line (it's a painful life). It starts off with random actions and quickly learns an effective strategy. At episode 100 it still gets stuck swinging around wildly, but by episode 250 it has finally figured out how to pull it off.

The developments leading up to the SARSA(λ) algorithm drew a lot of inspiration from biology. The temporal difference error is inspired by animal learning, while eligibility traces are inspired by the workings of neurons. It's kind of crazy how a few lines of code can show intelligent behaviour.
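To make the "few lines of code" concrete, here is a minimal sketch of the SARSA(λ) update with a linear Q-function and accumulating eligibility traces. It is not the code from the video. The class name `LinearSarsaLambda`, its parameters and default values are hypothetical, and the feature vector `phi` is assumed to come from some feature map of the Acrobot observation (e.g. tile coding), which is not implemented here. The TD error `delta` and the trace array `z` are the two ideas mentioned above.

```python
import numpy as np


class LinearSarsaLambda:
    """Hypothetical minimal SARSA(lambda) agent with a linear Q-function.

    Q(s, a) = w[a] . phi(s), where phi(s) is a feature vector supplied by
    the caller (assumed, e.g. tile-coded observations; not implemented here).
    """

    def __init__(self, n_features, n_actions, alpha=0.1, gamma=1.0, lam=0.9,
                 epsilon=0.0, seed=0):
        self.w = np.zeros((n_actions, n_features))  # one weight row per action
        self.z = np.zeros_like(self.w)              # eligibility traces, same shape as w
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.rng = np.random.default_rng(seed)

    def q(self, phi):
        """Action values for one feature vector: one entry per action."""
        return self.w @ phi

    def act(self, phi):
        """Epsilon-greedy action selection over the linear Q-values."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.w.shape[0]))
        return int(np.argmax(self.q(phi)))

    def reset_traces(self):
        """Clear the traces at the start of every episode."""
        self.z[:] = 0.0

    def update(self, phi, a, r, phi_next, a_next, done):
        """One SARSA(lambda) step for the transition (s, a, r, s', a')."""
        q_sa = self.w[a] @ phi
        # No bootstrap from a terminal state.
        target = r if done else r + self.gamma * (self.w[a_next] @ phi_next)
        delta = target - q_sa  # temporal difference error
        # Accumulating trace: decay every trace, then add grad_w Q(s, a),
        # which for a linear Q is phi placed in the row of the action taken.
        self.z *= self.gamma * self.lam
        self.z[a] += phi
        # Every recently visited (feature, action) pair shares the credit.
        self.w += self.alpha * delta * self.z
        return delta
```

In an episode loop you would call `reset_traces()` after each reset, pick `a_next = agent.act(phi_next)` before calling `update(...)`, and then carry `phi_next, a_next` forward. Acrobot-v1 has 3 discrete actions, so `n_actions=3`. With a reward of -1 per step, zero-initialised weights are optimistic, so even a purely greedy policy (`epsilon=0.0`) keeps trying untested actions. This is one plausible choice here, not necessarily what the video used.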

Described by Sutton in 1995:
Code written by me: