SARSA(λ) on Acrobot-v1 with linear function approximation

The goal of the under-actuated bot is to swing itself up until its free end crosses the black line. It receives a negative reward for every step in which it has not yet reached the line (it's a painful life). We see that it starts off with random actions and quickly learns an effective strategy. At episode 100 it is still swinging around wildly, and by episode 250 it has finally figured out how to pull it off.
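As a rough illustration of the reward described above, here is a minimal sketch of the per-step reward and termination rule. It assumes the rule used by Gym/Gymnasium's Acrobot-v1 as I understand it: both links have length 1, the episode ends once the height of the free end, -cos(θ1) - cos(θ1 + θ2), exceeds 1 (the black line), and every step before that earns -1. The function name `acrobot_reward` is hypothetical.

```python
import math

def acrobot_reward(theta1, theta2):
    """Return (reward, done) for joint angles given in radians.

    Assumed convention: theta1 is the angle of the first link (0 when hanging
    straight down), theta2 is the angle of the second link relative to the first.
    """
    # Height of the free end relative to the fixed pivot, both links of length 1
    tip_height = -math.cos(theta1) - math.cos(theta1 + theta2)
    done = tip_height > 1.0          # the free end has crossed the black line
    reward = 0.0 if done else -1.0   # -1 for every step the line is not reached
    return reward, done
```

With the arm hanging straight down, `acrobot_reward(0.0, 0.0)` gives `(-1.0, False)`, while pointing straight up, `acrobot_reward(math.pi, 0.0)` gives `(0.0, True)`. Because every wasted step costs -1, the agent maximises return by reaching the line in as few swings as possible.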

The developments leading up to the SARSA(λ) algorithm drew many inspirations from biology. The temporal difference error is inspired by animal learning, while eligibility traces are inspired by the workings of neurons. It's kind of crazy how a few lines of code can show intelligent behaviour.
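To make the two ideas concrete, here is a minimal sketch of a single SARSA(λ) update with a linear action-value function q(s, a) = wᵀx(s, a). The TD error is δ = r + γ q(s', a') - q(s, a), the eligibility trace is updated as z ← γλz + x(s, a), and the weights as w ← w + αδz. It assumes accumulating traces and a feature vector x(s, a) that has already been computed (for example by tile coding); the names `q_value` and `sarsa_lambda_step` are hypothetical and not taken from the author's code.

```python
def q_value(w, x):
    """Linear action value: dot product of the weights and the features of (s, a)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sarsa_lambda_step(w, z, x, x_next, reward, alpha, gamma, lam, terminal=False):
    """One SARSA(lambda) update with accumulating traces; w and z are updated in place.

    x      -- features of the current pair (s, a)
    x_next -- features of (s', a'), where a' is chosen by the same policy
              that is being learned (this on-policy choice is what makes it SARSA)
    Returns the temporal difference error delta.
    """
    q = q_value(w, x)
    # The value of a terminal state is zero, so there is nothing to bootstrap from
    q_next = 0.0 if terminal else q_value(w, x_next)
    delta = reward + gamma * q_next - q  # temporal difference error

    # Decay every trace, then add the gradient of q(s, a), which is just x
    for i in range(len(z)):
        z[i] = gamma * lam * z[i] + x[i]

    # Move the weights along the traces, scaled by the TD error
    for i in range(len(w)):
        w[i] += alpha * delta * z[i]
    return delta
```

The trace z keeps a fading memory of recently active features, so each TD error is credited not only to the current state-action pair but also to the moves that led up to it. This is what lets a late success at the top of a swing reinforce the pumping motions that set it up. The trace should be reset to zeros at the start of each episode.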

Described by Sutton in 1995:
Code written by me:

Shaurya Seth