RSS 2021, Spotlight Talk 11: Safe Reinforcement Learning via Statistical Model Predictive Shielding

**Safe Reinforcement Learning via Statistical Model Predictive Shielding**
Osbert Bastani (University of Pennsylvania); Shuo Li (University of Pennsylvania); Anton Xue (University of Pennsylvania)
**Abstract**
Reinforcement learning is a promising approach to solving hard robotics tasks. An important challenge is ensuring safety--e.g., that a walking robot does not fall over or an autonomous car does not crash into an obstacle. We build on an approach that composes the learned policy with a backup policy--it uses the learned policy on the interior of the region where the backup policy is guaranteed to be safe, and switches to the backup policy on the boundary of this region. The key challenge is checking when the backup policy is guaranteed to be safe. Our algorithm, statistical model predictive shielding (SMPS), uses sampling-based verification and linear systems analysis to perform this check. We prove that SMPS ensures safety with high probability, and empirically evaluate its performance on several benchmarks.
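The composition of a learned policy with a backup policy can be illustrated with a minimal sketch. This is only the plain, deterministic shielding pattern that the abstract says the work builds on, not SMPS itself: the sampling-based verification and linear systems analysis are not shown. Everything here is an assumption made for illustration, including the toy 1D car model, the constants `DT`, `WALL`, and `MAX_BRAKE`, and the helper names `step`, `backup_is_safe`, and `shielded_policy`.

```python
from typing import Callable, Tuple

# Hypothetical toy system: a car on a line, state = (position, velocity).
State = Tuple[float, float]

DT = 0.1         # time step in seconds (assumed)
WALL = 10.0      # obstacle position (assumed)
MAX_BRAKE = 2.0  # maximum deceleration (assumed)


def step(state: State, accel: float) -> State:
    """Known deterministic dynamics; velocity is clamped at zero (no reversing)."""
    pos, vel = state
    return (pos + vel * DT, max(0.0, vel + accel * DT))


def is_safe(state: State) -> bool:
    """The car is safe as long as it has not reached the wall."""
    return state[0] < WALL


def backup_policy(state: State) -> float:
    """Backup policy: brake as hard as possible."""
    return -MAX_BRAKE


def backup_is_safe(state: State, horizon: int = 200) -> bool:
    """Roll out the backup policy and check that it stops without a violation.

    A stopped car stays where it is under the backup policy, so reaching
    zero velocity in a safe state means the backup policy is safe from here.
    """
    for _ in range(horizon):
        if not is_safe(state):
            return False
        if state[1] == 0.0:
            return True
        state = step(state, backup_policy(state))
    return False  # did not stop within the horizon: treat as unsafe


def shielded_policy(state: State, learned_policy: Callable[[State], float]) -> float:
    """Use the learned action only if the backup policy is safe afterwards."""
    proposed = learned_policy(state)
    if backup_is_safe(step(state, proposed)):
        return proposed
    return backup_policy(state)
```

Starting from a state where `backup_is_safe` holds, each step either moves to another such state via the learned action or follows the backup policy, so the car in this toy model never reaches the wall, even when the learned policy always accelerates. The check here relies on exact, known dynamics; the abstract's point is that performing this check in general is the hard part.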