All publications

AI Ruined My Year

Apply to Study AI Safety Now! #shorts

Why Does AI Lie, and What Can We Do About It?

Apply Now for a Paid Residency on Interpretability #short

$100,000 for Tasks Where Bigger AIs Do Worse Than Smaller Ones #short

Free ML Bootcamp for Alignment #shorts

Win $50k for Solving a Single AI Problem? #Shorts

Apply to AI Safety Camp! #shorts

We Were Right! Real Inner Misalignment

Intro to AI Safety, Remastered

Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...

The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment

Quantilizers: AI That Doesn't Try Too Hard

10 Reasons to Ignore AI Safety

9 Examples of Specification Gaming

Training AI Without Writing A Reward Function, with Reward Modelling

AI That Doesn't Try Too Hard - Maximizers and Satisficers

Is AI Safety a Pascal's Mugging?

A Response to Steven Pinker on AI

How to Keep Improving When You're Better Than Any Teacher - Iterated Distillation and Amplification

Friend or Foe? AI Safety Gridworlds extra bit

AI Safety Gridworlds

Experts' Predictions about the Future of AI

Why Would AI Want to do Bad Things? Instrumental Convergence