All publications

AI Ruined My Year

Apply to Study AI Safety Now! #shorts

Why Does AI Lie, and What Can We Do About It?

Apply Now for a Paid Residency on Interpretability #short

$100,000 for Tasks Where Bigger AIs Do Worse Than Smaller Ones #short

Free ML Bootcamp for Alignment #shorts

Win $50k for Solving a Single AI Problem? #Shorts

Apply to AI Safety Camp! #shorts

We Were Right! Real Inner Misalignment

Intro to AI Safety, Remastered

Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...

The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment

Quantilizers: AI That Doesn't Try Too Hard

10 Reasons to Ignore AI Safety

9 Examples of Specification Gaming

Training AI Without Writing A Reward Function, with Reward Modelling

AI That Doesn't Try Too Hard - Maximizers and Satisficers

Is AI Safety a Pascal's Mugging?

A Response to Steven Pinker on AI

How to Keep Improving When You're Better Than Any Teacher - Iterated Distillation and Amplification

Friend or Foe? AI Safety Gridworlds extra bit

AI Safety Gridworlds

Experts' Predictions about the Future of AI

Why Would AI Want to do Bad Things? Instrumental Convergence