All publications

309. No, LLMs are not Scheming

308. Frontier Models Are Capable of In-Context Scheming

307. OpenAI Email Archives

306. Superintelligences Will Not Spare Earth

305. A Funny Feature of the AI Doomster Argument

303. How to prevent collusion when using untrusted models to monitor each other

302. AGI Safety and Alignment At Google DeepMind

301. CAST Discussion with Max Harms

300. Corrigibility As Singular Target, Parts 0 and 1

299. Assessing the Risk of Takeover Catastrophe from LLMs

297. Against the Singularity Hypothesis and Discussion with David Thorstad

296. LLMs for Alignment Research: A Safety Priority

295. California Senate Bill 1047

294. Practices for governing agentic AI systems

293. p(Doom) is 0.95

290. Counting Arguments Provide No Evidence For AI Doom

289. Gaining Capabilities by Learning the Plan-Effect Mapping

288. Credibly Safe AI

287. Existential Risk Persuasion Tournament

285. Imitation Learning is Probably Existentially Safe 2 (Sound improved, but not perfect)

284. Imitation Learning is Probably Existentially Safe 1

283. AI Pause Will Likely Backfire 2

282. AI Pause Will Likely Backfire 1