All publications

309. No, LLMs are not Scheming

308. Frontier Models Are Capable of In-Context Scheming

307. OpenAI Email Archives

306. Superintelligences Will Not Spare Earth

305. A Funny Feature of the AI Doomster Argument

303. How to prevent collusion when using untrusted models to monitor each other

302. AGI Safety and Alignment At Google DeepMind

301. CAST Discussion with Max Harms

300. Corrigibility As Singular Target, Parts 0 and 1

299. Assessing the Risk of Takeover Catastrophe from LLMs

297. Against the Singularity Hypothesis and Discussion with David Thorstad

296. LLMs for Alignment Research: A Safety Priority

295. California Senate Bill 1047

294. Practices for governing agentic AI systems

293. p(Doom) is 0.95

290. Counting Arguments Provide No Evidence For AI Doom

289. Gaining Capabilities by Learning the Plan-Effect Mapping

288. Credibly Safe AI

287. Existential Risk Persuasion Tournament

285. Imitation Learning is Probably Existentially Safe 2 (Sound improved, but not perfect)

284. Imitation Learning is Probably Existentially Safe 1

283. AI Pause Will Likely Backfire 2

282. AI Pause Will Likely Backfire 1