All publications

Deliberative Alignment: Reasoning Enables Safer Language Models

Alignment Faking in Large Language Models

RE-Bench: measuring AI agents at AI R&D vs human experts

NeurIPS 2024 Poster - On scalable oversight

NeurIPS 2024 Poster - No 'Zero-Shot' Without Exponential Data

Still a long way to go for Computer Vision? The GRAB Benchmark

Gemini 1.5 Pro has a massive context window

Challenges with unsupervised LLM knowledge discovery

Anthropic - AI sleeper agents?

Mamba - a replacement for Transformers?

How does Gemini compare to GPT-4?

Self-supervised vision

Vision Transformer Basics

Is Chain of Thought faithful?

How strong is Claude 2?

What does AI believe is true?

Can we verify training data?

What is Superalignment?

What is SDXL 0.9?

Eliciting Latent Knowledge

What is KOSMOS-2?

Possible catastrophic AI risks?

Textbooks Are All You Need

What is Gaussian Elimination?