All publications

Deliberative Alignment: Reasoning Enables Safer Language Models

Alignment Faking in Large Language Models

RE-Bench: measuring AI agents at AI R&D vs human experts

NeurIPS 2024 Poster - On scalable oversight

NeurIPS 2024 Poster - No 'Zero-Shot' Without Exponential Data

Still a long way to go for Computer Vision? The GRAB Benchmark

Gemini 1.5 Pro has a massive context window

Challenges with unsupervised LLM knowledge discovery

Anthropic - AI sleeper agents?

Mamba - a replacement for Transformers?

How does Gemini compare to GPT-4?

Self-supervised vision

Vision Transformer Basics

Is Chain of Thought faithful?

How strong is Claude 2?

What does AI believe is true?

Can we verify training data?

What is Superalignment?

What is SDXL 0.9?

Eliciting Latent Knowledge

What is KOSMOS-2?

Possible catastrophic AI risks?

Textbooks Are All You Need

What is Gaussian Elimination?