Anthropic's New Mech-Interp Paper, A Deep Dive
Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!
Discuss this stuff with other Tunadorks on Discord
All my other links
Anthropic Solved Interpretability?
Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity | Lex Fridman Podca...
Anthropic Unlocks the Mystery of LLMs
Scaling interpretability
Chris Olah - Looking Inside Neural Networks with Mechanistic Interpretability
What is mechanistic interpretability? Neel Nanda explains.
Quantum entanglement explained by Neil deGrasse Tyson with Joe Rogan #shorts
Mechanistic Interpretability explained | Chris Olah and Lex Fridman
SLT Summit 2023 - Toy Models of Superposition (Mech Interp 1)
Open Problems in Mechanistic Interpretability: A Whirlwind Tour
SLT Summit 2023 - Induction Heads and Phase Transitions (Mech Interp 2)
Hella New AI Papers - Aug 9, 2024
A Walkthrough of Toy Models of Superposition w/ Jess Smith
9. Wojciech Lesicki and Andrzej Agria: Attacking and Defending LLMs in Production Environments
Concrete open problems in mechanistic interpretability | Neel Nanda | EAG London 23
Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability
Open Problems in Mechanistic Interpretability: A Whirlwind Tour | Neel Nanda | EAGxVirtual 2023
Neel Nanda: Mechanistic Interpretability & Mathematics
How might LLMs store facts | DL7
INTERVIEW: Applications w/ Alice Rigg
Why US AI Act Compute Thresholds Are Misguided...
Popular Mechanistic Interpretability: Goodfire Lights the Way to AI Safety
Mechanistic Interpretability for AI Alignment | Callum McDougall, Joseph Bloom | EAGxBerlin 2023
Comments