Demystifying LLMs with Mechanistic Interpretability Researcher Arthur Conmy

TIMESTAMPS:
(00:00) Episode Preview
(04:40) What attracted Arthur to mechanistic interpretability?
(07:49) LLM information processing: General Understanding vs Stochastic Parrot Paradigm
(14:45) Sponsors: NetSuite | Omneky
(24:30) Putting together data sets
(32:39) How to intervene in an LLM's network activity
(36:00) Setting metrics to evaluate the production of correct completions
(44:20) The future of mechanistic interpretability research
(50:00) Extracting upstream activations in the ACDC project and evaluating impact on downstream components
(56:00) Anthropic research findings
(01:08:00) 3-Step process of the ACDC approach
(01:22:00) Setting a threshold and validation
(01:27:00) Goal of the approach
(01:32:00) Compute requirements
*Correction at 1:33:00: Arthur meant to say "quadratic in nodes"
(01:35:30) Scaling laws for mechanistic interpretability
(01:40:00) Accessibility of this research for casual enthusiasts
(01:46:00) Emergence discourse
(01:56:00) Path to AI safety

LINKS:

SOCIAL MEDIA:
@labenz (Nathan)
@arthurconmy (Arthur)
@cogrev_podcast

SPONSORS: NetSuite | Omneky

-Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that *actually work* customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

Music license:
VTPHPIMGIQFU2HXZ

Cognitive Revolution "How AI Changes Everything"

Comments

did you just get your A level results?

oiuhwoechwe

My gut reaction is that models that facilitate explanation will always be outperformed by models that don't give a shit about being a black box.

afterthesmash
