Red Teaming o1 Part 1/2: Automated Jailbreaking w/ Haize Labs' Leonard Tang, Aidan Ewart & Brian Huang

In this Emergency Pod of The Cognitive Revolution, Nathan provides crucial insights into OpenAI's new o1 and o1-mini reasoning models. Featuring exclusive interviews with members of the o1 Red Team from Apollo Research and Haize Labs, we explore the models' capabilities, safety profile, and OpenAI's pre-release testing approach. Dive into the implications of these advanced AI systems, including their potential to match or exceed expert performance in many areas. Join us for an urgent and informative discussion on the latest developments in AI technology and their impact on the future.

Papers mentioned:

SPONSORS:

RECOMMENDED PODCAST:
This Won't Last.
Eavesdrop on Keith Rabois, Kevin Ryan, Logan Bartlett, and Zach Weinberg's monthly backchannel. They unpack their hottest takes on the future of tech, business, venture, investing, and politics.

CHAPTERS:
(00:00:00) About the Show
(00:00:22) About the Episode
(00:05:03) Introduction and Haize Labs Overview
(00:07:36) Universal Jailbreak Technique and Attacks
(00:09:59) Red Teaming Setup for o1
(00:13:47) Automated vs Manual Red Teaming
(00:17:15) Qualitative Assessment of Model Jailbreaking (Part 1)
(00:19:38) Sponsors: Oracle | Brave
(00:21:42) Qualitative Assessment of Model Jailbreaking (Part 2)
(00:21:47) Challenges with Dual Use Cases
(00:26:21) Context-Specific Safety Considerations
(00:32:26) Model Capabilities and Safety Correlation (Part 1)
(00:36:22) Sponsors: Omneky | Squad
(00:37:48) Model Capabilities and Safety Correlation (Part 2)
(00:39:14) New Attack Techniques and Insights
(00:44:42) Model Behavior and Defense Mechanisms
(00:48:23) Current State of Model Jailbreaking
(00:50:33) Automated Jailbreaking Efforts
(00:52:47) Challenges in Preventing Jailbreaks
(00:56:24) Safety, Capabilities, and Model Scale
(01:00:56) Model Classification and Preparedness
(01:02:46) Transparency and Whistleblowing Mechanisms
(01:04:40) Concluding Thoughts on o1 and Future Work
(01:05:54) Outro

SOCIAL LINKS:
Comments

Great channel. Thank you.
Great to see some coverage that doesn't include asking it how many R's in strawberry :P

thenoblerot

Thanks for the cool content! I feel like all the cuts are quite distracting, though. Just let the people breathe 😅


It is interesting to hear Mustafa Suleyman talk about the approach Inflection took in developing a model that focused on having a high EQ. I never thought about it in terms of safety, though I always had the intuition that Pi was more grounded than the other models and I never tried to push its boundaries. Hopefully they will be able to take some of what they learned over to Microsoft. Also getting Pliny on the show would be fantastic!

xinehat

Is it just me, or does everyone have a hard time understanding the brilliant folks here? Please explain the acronyms you are using and speak in a language viewers can understand and follow.

aievry