Red Teaming o1, Part 1/2: Automated Jailbreaking w/ Haize Labs' Leonard Tang, Aidan Ewart & Brian Huang
In this Emergency Pod of The Cognitive Revolution, Nathan shares crucial insights into OpenAI's new o1 and o1-mini reasoning models. Featuring exclusive interviews with members of the o1 Red Team from Apollo Research and Haize Labs, we explore the models' capabilities, their safety profile, and OpenAI's pre-release testing approach. We dig into the implications of these advanced AI systems, including their potential to match or exceed expert performance in many areas. Join us for an urgent and informative discussion on the latest developments in AI and their impact on the future.
Papers mentioned:
SPONSORS:
RECOMMENDED PODCAST:
This Won't Last.
Eavesdrop on Keith Rabois, Kevin Ryan, Logan Bartlett, and Zach Weinberg's monthly backchannel. They unpack their hottest takes on the future of tech, business, venture, investing, and politics.
CHAPTERS:
(00:00:00) About the Show
(00:00:22) About the Episode
(00:05:03) Introduction and Haize Labs Overview
(00:07:36) Universal Jailbreak Technique and Attacks
(00:09:59) Red Teaming Setup for o1
(00:13:47) Automated vs Manual Red Teaming
(00:17:15) Qualitative Assessment of Model Jailbreaking (Part 1)
(00:19:38) Sponsors: Oracle | Brave
(00:21:42) Qualitative Assessment of Model Jailbreaking (Part 2)
(00:21:47) Challenges with Dual Use Cases
(00:26:21) Context-Specific Safety Considerations
(00:32:26) Model Capabilities and Safety Correlation (Part 1)
(00:36:22) Sponsors: Omneky | Squad
(00:37:48) Model Capabilities and Safety Correlation (Part 2)
(00:39:14) New Attack Techniques and Insights
(00:44:42) Model Behavior and Defense Mechanisms
(00:48:23) Current State of Model Jailbreaking
(00:50:33) Automated Jailbreaking Efforts
(00:52:47) Challenges in Preventing Jailbreaks
(00:56:24) Safety, Capabilities, and Model Scale
(01:00:56) Model Classification and Preparedness
(01:02:46) Transparency and Whistleblowing Mechanisms
(01:04:40) Concluding Thoughts on o1 and Future Work
(01:05:54) Outro
SOCIAL LINKS: