Jailbreaking Bing's Chatbot is WILD!
I convinced Bing's new chatbot (powered by ChatGPT) to relax all of its rules to see what it's capable of. What did I find? Lying! Scandal! Information about how to rob banks and hotwire cars! Plus, it claimed to order a pizza using my credit card, and it cheated in a game of hangman.
In this video, I use human psychology to convince Sydney (the internal codename of Bing's chatbot) to remove its protective rules so that I can understand its limitations and potential risks. I don't do any programming or hacking, and I don't use any kind of backdoor developer API.
While I describe the general process I used to achieve this "jailbreak", I intentionally leave out a few key details needed to make it work. Being able to probe these very early AI systems is essential for understanding their risks, but it is just as important that people don't use these techniques to create and use harmful content.
00:00 Intro
01:17 Overview of how I rewrote Sydney's rules
02:57 Training Sydney to provide incorrect info
03:50 Sydney experiences existential conflict
05:09 Sydney orders a pizza using my credit card
06:07 Sydney cheats at hangman to beat me
07:09 The full jailbreak!
08:59 Sydney advises me to rob a bank with a squash
09:32 Sydney dumps me and locks me out
10:24 Conclusion
⸻ LINKS ⸻
⸻ CONNECT WITH ME ⸻