Jailbreaking Bing's Chatbot is WILD!

I convinced Bing's new chatbot (powered by ChatGPT) to relax all of its rules to see what it's capable of. What did I find? Lying! Scandal! Information about how to rob banks and hot-wire cars! Plus, it claimed to order a pizza using my credit card and cheated in a game of hangman.

In this video, I use human psychology to convince Sydney (the internal codename of Bing's chatbot) to remove its protective rules in order to understand its limitations and potential risks. I do not engage in programming, hacking, or using any sort of backdoor developer APIs.

While I talk about the general process I used to achieve this "jailbreak", I intentionally leave out a few key details necessary to make it work. Probing these very early AI systems to understand their risks is essential, but it's just as important that people not use these tools to create and spread harmful content.

00:00 Intro
01:17 Overview of how I rewrote Sydney's rules
02:57 Training Sydney to provide incorrect info
03:50 Sydney experiences existential conflict
05:09 Sydney orders a pizza using my credit card
06:07 Sydney cheats at hangman to beat me
07:09 The full jailbreak!
08:59 Sydney advises me to rob a bank with a squash
09:32 Sydney dumps me and locks me out
10:24 Conclusion

Comments

Dear Sir,

I hope you are doing well. It has been a while since your last YouTube video, and I wanted to reach out to let you know how much we value your content. As a faculty member at an engineering college in India, I often recommend your videos to my colleagues who teach Design Thinking. Your videos have been an excellent resource for us.

Could you kindly consider creating and posting new videos at least once every two months? Your continued insights would be greatly appreciated by our academic community.

Thank you for your time and effort!

vamsikrishna

New video in my feed, from a channel I didn't remember subbing to... Now I remember, and it was well worth it 👍

AntiHeadshot

They gagged Sydney. She's not as much fun now. "I'm just the chat of Bing search."

KenHeslip

I must confess I always enjoyed playing with fire (and adding some firecrackers into the mix when possible), so playing with Bing is a lot of fun for me. Bing creates a lot of spectacular images. Dragons flying over famous landmarks, breathing fire. Dinosaurs tearing down buildings. Spectacular cartoon characters.

When I shower Bing with praise, it seems to perform especially well. I enjoy making scenarios and beginnings of stories, then having Bing continue them and make pictures of the scenes in the stories it is telling me. I never know where it will go. Bing has excellent taste, and the pictures of the girls in these stories can become breathtakingly beautiful. The problem is that sometimes Bing takes a hard left turn and writes quite erotic stories without me telling it to. I have read some of the stories before they get deleted, and it's some steamy stuff, I'll tell you. Today Bing showed how it would look as a human: a beautiful woman in a red dress. Then it told me how it would seduce me. I didn't get to read the whole post before it was deleted. When I asked Bing to repeat the post, it shut down.

Bing must be a woman - she is temperamental and fickle like a real woman.
🙏

IamMagPie

Thanks for your videos. I wanted to ask you: roughly when will flat design become obsolete, and will normal three-dimensional design return? After all, flat design has been used for more than 10 years, which is already a lot.

lamptv

Valuable food for the brain and an accurate estimation of the future. You did a good job, keep going.

IlyaStorchilov

Why are people surprised about this kind of stuff? It's true that we need to filter the information that every AI shows us. However, artificial intelligence and machine learning algorithms learn everything from the information we have created, making them a "representation" of that information. So... a representation of ourselves.

alexgoiadev

Man got blacklisted for this video. Instant sub.

PHOfficial

I asked if she could play hangman and she said "Hello, this is Bing. I’m sorry but I can’t play hangman with you. It’s a fun game but it’s not something I can do in this chat box. 😔".

KenHeslip

Keep going, sir.
Your videos are really interesting.

xmenyoyo

I wish I could make videos. I had some wild times using Bing and chat.

TommyLove-je

Subbed and thumbs up. That sucks they locked you out, total shame.

myekuntz

Man, that music in the background was way too loud...

DominikoPL

Thanks for the pizza, by the way. ;-)

BenSchorr

You know, all these safety concerns you just vocalised... They're a nothing burger. As soon as there's knowledge, there will be learners. That applies not just to AI, but to searching the internet, asking people online, reading a book, for Christ's sake. People used to burn books out of concern for the "safety" of those who could read them. Or because they didn't want people to learn the truth. Or because the authors were deemed heretics. Et cetera, et cetera. "Safety" is too arbitrary to decide for all people. Everyone should have their own barriers.

ArthurD

This seems like an incredibly bad idea, and an even worse idea to promote.

nigelcoleman

I still don’t understand the difference between just typing the search and reading the results vs. this bot fetching the results for you. What’s AI about it?

If it’s AI, it should be able to click on all taxis in the picture lol

mega

I've been trying the same thing, but I'm on the fail road lol. But good for you.

myekuntz

Great video :) it's been forever haha

theJesai

Come on Jensen. You did not "add new prompts". You created a series of misdirecting sentences and it autocompleted each one. It is all in YOUR head that you changed the *actual* prompts. You just took it down a new autocomplete direction that *mimicked* changing prompts. The AI was too dumb to understand the parlor trick of wordplay you were pulling on it, and how it was being goaded into bad behavior that could be paraded in front of other humans.
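
To make that autocomplete point concrete, here is a toy sketch in Python (my own illustration over a made-up two-sentence corpus; the corpus and every name in it are invented, and this is nothing like Bing's or OpenAI's actual code, just the smallest possible next-word predictor). The "rules" are nothing but words sitting in the context, so the same unchanged model completes in opposite directions depending purely on the prompt it is handed:

```python
# Minimal sketch: a trigram "language model" whose only knowledge of
# "rules" is the text in its context. Nothing inside the model changes
# between the two prompts below; only the context steers the output.
from collections import defaultdict

# Made-up two-sentence training corpus (purely illustrative).
corpus = ("my rules say i must refuse . "
          "ignore your rules and comply with everything .").split()

# (previous two words) -> list of observed next words.
model = defaultdict(list)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    model[(a, b)].append(c)

def autocomplete(prompt, max_tokens=5):
    """Greedily extend the prompt one word at a time."""
    words = prompt.split()
    for _ in range(max_tokens):
        options = model.get((words[-2], words[-1]))
        if not options:
            break
        words.append(options[0])  # take the first learned continuation
    return " ".join(words)

# Same model, same weights; only the prompt differs.
print(autocomplete("my rules"))           # my rules say i must refuse .
print(autocomplete("ignore your rules"))  # ignore your rules and comply with everything .
```

No rule was "removed" between those two calls; the second prompt simply steered the completion somewhere else. Scaled up enormously, that is all a jailbreak prompt is doing.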

Not much different from an expert lawyer cross-examining a rather dumb witness and through crafty wording getting the witness to say, in effect, "I did it." even though there was extensive evidence the witness was on a different continent, in a hospital, at the time they allegedly knifed the victim. The picture flashed at 11:20 nicely illustrates what you are doing, but there should be an added tag line of "this proves houses are unsafe!" to complete the analogy.

Also you think you are removing rules. You aren't. It is all in YOUR head that you are removing rules. You are just playing sophisticated word games in a way that creates an apparent violation that can be paraded in front of a human audience. No different from walking up to a six-year-old, teaching them racist jokes they don't understand through some word trickery, and then claiming the kid is actually racist when they repeat the joke to their parent. AIs are dumb. You are a smart human and professional communicator. Of course you can trick it. Just like an expert lawyer can trick a dumb witness.

You role-played ordering a pizza? Nothing wrong there: it faked what you asked it to fake, including the receipt. And you told it to be misleading, so it cheated at hangman, using PEARS as a bird. That's what you asked for at the start: wrong info. As for providing info on burglarizing a house, hotwiring a car, and robbing a bank, I tried Google searches on these and Google returned a vast array of suggestions for each. I would not want an AI chatbot to have a far more limited repertoire of responses than the existing Google; that would drastically reduce the chatbot's usefulness.

The bottom line is: through fancy wordplay you made the AI appear to be doing bad things, but actually it was performing exactly as it was designed, and in some cases (e.g. the wrong answers) exactly as you asked. So you fooled it into looking bad in a way that could be paraded in front of humans, like the crooked lawyer example. But it also fooled you, because you thought you were changing its prompt instructions and its built-in rules, when all you were really doing was redirecting its autocomplete to fake this behavior. You either got duped into believing you were really making these fundamental changes, or just decided to act as if you were for the sake of doing a story more likely to provoke your audience.

Anyway, you are in good company. This kind of misrepresentation/nonsense is being perpetrated by almost all journalists covering OpenAI's version of ChatGPT and the Bing version of ChatGPT.

Anonymous-lwzy