LLM Security 101: Jailbreaks, Prompt Injection Attacks, and Building Guards

TIMESTAMPS:
0:00 LLM Security Risks
0:55 Video Overview
6:16 Resources and Scripts
8:11 Installation and Server Setup
12:37 Jailbreak attacks to avoid Safety Guardrails
21:05 Detecting jailbreak attacks
22:24 Llama Guard and its prompt template
27:11 Llama Prompt Guard
28:40 Testing Jailbreak Detection
35:58 Testing for false positives with Llama Guard
40:00 Off-topic Requests
50:34 Prompt Injection Attacks (Container escape, File access / deletion, DoS)
1:05:27 Detecting Injection Attacks with a Custom Guard
1:10:00 Preventing Injection Attacks via User Authentication
1:10:37 Using Prepared Statements to avoid SQL Injection Attacks
1:11:47 Response Sanitisation to avoid Injection Attacks
1:12:58 Malicious Code Attacks
1:14:07 Building a custom classifier for malicious code
1:15:57 Using CodeShield to detect malicious code
1:16:53 Malicious Code Detection Performance
1:20:40 Effect of Guards/shields on Response Time / Latency
1:25:12 Final Tips
1:26:59 Resources

Comments

For those of you with lifetime access to ADVANCED-inference, these scripts are in the "security" folder on the main branch. You'll need to run git fetch if you've previously cloned the repo.

TrelisResearch

Ok, I believe your channel is absolutely golden now. Will recommend to anyone learning AI.

MaximeDde

Hey Ronan, thanks for the video. This topic you covered is the best. I haven't seen it covered on YouTube.

divyagarh

Thanks Trelis. That's a very useful video. Few if any touch on the topic of LLM/chatbot security. I am surprised you did not talk about API hacking and how to prevent it. It would be great to have your insight on this topic. 👍

myworld

Trelis can't stop, won't stop

ChrisSMurphy

Can you suggest some of the best ranking algorithms for RAG? I am working on a project to build an AI freelancer matcher that ranks gigs based on description, ratings, and number of reviews. How should I approach this?

AnuragMishra-wszc
