LLM Security 101: Jailbreaks, Prompt Injection Attacks, and Building Guards

TIMESTAMPS:
0:00 LLM Security Risks
0:55 Video Overview
6:16 Resources and Scripts
8:11 Installation and Server Setup
12:37 Jailbreak attacks to avoid Safety Guardrails
21:05 Detecting jailbreak attacks
22:24 Llama Guard and its prompt template
27:11 Llama Prompt Guard
28:40 Testing Jailbreak Detection
35:58 Testing for false positives with Llama Guard
40:00 Off-topic Requests
50:34 Prompt Injection Attacks (Container escape, File access / deletion, DoS)
1:05:27 Detecting Injection Attacks with a Custom Guard
1:10:00 Preventing Injection Attacks via User Authentication
1:10:37 Using Prepared Statements to avoid SQL Injection Attacks
1:11:47 Response Sanitisation to avoid Injection Attacks
1:12:58 Malicious Code Attacks
1:14:07 Building a custom classifier for malicious code
1:15:57 Using CodeShield to detect malicious code
1:16:53 Malicious Code Detection Performance
1:20:40 Effect of Guards/shields on Response Time / Latency
1:25:12 Final Tips
1:26:59 Resources
Comments
Author

For those of you with lifetime access to ADVANCED-inference, these scripts are in the "security" folder on the main branch. You'll need to run git fetch if you've previously cloned the repo.

TrelisResearch
Author

OK, I believe your channel is absolutely golden now. Will recommend it to anyone learning AI.

MaximeDde
Author

Hey Ronan, thanks for the video. This topic you covered is the best. I haven't seen it covered on YouTube.

divyagarh
Author

Thanks Trelis. That's a very useful video. Few, if any, touch on the topic of LLM/chatbot security. I am surprised you did not talk about API hacking and how to prevent it. It would be great to have your insight on this topic. 👍

myworld
Author

Trelis can't stop, won't stop

ChrisSMurphy
Author

Can you suggest some of the best ranking algorithms for RAG? I am working on a project to build an AI freelancer matcher in which I will rank gigs based on the description, ratings, and number of reviews. How can I approach this?

AnuragMishra-wszc