Stealing Part of a Production Language Model | AI Paper Explained

Many of today's top LLMs are closed source. What if we could recover their internal weights?
In this video we dive into a recent research paper from Google DeepMind that presents an attack on large language models. The attack targets transformer-based LLMs that expose log probabilities as part of their API, which includes GPT-4 and PaLM-2. The researchers successfully used the attack to extract internal details of OpenAI models, including the hidden dimension size of gpt-3.5-turbo, and they estimate it would cost less than $2,000 to extract the weights of that model's embedding projection layer.
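The core idea behind the hidden-dimension extraction is linear algebra: the final projection layer maps a d-dimensional hidden state to vocabulary-sized logits, so every logit vector the model can ever produce lies in a d-dimensional subspace. Collect more logit vectors than d, and the rank of the stacked matrix, read off from its singular values, reveals d. Here is a minimal sketch of that idea against a simulated model; the vocabulary size, hidden dimension, and `query_logits` helper are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 4_096  # illustrative vocabulary size (assumption)
HIDDEN_DIM = 256    # the "secret" the attack recovers (assumption)

# Simulated final projection layer: logits = W @ h, so every logit
# vector lies in a HIDDEN_DIM-dimensional subspace of R^VOCAB_SIZE.
W = rng.standard_normal((VOCAB_SIZE, HIDDEN_DIM))

def query_logits() -> np.ndarray:
    """Stand-in for one API call: the logit vector for a random prompt."""
    h = rng.standard_normal(HIDDEN_DIM)  # hidden state for this prompt
    return W @ h

# Collect more logit vectors than the suspected hidden dimension.
n_queries = 512
Q = np.stack([query_logits() for _ in range(n_queries)], axis=1)

# The numerical rank of Q equals the hidden dimension: look for the
# sharp drop in the singular-value spectrum.
s = np.linalg.svd(Q, compute_uv=False)
recovered_dim = int((s > 1e-6 * s[0]).sum())
print(f"recovered hidden dimension: {recovered_dim}")  # prints 256
```

Once the hidden dimension is known, the same stack of logit vectors pins down the embedding projection layer itself (up to a rotation), which is why the weight-extraction step costs little beyond the queries already made.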

-----------------------------------------------------------------------------------------------

👍 Please like & subscribe if you enjoy this content

-----------------------------------------------------------------------------------------------

Chapters:
0:00 Introduction
1:13 Attack Targets
2:36 Hidden Dimension Extraction
5:29 Weights Extraction
6:18 Recover Logits From Log Probabilities (sketched below)
8:10 Results
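
The 6:18 chapter covers a prerequisite step: real APIs return only the top-k log probabilities, not full logit vectors. The trick is to add a large logit bias to the token you care about, forcing it into the top-k; the bias and the shared log-normalizer then cancel out of the difference against a fixed reference token. A minimal sketch against a simulated API, where the `api_top_logprobs` helper, vocabulary size, and bias value are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

VOCAB = 1_000  # illustrative vocabulary size (assumption)
TOP_K = 5      # the API only returns this many logprobs per query
BIAS = 100.0   # large logit bias that forces any token into the top-k

z = rng.standard_normal(VOCAB)  # the secret logit vector to recover

def api_top_logprobs(logit_bias: dict[int, float]) -> dict[int, float]:
    """Stand-in API: top-k logprobs after applying a logit bias."""
    biased = z.copy()
    for tok, b in logit_bias.items():
        biased[tok] += b
    logprobs = biased - np.logaddexp.reduce(biased)  # log-softmax
    top = np.argsort(logprobs)[-TOP_K:]
    return {int(t): float(logprobs[t]) for t in top}

# Pick a reference token that always stays in the top-k: the token
# with the highest logprob when no bias is applied.
ref = max(api_top_logprobs({}).items(), key=lambda kv: kv[1])[0]

recovered = np.zeros(VOCAB)  # logits relative to the reference token
for tok in range(VOCAB):
    if tok == ref:
        continue  # z[ref] - z[ref] = 0 by definition
    out = api_top_logprobs({tok: BIAS})
    # With bias B on tok, the shared log-normalizer cancels:
    #   logprob[tok] - logprob[ref] = z[tok] + B - z[ref]
    recovered[tok] = out[tok] - out[ref] - BIAS

assert np.allclose(recovered, z - z[ref], atol=1e-6)
print("recovered all logits up to a shared constant")
```

One query per vocabulary token recovers the full logit vector up to an additive constant, which is all the SVD step above needs.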
Comments

Good content, but I think it would be better if you went a little deeper into the smaller details. For me it felt like it was going a bit too fast and I didn't understand some of what was happening, so I would prefer a more in-depth explanation of the technical details.

imaginebaggins