Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained

Contextual sparsity: Take an LLM and make it sparse at inference time. In this video, we explain how the DEJAVU method implements contextual sparsity.
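To make "sparse at inference time" concrete, here is a minimal PyTorch sketch of contextual sparsity for a single MLP block. This is not the official DEJAVU implementation (the paper also sparsifies attention heads and uses asynchronous lookahead predictors with fused sparse kernels); the class name `ContextualSparseMLP`, the plain linear `predictor`, and `top_k_fraction` are illustrative stand-ins.

```python
import torch
import torch.nn.functional as F

class ContextualSparseMLP(torch.nn.Module):
    def __init__(self, d_model=1024, d_hidden=4096, top_k_fraction=0.1):
        super().__init__()
        self.w_in = torch.nn.Linear(d_model, d_hidden)
        self.w_out = torch.nn.Linear(d_hidden, d_model)
        # Cheap scorer that guesses, per token, which hidden neurons will matter.
        self.predictor = torch.nn.Linear(d_model, d_hidden)
        self.k = max(1, int(top_k_fraction * d_hidden))

    def forward(self, x):  # x: (d_model,) -- one token during decoding
        idx = self.predictor(x).topk(self.k).indices       # input-dependent neuron set
        # Compute only the selected rows of w_in and columns of w_out.
        h = F.relu(F.linear(x, self.w_in.weight[idx], self.w_in.bias[idx]))
        return F.linear(h, self.w_out.weight[:, idx], self.w_out.bias)

token = torch.randn(1024)
out = ContextualSparseMLP()(token)  # different tokens activate different neurons
```

The key point is that `idx` changes with every input token; static pruning, by contrast, fixes the kept neurons once and hurts quality, as discussed in the outline below.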

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information, Michael

Outline:
00:00 DEJAVU explained
02:58 Sparse neural networks
04:06 Why static sparsity hurts
04:43 Contextual sparsity
05:40 DEJAVU method
07:59 Speedups!
08:52 MoE: Connection to Mixture of Experts
09:38 Theoretical insights: Why can we make MLPs sparse?
10:36 Why can we make attention sparse?
11:38 Attention does Mean-shift clustering! (toy sketch below)
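The last outline item deserves a tiny illustration. Below is a hedged numpy sketch of the analogy: if queries, keys, and values are all taken to be the same unit-norm token vectors, one self-attention step coincides with one mean-shift update under a Gaussian kernel. This is only a toy identity to make the claim concrete, not the paper's derivation; real attention uses learned projections and 1/sqrt(d) scaling.

```python
import numpy as np

def attention_step(X):
    # Self-attention where queries, keys, and values are all the same vectors X
    # (an assumption made only to keep the analogy visible; no 1/sqrt(d) scaling).
    logits = X @ X.T
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # softmax over tokens
    return w @ X                                # each token -> weighted mean of tokens

def mean_shift_step(X, bandwidth=1.0):
    # One classic mean-shift update with a Gaussian kernel over pairwise distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    k /= k.sum(axis=1, keepdims=True)
    return k @ X

X = np.random.randn(8, 4)
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm tokens
# For unit-norm vectors the two kernels differ only by factors that cancel in the
# row normalization, so the two updates agree.
print(np.allclose(attention_step(X), mean_shift_step(X)))  # True
```

Both updates pull tokens toward the centers of nearby clusters, which is the intuition the video uses for why attending to only a small, input-dependent set of tokens and heads can preserve the output.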

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Join this channel to get access to perks:
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Video editing: Nils Trost
Music 🎵 : Sunday Rain - Cheel
Comments

I don't know why the algorithm has neglected showing me your content for so long. It is right up my alley. I hope you keep coming out with the latest news from AI. It is the biggest thing to happen to humanity since ever, and people still don't react to it. I like your vids. You are smart and the animations are fun. And god, the lipstick. Holy shit. 🙃😍

Ben_D.

What a crazy video. I learned so much, thank you for making this!

poketopa

This channel is a real gem! Can we continue to expect 2 videos a month?

Aca

I never thought that what I need in life was Ms. Coffee Bean telling me to "sit down". Now I know

DerPylz

This was a fantastic and concise explanation!! I'll read the paper in more detail; however, is this method also effective when combined with quantization? I want to run large models on reasonably priced hardware just for inference.


Isn't GELU already enforcing "input-dependent" sparsity?

lelouch

Nice animations.
It feels like we are going in circles: this paper (and ReLU Strikes Back) reintroduces ReLU; S6, RWKV, and RetNet reintroduce the RNN. Flip a coin on which piece of the past comes back next - residual-free models or an AI Winter.

AM-ykyd

Does this mean we can run larger models on smaller GPUs?

ew

IT majors in India used to recruit for general intelligence and then make it sparse in the profession, focusing on specialized, repetitive tasks rather than broad skill development.

ramkumarr

Yes, when a person refers to the human brain in comparison to AI, they generally mean the collective intelligence of humanity, rather than the capabilities of an individual brain.

ramkumarr

GPT makers could take a share price hit.

Artifactorfiction