Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained

Contextual sparsity: Take an LLM and make it sparse at inference time. In this video, we explain how the DEJAVU method implements contextual sparsity.
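To make "sparse at inference time" concrete, here is a minimal PyTorch sketch of contextual sparsity for a single MLP block. This is not the official DEJAVU implementation (the paper also sparsifies attention heads and uses asynchronous lookahead predictors with fused sparse kernels); the class name `ContextualSparseMLP`, the plain linear `predictor`, and `top_k_fraction` are illustrative stand-ins.

```python
import torch
import torch.nn.functional as F

class ContextualSparseMLP(torch.nn.Module):
    def __init__(self, d_model=1024, d_hidden=4096, top_k_fraction=0.1):
        super().__init__()
        self.w_in = torch.nn.Linear(d_model, d_hidden)
        self.w_out = torch.nn.Linear(d_hidden, d_model)
        # Cheap scorer that guesses, per token, which hidden neurons will matter.
        self.predictor = torch.nn.Linear(d_model, d_hidden)
        self.k = max(1, int(top_k_fraction * d_hidden))

    def forward(self, x):  # x: (d_model,) -- one token during decoding
        idx = self.predictor(x).topk(self.k).indices       # input-dependent neuron set
        # Compute only the selected rows of w_in and columns of w_out.
        h = F.relu(F.linear(x, self.w_in.weight[idx], self.w_in.bias[idx]))
        return F.linear(h, self.w_out.weight[:, idx], self.w_out.bias)

token = torch.randn(1024)
out = ContextualSparseMLP()(token)  # different tokens activate different neurons
```

The key point is that `idx` changes with every input token; static pruning, by contrast, fixes the kept neurons once and hurts quality, as discussed in the outline below.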

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information, Michael

Outline:
00:00 DEJAVU explained
02:58 Sparse neural networks
04:06 Why static sparsity hurts
04:43 Contextual sparsity
05:40 DEJAVU method
07:59 Speedups!
08:52 MoE: Connection to Mixture of Experts
09:38 Theoretical insights: Why can we make MLPs sparse?
10:36 Why can we make attention sparse?
11:38 Attention does Mean-shift clustering! (toy sketch below)
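The last outline item deserves a tiny illustration. Below is a hedged numpy sketch of the analogy: if queries, keys, and values are all taken to be the same unit-norm token vectors, one self-attention step coincides with one mean-shift update under a Gaussian kernel. This is only a toy identity to make the claim concrete, not the paper's derivation; real attention uses learned projections and 1/sqrt(d) scaling.

```python
import numpy as np

def attention_step(X):
    # Self-attention where queries, keys, and values are all the same vectors X
    # (an assumption made only to keep the analogy visible; no 1/sqrt(d) scaling).
    logits = X @ X.T
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # softmax over tokens
    return w @ X                                # each token -> weighted mean of tokens

def mean_shift_step(X, bandwidth=1.0):
    # One classic mean-shift update with a Gaussian kernel over pairwise distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    k /= k.sum(axis=1, keepdims=True)
    return k @ X

X = np.random.randn(8, 4)
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm tokens
# For unit-norm vectors the two kernels differ only by factors that cancel in the
# row normalization, so the two updates agree.
print(np.allclose(attention_step(X), mean_shift_step(X)))  # True
```

Both updates pull tokens toward the centers of nearby clusters, which is the intuition the video uses for why attending to only a small, input-dependent set of tokens and heads can preserve the output.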

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Join this channel to get access to perks:
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Video editing: Nils Trost
Music 🎵 : Sunday Rain - Cheel
Comments

I don't know why the algorithm has neglected showing me your content for so long. It is right up my alley. I hope you keep coming out with the latest news from AI. It is the biggest thing to happen to humanity since ever, and people still don't react to it. I like your vids. You are smart and the animations are fun. And god, the lipstick. Holy shit. 🙃😍

Ben_D.

What a crazy video. I learned so much, thank you for making this!

poketopa

This channel is a real gem! Can we continue to expect 2 videos a month?

Aca

I never thought that what I need in life was Ms. Coffee Bean telling me to "sit down". Now I know

DerPylz

This was a fantastic and concise explanation!! I'll read the paper in more detail; however, is this method also effective when combined with quantization? I want to run large models on reasonably priced hardware just for inference.


Isn't GELU already enforcing "input-dependent" sparsity?

lelouch

Nice animations.
It feels like we are going in circles: this paper (and ReLU Strikes Back) reintroduces ReLU; S6, RWKV, and RetNet reintroduce the RNN. Flip a coin on which piece of the past comes back next - residual-free models or an AI Winter.

AM-ykyd

Does this mean we can run larger models on smaller GPUs?

ew

IT majors in India used to recruit for general intelligence and then make it sparse in the profession, focusing on specialized, repetitive tasks rather than broad skill development.

ramkumarr

Yes, when a person refers to the human brain in comparison to AI, they generally mean the collective intelligence of humanity, rather than the capabilities of an individual brain.

ramkumarr

GPT makers could take a share price hit.

Artifactorfiction