Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained
Contextual sparsity: take a pretrained LLM and make it sparse on the fly at inference time, keeping only the attention heads and MLP neurons that matter for the current input. In this video, we explain how the DEJAVU method implements contextual sparsity.
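For intuition only (this is not the DEJAVU code, and the predictor below is a random stand-in for the small learned predictors the paper uses), here is a toy Python sketch of contextual sparsity on a single MLP layer: a cheap per-token predictor picks the top-k hidden neurons, and only those rows/columns of the weight matrices are computed. All variable names and sizes are made up for illustration.

# Toy sketch of the *idea* of contextual sparsity (not DEJAVU's actual implementation):
# for each input token, a small predictor scores the MLP's hidden neurons and
# only the top-k highest-scoring neurons are computed; the rest are skipped.
import torch

torch.manual_seed(0)
d_model, d_hidden, k = 16, 64, 8            # toy sizes; DEJAVU works on full LLM layers

x = torch.randn(d_model)                    # one token's hidden state
W1 = torch.randn(d_hidden, d_model)         # MLP up-projection
W2 = torch.randn(d_model, d_hidden)         # MLP down-projection
predictor = torch.randn(d_hidden, d_model)  # stand-in for a cheap learned predictor

# Predict which neurons matter for *this* input (contextual, not static).
scores = predictor @ x
idx = torch.topk(scores, k).indices

# Compute only the selected neurons' activations.
h_sparse = torch.relu(W1[idx] @ x)          # (k,) instead of (d_hidden,)
y_sparse = W2[:, idx] @ h_sparse            # approximate MLP output

# Dense reference for comparison.
y_dense = W2 @ torch.relu(W1 @ x)
print(torch.norm(y_dense - y_sparse) / torch.norm(y_dense))

Roughly speaking, DEJAVU trains small lookahead predictors per layer and applies the same selection idea to attention heads as well; the sketch only shows why skipping the unselected neurons saves compute.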
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information, Michael
Outline:
00:00 DEJAVU explained
02:58 Sparse neural networks
04:06 Why static sparsity hurts
04:43 Contextual sparsity
05:40 DEJAVU method
07:59 Speedups!
08:52 MoE: Connection to Mixture of Experts
09:38 Theoretical insights: Why can we make MLPs sparse?
10:36 Why can we make attention sparse?
11:38 Attention does Mean-shift clustering!
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Join this channel to get access to perks:
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Video editing: Nils Trost
Music 🎵 : Sunday Rain - Cheel