Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

Four techniques for speeding up your model's inference; brief illustrative code sketches follow the chapter list:
0:38 - Quantization
5:59 - Pruning
9:48 - Knowledge Distillation
13:00 - Engineering Optimizations
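As a minimal sketch of the first technique, here is post-training dynamic quantization using PyTorch's built-in API. The toy model is a hypothetical stand-in, not the network from the video.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for any Linear-heavy network.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Replace Linear layers with int8 dynamically quantized equivalents:
# weights are stored as int8, activations are quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, smaller and faster Linear kernels
```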
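For pruning, a sketch of unstructured L1 (magnitude) pruning with torch.nn.utils.prune, applied to a single hypothetical Linear layer:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 256)

# Zero out the 50% of weights with the smallest absolute value.
prune.l1_unstructured(layer, name="weight", amount=0.5)
print(float((layer.weight == 0).float().mean()))  # ~0.5 sparsity

# Bake the mask into the weight tensor and drop the reparametrization.
prune.remove(layer, "weight")
```

Note that unstructured zeros shrink the model only logically; real inference speedups usually require structured pruning (removing whole channels or heads) or sparse-aware kernels.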
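For knowledge distillation, a sketch of the classic soften-and-match loss (temperature-scaled KL divergence plus ordinary cross-entropy); the tensors below are placeholders for real teacher and student logits.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 10)           # student logits
t = torch.randn(8, 10)           # teacher logits (from a frozen, larger model)
y = torch.randint(0, 10, (8,))   # ground-truth labels
print(distillation_loss(s, t, y))
```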