Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

Four techniques for speeding up your model's inference; brief illustrative code sketches follow the chapter list:
0:38 - Quantization
5:59 - Pruning
9:48 - Knowledge Distillation
13:00 - Engineering Optimizations
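As a minimal sketch of the first technique, here is post-training dynamic quantization using PyTorch's built-in API. The toy model is a hypothetical stand-in, not the network from the video.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for any Linear-heavy network.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Replace Linear layers with int8 dynamically quantized equivalents:
# weights are stored as int8, activations are quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, smaller and faster Linear kernels
```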
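For pruning, a sketch of unstructured L1 (magnitude) pruning with torch.nn.utils.prune, applied to a single hypothetical Linear layer:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 256)

# Zero out the 50% of weights with the smallest absolute value.
prune.l1_unstructured(layer, name="weight", amount=0.5)
print(float((layer.weight == 0).float().mean()))  # ~0.5 sparsity

# Bake the mask into the weight tensor and drop the reparametrization.
prune.remove(layer, "weight")
```

Note that unstructured zeros shrink the model only logically; real inference speedups usually require structured pruning (removing whole channels or heads) or sparse-aware kernels.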
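For knowledge distillation, a sketch of the classic soften-and-match loss (temperature-scaled KL divergence plus ordinary cross-entropy); the tensors below are placeholders for real teacher and student logits.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 10)           # student logits
t = torch.randn(8, 10)           # teacher logits (from a frozen, larger model)
y = torch.randint(0, 10, (8,))   # ground-truth labels
print(distillation_loss(s, t, y))
```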