Deep Dive on PyTorch Quantization - Chris Gottbrath

It’s important to make efficient use of both server-side and on-device compute resources when developing machine learning applications. To support more efficient deployment on servers and edge devices, PyTorch added support for model quantization using the familiar eager-mode Python API.

Quantization leverages 8-bit integer (int8) instructions to reduce model size and run inference faster (lower latency), and can determine whether a model meets its quality-of-service goals or even fits into the resources available on a mobile device. Even when resources aren’t quite so constrained, it may enable you to deploy a larger and more accurate model. Quantization is available in PyTorch starting in version 1.3, and with the release of PyTorch 1.4 we published quantized models for ResNet, ResNeXt, MobileNetV2, GoogLeNet, InceptionV3 and ShuffleNetV2 in the torchvision 0.5 library.
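As a quick illustration of how those pretrained quantized models can be used (the model choice and input shape are illustrative; the quantized model builders live under torchvision.models.quantization from torchvision 0.5 onwards):

```python
import torch
import torchvision

# Load a pretrained, already-quantized torchvision model.
# ResNet-18 is chosen only as an example.
model = torchvision.models.quantization.resnet18(pretrained=True, quantize=True)
model.eval()

# Quantized models still take ordinary float tensors as input.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([1, 1000])
```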
Comments

My takeaways:
0. Outline of this talk 0:51
1. Motivation 1:42
- DNNs are very computationally intensive
- Datacenter power consumption is doubling every year
- The number of edge devices is growing fast, and many of these devices are resource-constrained
2. Quantization basics 5:27
3. PyTorch quantization 10:54
3.1 Workflows 17:21
3.2 Post-training dynamic quantization 21:31
- Quantize weights at design time
- Quantize activations (and choose their scaling factor) at runtime
- No extra data are required
- Suitable for LSTMs/Transformers, and for MLPs with small batch sizes
- 2x faster compute, 4x less memory
- Easy to do with a one-line API (see the dynamic quantization sketch after this comment)
3.3 Post-training static quantization 23:57
- Quantize both weights and activations at design time
- Extra data are needed for calibration (i.e. to find the scaling factors)
- Suitable for CNNs
- 1.5-2x faster compute, 4x less memory
- Steps: 1. Modify model 25:55 2. Prepare and calibrate 27:45 3. Convert 31:34 4. Deploy 32:59 (see the static quantization sketch after this comment)
3.4 Quantization-aware training 34:00
- Make the weights "more quantizable" through training and fine-tuning
- Steps: 1. Modify model 36:43 2. Prepare and train 37:28 (see the QAT sketch after this comment)
3.5 Example models 39:26
4. New in PyTorch 1.6
4.1 Graph mode quantization 45:14
4.2 Numeric Suite 48:17: tools to help debug accuracy drops due to quantization at a layer-by-layer level
5. Framework support, CPU (x86, Arm) backend support 49:46
6. Resources to learn more 50:52

leixun
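To make the three workflows summarized above concrete, here are minimal eager-mode sketches using the standard torch.quantization APIs; the module names, shapes, and random calibration data are illustrative placeholders, not code from the talk. (The mapping behind all of them is affine: q = round(x / scale) + zero_point.)

Post-training dynamic quantization (the one-line API from 3.2):

```python
import torch
import torch.nn as nn

# Illustrative float model; any module with nn.Linear or LSTM layers works.
model_fp32 = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Weights are quantized ahead of time; activations are quantized on the fly at inference.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)
```

Post-training static quantization (3.3: modify, prepare/calibrate, convert):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 at the model input
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float at the model output

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

m = M().eval()
m.qconfig = torch.quantization.get_default_qconfig('fbgemm')          # x86 server backend
torch.quantization.fuse_modules(m, [['conv', 'relu']], inplace=True)  # 1. modify model
torch.quantization.prepare(m, inplace=True)                           # 2. insert observers
for _ in range(10):                                                   #    calibrate (random data as a stand-in)
    m(torch.randn(1, 3, 32, 32))
torch.quantization.convert(m, inplace=True)                           # 3. convert to int8
```

Quantization-aware training (3.4), reusing the same class M; the differences are the QAT qconfig, prepare_qat, and a fine-tuning loop before conversion:

```python
m = M().eval()
m.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.fuse_modules(m, [['conv', 'relu']], inplace=True)
m.train()
torch.quantization.prepare_qat(m, inplace=True)
# ... fine-tune for a few epochs with fake quantization in the graph ...
m.eval()
torch.quantization.convert(m, inplace=True)
```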

How do you test the model after quantization?
I am using post-training static quantization.
How do you prepare the input to feed into this model?

aayushsingh
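In case it helps: with eager-mode static quantization, the model's external interface stays in float, so evaluation looks the same as for the float model; the QuantStub quantizes the input internally and the DeQuantStub returns float outputs. A minimal self-contained sketch (the tiny model and shapes are stand-ins for a real converted model):

```python
import torch
import torch.nn as nn

# A trivially quantized model, standing in for your own converted model.
float_model = nn.Sequential(
    torch.quantization.QuantStub(),
    nn.Conv2d(3, 8, 3),
    torch.quantization.DeQuantStub(),
).eval()
float_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(float_model, inplace=True)
float_model(torch.randn(1, 3, 32, 32))           # calibration pass
qmodel = torch.quantization.convert(float_model)

# Testing: feed ordinary float tensors, exactly as you would to the float model.
with torch.no_grad():
    out = qmodel(torch.randn(1, 3, 32, 32))
print(out.dtype)  # torch.float32 -- the DeQuantStub dequantizes the output
```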

What if I want to fuse multiple conv and relu pairs?

ankitkumar-kgue
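For what it's worth, torch.quantization.fuse_modules accepts a list of name lists, so several conv/relu pairs (or conv/bn/relu triples) can be fused in one call. A small sketch with hypothetical module names:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    # Illustrative network with two conv+relu pairs.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(8, 16, 3)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        return self.relu2(self.conv2(self.relu1(self.conv1(x))))

net = Net().eval()
# Each inner list names one group of modules to fuse.
torch.quantization.fuse_modules(
    net, [['conv1', 'relu1'], ['conv2', 'relu2']], inplace=True
)
```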

In the accuracy results, why is there a difference in inference speedup between QAT and PTQ? Is this because different models were used? I would expect no difference in speedup if the same model was used.

rednas

Awesome talk, thanks!
It might be too much to ask, but it would be nice if PyTorch had a tool to convert quantized tensor parameters into TensorRT calibration tables.

MrGHJK

Sorry, can you share the example code? Thank you.

bioxkfj

Why not go lower than 8-bit integers for quantization? Wouldn't that be much speedier?

jetjodh

Great info, but please buy a pop filter.

dsagman

OMG. We already have a term of art for "zero point": it's called bias. We have a term, please use it. Otherwise, thanks for the great talk.

briancase