GTC 2021: Systematic Neural Network Quantization
An important next milestone in machine learning is to bring intelligence to the edge without relying on the computational power of the cloud. This could lead to more reliable, lower-latency, and privacy-preserving AI for a wide range of applications. However, state-of-the-art NN models often require prohibitive amounts of compute, memory, and energy for edge deployment. To address these challenges, I will present our latest work on hardware-aware quantization, which achieves an optimal tradeoff between accuracy, latency, and model size. In particular, I will discuss HAWQ-V3, a new second-order quantization method in which the entire inference can be performed with integer-only arithmetic, without any floating-point operations.
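The key to integer-only inference in HAWQ-V3 is dyadic arithmetic (per the paper title below): each floating-point requantization scale is approximated by a dyadic number b / 2^c with integers b and c, so rescaling reduces to an integer multiply followed by a bit shift. The sketch below illustrates this idea only; the function names, bit widths, and scale values are hypothetical and not taken from the HAWQ-V3 codebase.

```python
import numpy as np

def to_dyadic(scale: float, max_shift: int = 31) -> tuple[int, int]:
    """Approximate a real rescaling factor as b / 2**c with integers b, c.

    Illustrative helper (assumption, not the paper's implementation):
    choose the largest shift c such that b = round(scale * 2**c)
    still fits in an int32 multiplier.
    """
    assert 0 < scale < 1, "requantization scales are typically in (0, 1)"
    c = max_shift
    b = round(scale * (1 << c))
    while b >= (1 << 31):  # keep the multiplier within int32 range
        c -= 1
        b = round(scale * (1 << c))
    return b, c

def requantize(acc: np.ndarray, b: int, c: int) -> np.ndarray:
    """Integer-only requantization: multiply by b, then shift right by c.

    `acc` is an int32 accumulator from an int8 matmul; the result is the
    int8 input to the next layer. No floating-point ops are involved.
    """
    out = (acc.astype(np.int64) * b) >> c  # arithmetic shift floors toward -inf
    return np.clip(out, -128, 127).astype(np.int8)

# Example: fold a layer's input/weight/output scales into one factor.
s_in, s_w, s_out = 0.02, 0.005, 0.04            # hypothetical quantization scales
b, c = to_dyadic(s_in * s_w / s_out)            # 0.0025 -> (b, c) = (5368709, 31)
acc = np.array([12345, -6789], dtype=np.int32)  # fake int32 accumulators
print(requantize(acc, b, c))                    # [ 30 -17], matching 0.0025 * acc
```

Because b and c are fixed at compile time, the entire inference graph can be lowered to integer multiplies, shifts, and clips, which is what allows deployment on hardware without floating-point units.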
Related papers are:
- A survey of quantization methods for efficient neural network inference. arXiv preprint arXiv:2103.13630.
- HAWQ-V3: Dyadic neural network quantization. ICML, 2021.
- HAWQ-V2: Hessian aware trace-weighted quantization of neural networks. NeurIPS, 2020.
- HAWQ: Hessian AWare quantization of neural networks with mixed-precision. ICCV, 2019.
- I-BERT: Integer-only BERT quantization. ICML, 2021.
- Q-BERT: Hessian based ultra low precision quantization of BERT. AAAI, 2020.