GTC 2021: Systematic Neural Network Quantization

An important next milestone in machine learning is to bring intelligence to the edge without relying on the computational power of the cloud. This could lead to more reliable, lower-latency, and privacy-preserving AI for a wide range of applications. However, state-of-the-art NN models often require prohibitive amounts of compute, memory, and energy for edge deployment. To address these challenges, I will present our latest work on hardware-aware quantization that achieves an optimal trade-off between accuracy, latency, and model size. In particular, I will discuss HAWQV3, a new second-order quantization method in which the entire inference can be performed with integer-only arithmetic, without any floating-point operations.
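As a rough illustration of what integer-only inference can look like, the sketch below rescales an integer accumulator with a dyadic number (an integer divided by a power of two), which is the idea the HAWQV3 paper title refers to. The function names, the 16-bit shift, and the scale values are assumptions chosen for illustration; this is not code from the talk or from the HAWQ repository.

```python
from typing import List, Tuple


def to_dyadic(scale: float, shift_bits: int = 16) -> Tuple[int, int]:
    """Approximate a positive real scale by b / 2**c with integers b and c.

    This step uses floating point, but it runs once, offline, before deployment.
    """
    b = round(scale * (1 << shift_bits))
    return b, shift_bits


def requantize(acc: int, b: int, c: int) -> int:
    """Multiply acc by b / 2**c using only an integer multiply, add, and shift.

    Adding 2**(c-1) before the shift rounds to nearest instead of flooring.
    """
    return (acc * b + (1 << (c - 1))) >> c


def clamp_int8(q: int) -> int:
    """Saturate to the signed 8-bit range."""
    return max(-128, min(127, q))


def int_dot(w_q: List[int], x_q: List[int]) -> int:
    """Integer dot product; the accumulator stays an integer throughout."""
    return sum(w * x for w, x in zip(w_q, x_q))


# Hypothetical scales for weights, inputs, and outputs (illustrative values).
s_w, s_x, s_out = 0.02, 0.05, 0.1
b, c = to_dyadic(s_w * s_x / s_out)

# Inference path: integers only.
w_q = [12, -7, 3]
x_q = [100, 50, -20]
acc = int_dot(w_q, x_q)
y_q = clamp_int8(requantize(acc, b, c))
print(acc, b, c, y_q)
```

The only floating-point work is the one-time conversion of the combined scale into the pair `(b, c)`; at inference time the rescaling is a multiply followed by a bit shift, which is what makes the approach a good fit for integer-only hardware.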

Related papers are:
- A survey of quantization methods for efficient neural network inference. arXiv preprint arXiv:2103.13630.

- HAWQV3: Dyadic neural network quantization. ICML, 2021.

- HAWQ-V2: Hessian aware trace-weighted quantization of neural networks. NeurIPS, 2020.

- HAWQ: Hessian AWare quantization of neural networks with mixed-precision. ICCV, 2019.

- I-BERT: Integer-only BERT quantization. ICML, 2021.

- Q-BERT: Hessian based ultra low precision quantization of BERT. AAAI, 2020.
Comments

Sorry! Would positive 7 take up as many bits as negative 5? You'd be skewing/compressing the positive numbers. To avoid that, surely you'd have to move '0'!

NisseOhlsen

12:47 HAWQ (Hessian Aware Quantization)

shuchangzhou