Quantization in Deep Learning (LLMs)

This video covers quantization in deep learning, as part of the deep learning tutorial series.

Quantization is becoming increasingly popular, and is essential for dealing with ever-growing deep learning models. But how does quantization work? What are the different types of quantization algorithms? What are the different modes of quantization? I have tried to answer these questions in this video.

Topics covered include the types of quantization (uniform and non-uniform), the subdivision of uniform quantization into symmetric and asymmetric quantization, dequantization, and how to choose the scale factor and zero-point parameters for both symmetric and asymmetric quantization. Lastly, post-training quantization (PTQ) and quantization-aware training (QAT) are also covered.
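The scale-factor and zero-point arithmetic mentioned above can be sketched in a few lines of numpy. This is a minimal illustration, not the video's code; the function names are my own:

```python
import numpy as np

def asymmetric_quantize(x, num_bits=8):
    """Asymmetric (affine) quantization to an unsigned integer range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Scale maps the full float range [min, max] onto the integer range.
    scale = (x.max() - x.min()) / (qmax - qmin)
    # Zero point is the integer that represents float 0.0 exactly.
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def symmetric_quantize(x, num_bits=8):
    """Symmetric quantization: zero point fixed at 0, signed integer range."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale, zero_point=0):
    """Recover an approximation of the original floats."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
q, s, z = asymmetric_quantize(x)
x_hat = dequantize(q, s, z)   # close to x, within about half a quantization step
```

Dequantization never recovers the inputs exactly; the reconstruction error is bounded by roughly half a quantization step, which is why the choice of scale and zero point matters so much.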

A practical guide to neural network quantization, in both PyTorch and TensorFlow, is to follow.
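For the QAT side, the core trick is the "fake quantizer": during training, values are quantized and immediately dequantized in float, so the network learns to absorb the rounding error while gradients still flow (typically via the straight-through estimator). A minimal numpy sketch, purely illustrative:

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Quantize to a signed integer grid, then dequantize back to float.
    The output is snapped to int8-representable levels but stays float32,
    which is what QAT feeds to the next layer during training."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(x)) / qmax        # symmetric: zero point = 0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return (q * scale).astype(np.float32)

w = np.array([-0.8, -0.05, 0.0, 0.31, 0.8], dtype=np.float32)
w_fq = fake_quantize(w)   # close to w, but restricted to 255 levels
```

In the backward pass, QAT frameworks treat the rounding step as the identity (straight-through), so training still happens in fp32 while the forward pass sees quantization noise.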

As always, hope it's useful!

RELATED LINKS

AI BITES LINKS

🛠 🛠 🛠 MY SOFTWARE TOOLS 🛠 🛠 🛠

📚 📚 📚 BOOKS I HAVE READ, REFER AND RECOMMEND 📚 📚 📚

WHO AM I?
I am a Machine Learning Researcher / Practitioner who has seen the grind of academia and start-ups equally. I started my career as a software engineer 15 years ago. Because of my love for mathematics (coupled with a glimmer of luck), I graduated with a Master's in Computer Vision and Robotics in 2016, just as the current AI revolution was getting started. Life has changed for the better ever since.

#machinelearning #deeplearning #aibites
Comments

Thanks for this clear and easy explanation of Quantization in NNs.

MojtabaJafaritadi

A good, comprehensive work. I liked the related links you added in the description. A tiny recommendation: please increase your speaking pace, as there were many seconds-long pauses between topics. Thank you, and looking forward to more content.

mashood

Thank you, it was a really clean and clear explanation, and in such a short time too. 👏

SreeramAjay

Thank you for the informative content. Is it possible to combine pruning and quantization while maintaining accuracy?

abuali

Very good explanation. Please make a video on how to calibrate on data and compute the scaling factor and zero point by analysing the weight distribution of each layer for int8 quantization in TensorFlow/TensorRT, and also on the role of fake quantizers during backpropagation.

Techiiot

I just have a question about quantization in TensorFlow. For a project of mine I used the QKeras library for QAT, but the weights I got in the end were long, high-precision numbers (for example 0.235215266523415e-2). In the quantization config I used int8, and a number like that is not representable in the int8 format.

Does the training still happen in fp32, with the quantization treated as noise?
Also, what do I do to get the weights to be representable in int8 format?
How do I test the accuracy of the weight-quantized model?

mdnghbrs