Quantization in Deep Learning (LLMs)

This video covers quantization in deep learning, as part of the deep learning tutorial series.

Quantization is becoming increasingly popular, and is essential for dealing with ever-growing deep learning models. But how does quantization work? What are the different types of quantization algorithms? What are the different modes of quantization? I have tried to answer these questions in this video.

Topics covered include the types of quantization (uniform and non-uniform), the subdivision of uniform quantization into symmetric and asymmetric quantization, dequantization, and how to choose the scale factor and zero-point parameters for both symmetric and asymmetric quantization. Lastly, post-training quantization (PTQ) and quantization-aware training (QAT) are also covered.
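The scale-factor and zero-point arithmetic mentioned above can be sketched in a few lines of numpy. This is a minimal illustration, not the video's code; the function names are my own:

```python
import numpy as np

def asymmetric_quantize(x, num_bits=8):
    """Asymmetric (affine) quantization to an unsigned integer range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Scale maps the full float range [min, max] onto the integer range.
    scale = (x.max() - x.min()) / (qmax - qmin)
    # Zero point is the integer that represents float 0.0 exactly.
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def symmetric_quantize(x, num_bits=8):
    """Symmetric quantization: zero point fixed at 0, signed integer range."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale, zero_point=0):
    """Recover an approximation of the original floats."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
q, s, z = asymmetric_quantize(x)
x_hat = dequantize(q, s, z)   # close to x, within about half a quantization step
```

Dequantization never recovers the inputs exactly; the reconstruction error is bounded by roughly half a quantization step, which is why the choice of scale and zero point matters so much.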

A practical guide to neural network quantization, in both PyTorch and TensorFlow, is to follow.
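For the QAT side, the core trick is the "fake quantizer": during training, values are quantized and immediately dequantized in float, so the network learns to absorb the rounding error while gradients still flow (typically via the straight-through estimator). A minimal numpy sketch, purely illustrative:

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Quantize to a signed integer grid, then dequantize back to float.
    The output is snapped to int8-representable levels but stays float32,
    which is what QAT feeds to the next layer during training."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(x)) / qmax        # symmetric: zero point = 0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return (q * scale).astype(np.float32)

w = np.array([-0.8, -0.05, 0.0, 0.31, 0.8], dtype=np.float32)
w_fq = fake_quantize(w)   # close to w, but restricted to 255 levels
```

In the backward pass, QAT frameworks treat the rounding step as the identity (straight-through), so training still happens in fp32 while the forward pass sees quantization noise.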

As always, hope it's useful!

RELATED LINKS

AI BITES LINKS

🛠 🛠 🛠 MY SOFTWARE TOOLS 🛠 🛠 🛠

📚 📚 📚 BOOKS I HAVE READ, REFER AND RECOMMEND 📚 📚 📚

WHO AM I?
I am a Machine Learning Researcher / Practitioner who has seen the grind of academia and start-ups equally. I started my career as a software engineer 15 years ago. Because of my love for mathematics (coupled with a glimmer of luck), I graduated with a Master's in Computer Vision and Robotics in 2016, just as the current AI revolution was getting started. Life has changed for the better ever since.

#machinelearning #deeplearning #aibites
Comments

Thanks for this clear and easy explanation of Quantization in NNs.

MojtabaJafaritadi

A good, comprehensive work. I liked the related links you added in the description. A tiny recommendation: please increase your speaking pace, as there were many seconds-long pauses between topics. Thank you, and looking forward to more content.

mashood

Thank you, it was a really clean and clear explanation, and in such a short time too. 👏

SreeramAjay

Thank you for the informative content. Is it possible to combine pruning and quantization while maintaining accuracy?

abuali

Very good explanation. Please make a video on how to calibrate on data and compute the scaling factor and zero point by analysing the weight distribution of each layer for int8 quantization in TensorFlow/TensorRT, and also on the role of fake quantizers during backpropagation.

Techiiot

I just have a question about quantization in TensorFlow. For a project of mine I used the QKeras library for QAT, but the weights I got in the end were long, high-precision numbers (for example 0.235215266523415e-2). In the quantization config I used int8, and a number like that is not representable in the int8 format.

Does the training still happen in fp32, with the quantization treated as noise?
Also, what do I do to get the weights to be representable in int8 format?
How do I test the accuracy of the weight-quantized model?

mdnghbrs