Understanding: AI Model Quantization, GGML vs GPTQ!
Learning Resources:
❤️ If you want to support the channel ❤️
Support here:
New Tutorial on LLM Quantization w/ QLoRA, GPTQ and Llamacpp, LLama 2
LLaMa GPTQ 4-Bit Quantization. Billions of Parameters Made Smaller and Smarter. How Does it Work?
GGML vs GPTQ in Simple Words
Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ)
What is LLM quantization?
Quantize any LLM with GGUF and Llama.cpp
Understanding 4bit Quantization: QLoRA explained (w/ Colab)
Difference Between GGUF and GGML
Revolutionizing Machine Learning: GGML's AI at the Edge
Quantization in deep learning | Deep Learning Tutorial 49 (Tensorflow, Keras & Python)
Quantization in Deep Learning (LLMs)
LoRA explained (and a bit about precision and quantization)
Run Code Llama 13B GGUF Model on CPU: GGUF is the new GGML
Faster Models with Similar Performances - AI Quantization
How to Quantize an LLM with GGUF or AWQ
How to Choose AI Model Quantization Techniques | AI Model Optimization with Intel® Neural Compressor...
ggml model format
ggerganov/ggml - Gource visualisation
Gemma|LLMstudio|Quantize GGUF |GGML |Semantic Kernel
GPTQ : Post-Training Quantization
Lecture 05 - Quantization (Part I) | MIT 6.S965
Updated Installation for Oobabooga Vicuna 13B And GGML! 4-Bit Quantization, CPU Near As Fast As GPU.
Quantization in PyTorch 2.0 Export at PyTorch Conference 2022