QLoRA: Quantization for Fine Tuning

Like 👍. Comment 💬. Subscribe 🟥.

#quantization #finetuning #machinelearning #gpu
Comments

I rewatched almost the entire live stream. It helped me a lot with understanding QLoRA. You explain almost every concept in an easy-to-understand way. Please keep doing this paper-reading series.

SofieSimp

Wow. The recording of this livestream was just recommended to me. Your in-depth analysis of the paper, while also explaining the nuanced terms, was extremely informative and engaging. As a new entrant to the field, I learned several new concepts in this video, and I’m only 30 minutes in! Keep it up hu-po 👍

jmoney

Data size vs. quality: I can model a complex industrial process using only 18 rows of 10 floating point values. People presume big models = big data. That's flatly untrue. It pays to know what you are doing. Great video, thanks. No wonder I like the video. I'm a ChE too :)

angstrom

this is now my favorite YouTube channel

Ronenlahat

Is it that the base model is stored in 4-bit, and as the data (the X vector) passes through a layer, that layer is first dequantized and then the matrix multiplication is done (X*W)? Does the same apply to the LoRA layers? And after we get Y (by adding the outputs of the LoRA and base layers), are the W and LoRA layers quantized back to 4-bit before Y is passed on to the next layer?
Also, if the LoRA is at the base of the model, does that mean that to update the parameters of this LoRA we need to calculate the gradients of the loss with respect to all the W and LoRA matrices above it?

pravingaikwad
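
A minimal sketch of the forward pass described in the question above, following the scheme in the QLoRA paper: the base weight stays stored in 4-bit NF4, is dequantized to 16-bit only for the matrix multiplication, and the LoRA matrices A and B live in 16-bit the whole time (they are never quantized). The `dequantize_nf4` helper, the codebook, and the shapes are illustrative assumptions, not the bitsandbytes API.

```python
import numpy as np

def dequantize_nf4(codes, absmax, codebook, block_size=64):
    """Illustrative NF4 dequantization: look up each 4-bit code in the
    codebook, then rescale every 64-value block by its stored absmax
    quantization constant."""
    values = codebook[codes]                        # flat values in [-1, 1]
    blocks = values.reshape(-1, block_size)
    return (blocks * absmax[:, None]).ravel()

def qlora_linear(x, codes, absmax, codebook, lora_A, lora_B, scaling=1.0):
    """x: (batch, d_in); lora_A: (r, d_in); lora_B: (d_out, r)."""
    d_out = lora_B.shape[0]
    # The 4-bit tensor stays resident; this 16-bit copy is transient.
    W = dequantize_nf4(codes, absmax, codebook).reshape(d_out, -1)
    base_out = x @ W.T                              # frozen base path
    lora_out = (x @ lora_A.T) @ lora_B.T * scaling  # trainable low-rank path
    return base_out + lora_out
```

Nothing is "re-quantized" after the matmul in this picture: the 4-bit weights are never modified, the dequantized copy is simply discarded, and during training only lora_A and lora_B receive parameter gradients, although the loss gradient still has to flow back through the frozen layers above them.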

How does each block in QLoRA maintain a normal distribution after blockwise quantization, when the blocks are only subsets of a normal distribution?

agiagiagitk

This is the best YouTube channel to learn about AI!!!

wecreatestoryai

Can you do an explainer on the "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers" paper? QLoRA is good for training LLMs, but its inference phase is dramatically slow. I heard GPTQ can speed up the inference phase with its quantization algorithm; can we apply GPTQ together with QLoRA then?

SofieSimp

The QLoRA paper states that they use NF4 to store the model and dequantize back to BF16 when performing matrix multiplication. Doesn't this increase the VRAM usage of the model?

SofieSimp
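
A back-of-the-envelope answer to the VRAM question above, with illustrative numbers for a 7B-parameter model (the layer shape is an assumption): the dequantized 16-bit copy exists only one layer at a time during the forward and backward pass, so the resident cost is the 4-bit weights plus their quantization constants, not a second full-precision model.

```python
# Rough VRAM estimate, assuming a 7B-parameter model (illustrative numbers).
params = 7e9
bf16_full  = params * 2 / 1e9         # ~14.0 GB if every weight were BF16
nf4_stored = params * 0.5 / 1e9       # ~ 3.5 GB stored as 4-bit NF4
constants  = params / 64 * 4 / 1e9    # ~ 0.44 GB of FP32 absmax constants
                                      #   (before double quantization)
# Dequantization happens per layer, so the peak extra cost is one layer's
# weights in 16-bit, e.g. an assumed 4096x11008 MLP projection.
transient  = 4096 * 11008 * 2 / 1e9   # ~ 0.09 GB, freed after the matmul
print(nf4_stored + constants + transient)   # ~4 GB resident, not ~14 GB
```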

Can someone clarify this: when you treat some of the weights of the LLM as LoRA weights, are you randomly selecting the weights?

nishanthk

FYI, there are GPUs like the RTX A6000 that have 48GB of memory and are considered workstation GPUs.

majidalfifi

What are the quantization constants? Are they just the absmax of each 64-item block, or is this wrong? Maybe someone can point this out in a sentence or two.

christopherhornle
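
To the question above: yes, in blockwise absmax quantization the constant for each 64-value block is simply that block's absolute maximum, used to rescale the block into [-1, 1] before snapping each value to the nearest of the 16 NF4 levels. A tiny sketch; the evenly spaced codebook here is a placeholder, not the real NF4 quantile table.

```python
import numpy as np

PLACEHOLDER_LEVELS = np.linspace(-1.0, 1.0, 16)   # stand-in for the NF4 codebook

def quantize_blockwise(weights, block_size=64):
    """Blockwise absmax quantization: one FP32 constant per 64-value block."""
    blocks = weights.reshape(-1, block_size)
    constants = np.abs(blocks).max(axis=1)         # the quantization constants
    normalized = blocks / constants[:, None]       # each block now in [-1, 1]
    codes = np.abs(normalized[..., None] - PLACEHOLDER_LEVELS).argmin(axis=-1)
    return codes.astype(np.uint8), constants       # 4-bit codes + constants
```

Double quantization then quantizes these FP32 constants themselves (to 8-bit, in blocks of 256), which is how the paper cuts the constants' overhead from 0.5 down to about 0.127 bits per parameter.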

I have a question: if I have 8GB of VRAM, what is the maximum size of GPT model I could train, not fine-tune?

khaledbouzaiene
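
For the 8GB question above, a rough rule of thumb rather than anything from the paper: full training with mixed-precision Adam keeps roughly 16 bytes of weight and optimizer state per parameter, and activations come on top of that, so the estimate below is an upper bound.

```python
# Rough ceiling for full training (not fine-tuning) on 8 GB of VRAM.
# Assumption: mixed-precision Adam at ~16 bytes per parameter
# (FP16 weights + FP16 grads + FP32 master weights + two FP32 Adam moments);
# activations are ignored, so this overestimates what actually fits.
vram = 8e9
bytes_per_param = 2 + 2 + 4 + 4 + 4     # = 16
max_params = vram / bytes_per_param     # ~0.5e9
print(f"~{max_params / 1e9:.1f}B parameters at most, before activations")
```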

Thanks for sharing. The Discord invitation is outdated; can you renew it? 😄

kaneyxx

Where do you attach the LoRA adapters? Just at the last layer?

wryltxw

34:37 OASST1 is a crowdsourced dataset of instructions and responses created by humans roleplaying both the LLM and the Human user sides. Everybody, please contribute to the OpenAssistant project!!

MouliSankarS

Do the gradients also need to fit in memory, besides the model size and the batch?

wryltxw
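
Short answer to the question above: yes, gradients take memory too, but with (Q)LoRA only the adapter parameters are trainable, so the gradient and optimizer-state footprint is tiny next to the frozen base weights. An illustrative count for one adapted layer, where the rank and shape are assumptions:

```python
# Trainable state for one LoRA-adapted 4096x4096 projection at rank r=64.
d_in, d_out, r = 4096, 4096, 64
base_params = d_in * d_out          # frozen, stored in NF4, no gradients
lora_params = r * (d_in + d_out)    # trainable A and B matrices
grad_mb = lora_params * 2 / 1e6     # ~1.0 MB of BF16 gradients
adam_mb = lora_params * 8 / 1e6     # ~4.2 MB of FP32 Adam moments
print(base_params, lora_params, grad_mb, adam_mb)
```

The bigger variable cost is usually the activations saved for backprop, which scale with batch size and sequence length; the paper's paged optimizers exist to survive exactly those memory spikes.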

How do you fine-tune the model? I tried it with the mpt-7b model on a 16GB T4 instance but wasn't able to train it. Did you try fine-tuning?

gowtham

Man, I can't even cover all the errors in logic. It is obvious that paged optimizers are there to handle memory spikes; IT SAYS SO. At 15:31 you claim they contradict themselves, but they do not. They say that GPT-4 is good at evaluating answers, but that the CURRENT BENCHMARKS ARE NOT GOOD ENOUGH FOR A GOOD EVALUATION. So, given a text, GPT-4 makes a good evaluation, but the standard benchmarks are not good enough. Stopping here; some people should maybe go back to school and work on understanding English.

ThomasTomiczek