QLoRA: Quantization for Fine Tuning

Like 👍. Comment 💬. Subscribe 🟥.

#quantization #finetuning #machinelearning #gpu
Comments

I rewatched almost the entire live stream. It helped me a lot with understanding QLoRA. You explain almost every concept in an easy-to-understand way. Please keep doing this paper-reading series.

SofieSimp

Wow. The recording of this livestream was just recommended to me. Your in-depth analysis of the paper, while also explaining the nuanced terms, was extremely informative and engaging. As a new entrant to the field, I learned several new concepts in this video, and I’m only 30 minutes in! Keep it up hu-po 👍

jmoney

Data size vs. quality: I can model a complex industrial process using only 18 rows of 10 floating point values. People presume big models = big data. That's flatly untrue. It pays to know what you are doing. Great video, thanks. No wonder I like the video. I'm a ChE too :)

angstrom

this is now my favorite YouTube channel

Ronenlahat

Is it that the base model is stored in 4-bit, and as the data (the X vector) passes through a layer, that layer is first dequantized and then the matrix multiplication is done (X*W)? Does the same apply to the LoRA layers? And after we get Y (by adding the outputs of the LoRA and base layers), are the W and LoRA layers quantized back to 4-bit before Y is passed on to the next layer?
Also, if the LoRA is at the base of the model, does that mean that to update the parameters of this LoRA we need to calculate the gradients of the loss with respect to all the W and LoRA matrices above it?

pravingaikwad
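
A minimal sketch of the forward pass described in the question above, following the scheme in the QLoRA paper: the base weight stays stored in 4-bit NF4, is dequantized to 16-bit only for the matrix multiplication, and the LoRA matrices A and B live in 16-bit the whole time (they are never quantized). The `dequantize_nf4` helper, the codebook, and the shapes are illustrative assumptions, not the bitsandbytes API.

```python
import numpy as np

def dequantize_nf4(codes, absmax, codebook, block_size=64):
    """Illustrative NF4 dequantization: look up each 4-bit code in the
    codebook, then rescale every 64-value block by its stored absmax
    quantization constant."""
    values = codebook[codes]                        # flat values in [-1, 1]
    blocks = values.reshape(-1, block_size)
    return (blocks * absmax[:, None]).ravel()

def qlora_linear(x, codes, absmax, codebook, lora_A, lora_B, scaling=1.0):
    """x: (batch, d_in); lora_A: (r, d_in); lora_B: (d_out, r)."""
    d_out = lora_B.shape[0]
    # The 4-bit tensor stays resident; this 16-bit copy is transient.
    W = dequantize_nf4(codes, absmax, codebook).reshape(d_out, -1)
    base_out = x @ W.T                              # frozen base path
    lora_out = (x @ lora_A.T) @ lora_B.T * scaling  # trainable low-rank path
    return base_out + lora_out
```

Nothing is "re-quantized" after the matmul in this picture: the 4-bit weights are never modified, the dequantized copy is simply discarded, and during training only lora_A and lora_B receive parameter gradients, although the loss gradient still has to flow back through the frozen layers above them.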

How does each block in QLoRA maintain a normal distribution after blockwise quantization, when the blocks are only subsets of a normal distribution?

agiagiagitk

This is the best YouTube channel to learn about AI!!!

wecreatestoryai

Can you do an explainer on the "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers" paper? QLoRA is good for training LLMs, but its inference phase is dramatically slow. I heard GPTQ can speed up the inference phase with its quantization algorithm; can we apply GPTQ together with QLoRA then?

SofieSimp

The QLoRA paper states that they use NF4 to store the model and dequantize back to BF16 when performing matrix multiplication. Doesn't this increase the VRAM usage of the model?

SofieSimp
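
A back-of-the-envelope answer to the VRAM question above, with illustrative numbers for a 7B-parameter model (the layer shape is an assumption): the dequantized 16-bit copy exists only one layer at a time during the forward and backward pass, so the resident cost is the 4-bit weights plus their quantization constants, not a second full-precision model.

```python
# Rough VRAM estimate, assuming a 7B-parameter model (illustrative numbers).
params = 7e9
bf16_full  = params * 2 / 1e9         # ~14.0 GB if every weight were BF16
nf4_stored = params * 0.5 / 1e9       # ~ 3.5 GB stored as 4-bit NF4
constants  = params / 64 * 4 / 1e9    # ~ 0.44 GB of FP32 absmax constants
                                      #   (before double quantization)
# Dequantization happens per layer, so the peak extra cost is one layer's
# weights in 16-bit, e.g. an assumed 4096x11008 MLP projection.
transient  = 4096 * 11008 * 2 / 1e9   # ~ 0.09 GB, freed after the matmul
print(nf4_stored + constants + transient)   # ~4 GB resident, not ~14 GB
```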

Can someone clarify this: when you treat some of the weights of the LLM as LoRA weights, are you randomly selecting the weights?

nishanthk

FYI, there are GPUs like the RTX A6000 that have 48GB of memory and are considered workstation GPUs.

majidalfifi

What are the quantization constants? Are they just the absmax of each 64-item block, or is this wrong? Maybe someone can point this out in a sentence or two.

christopherhornle
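
To the question above: yes, in blockwise absmax quantization the constant for each 64-value block is simply that block's absolute maximum, used to rescale the block into [-1, 1] before snapping each value to the nearest of the 16 NF4 levels. A tiny sketch; the evenly spaced codebook here is a placeholder, not the real NF4 quantile table.

```python
import numpy as np

PLACEHOLDER_LEVELS = np.linspace(-1.0, 1.0, 16)   # stand-in for the NF4 codebook

def quantize_blockwise(weights, block_size=64):
    """Blockwise absmax quantization: one FP32 constant per 64-value block."""
    blocks = weights.reshape(-1, block_size)
    constants = np.abs(blocks).max(axis=1)         # the quantization constants
    normalized = blocks / constants[:, None]       # each block now in [-1, 1]
    codes = np.abs(normalized[..., None] - PLACEHOLDER_LEVELS).argmin(axis=-1)
    return codes.astype(np.uint8), constants       # 4-bit codes + constants
```

Double quantization then quantizes these FP32 constants themselves (to 8-bit, in blocks of 256), which is how the paper cuts the constants' overhead from 0.5 down to about 0.127 bits per parameter.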

I have a question: if I have 8GB of VRAM, what is the maximum size of GPT model I could train, not fine-tune?

khaledbouzaiene
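
For the 8GB question above, a rough rule of thumb rather than anything from the paper: full training with mixed-precision Adam keeps roughly 16 bytes of weight and optimizer state per parameter, and activations come on top of that, so the estimate below is an upper bound.

```python
# Rough ceiling for full training (not fine-tuning) on 8 GB of VRAM.
# Assumption: mixed-precision Adam at ~16 bytes per parameter
# (FP16 weights + FP16 grads + FP32 master weights + two FP32 Adam moments);
# activations are ignored, so this overestimates what actually fits.
vram = 8e9
bytes_per_param = 2 + 2 + 4 + 4 + 4     # = 16
max_params = vram / bytes_per_param     # ~0.5e9
print(f"~{max_params / 1e9:.1f}B parameters at most, before activations")
```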

Thanks for sharing. The Discord invitation is outdated; can you renew it? 😄

kaneyxx

Where do you attach the LoRA adapters? Just at the last layer?

wryltxw

34:37 OASST1 is a crowdsourced dataset of instructions and responses created by humans roleplaying both the LLM and the Human user sides. Everybody, please contribute to the OpenAssistant project!!

MouliSankarS

Do the gradients also need to fit in memory, besides the model size and the batch?

wryltxw
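
Short answer to the question above: yes, gradients take memory too, but with (Q)LoRA only the adapter parameters are trainable, so the gradient and optimizer-state footprint is tiny next to the frozen base weights. An illustrative count for one adapted layer, where the rank and shape are assumptions:

```python
# Trainable state for one LoRA-adapted 4096x4096 projection at rank r=64.
d_in, d_out, r = 4096, 4096, 64
base_params = d_in * d_out          # frozen, stored in NF4, no gradients
lora_params = r * (d_in + d_out)    # trainable A and B matrices
grad_mb = lora_params * 2 / 1e6     # ~1.0 MB of BF16 gradients
adam_mb = lora_params * 8 / 1e6     # ~4.2 MB of FP32 Adam moments
print(base_params, lora_params, grad_mb, adam_mb)
```

The bigger variable cost is usually the activations saved for backprop, which scale with batch size and sequence length; the paper's paged optimizers exist to survive exactly those memory spikes.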

How do you fine-tune the model? I tried it with the mpt-7b model on a 16GB T4 instance but wasn't able to train it. Did you try fine-tuning?

gowtham

Man, I can't even cover all the errors in logic. It is obvious that paged optimizers are there to handle memory spikes; IT SAYS SO. At 15:31 you claim they contradict themselves, but they do not. They say that GPT-4 is good at evaluating answers, but that the CURRENT BENCHMARKS ARE NOT GOOD ENOUGH FOR A GOOD EVALUATION. So, given a text, GPT-4 makes a good evaluation, but the standard benchmarks are not good enough. Stopping here; some people should maybe go back to school and work on understanding English.

ThomasTomiczek