PEFT LoRA Explained in Detail - Fine-Tune your LLM on your local GPU

Does your GPU not have enough memory to fine-tune your LLM or AI system? Use HuggingFace PEFT: there is a mathematical solution that approximates the large weight tensors in each layer of your self-attention transformer architecture with a low-rank decomposition (in the spirit of an eigenvalue or singular value decomposition), which allows for a minimal memory requirement on your GPU / TPU.
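To make the memory argument concrete, here is a small, self-contained sketch (my own illustration in plain PyTorch, not code from the video or from the PEFT library): a rank-r factorization stores two thin matrices instead of one full d x d update, which is where the savings come from. In LoRA itself the two factors are trained directly rather than computed via SVD.

import torch

d, r = 4096, 8                        # hidden size of one attention layer, chosen adapter rank
delta_W = torch.randn(d, d)           # stand-in for a full-rank weight update

# Truncated SVD gives the best rank-r approximation of delta_W
U, S, Vh = torch.linalg.svd(delta_W)
B = U[:, :r] * S[:r]                  # shape (d, r)
A = Vh[:r, :]                         # shape (r, d)

print("full update parameters:", delta_W.numel())        # 16,777,216
print("low-rank parameters:   ", A.numel() + B.numel())  # 65,536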

HuggingFace PEFT stands for Parameter-Efficient Fine-Tuning: the library fine-tunes transformer models (LLMs for language, Stable Diffusion for images, Vision Transformers for vision) with a reduced memory footprint. One PEFT method is LoRA: Low-Rank Adaptation of LLMs.
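A minimal usage sketch of the PEFT library (the model name, rank and target module names below are illustrative placeholders, not values from the video):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                 # rank of the low-rank adapter matrices
    lora_alpha=16,                       # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projection name in BLOOM; differs per architecture
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()       # only the small adapter matrices are trainable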

Combined with freezing the pre-trained weights (setting them to non-trainable) and perhaps even 8-bit quantization of the pre-trained LLM parameters, adapter-tuned transformer-based LLMs reach SOTA benchmark results at a much smaller memory footprint, compared to classical fine-tuning of Large Language Models (like GPT, BLOOM, LLaMA or T5).
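A sketch of that recipe with HuggingFace transformers + PEFT (helper names such as prepare_model_for_kbit_training depend on your PEFT version; treat this as an outline rather than a drop-in script):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen backbone in 8-bit to cut its memory footprint
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Freeze the quantized base weights and keep a few layers (e.g. layer norms) in higher precision
model = prepare_model_for_kbit_training(model)

# Attach trainable LoRA adapters on top of the frozen 8-bit weights
model = get_peft_model(
    model,
    LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, target_modules=["query_key_value"]),
)
model.print_trainable_parameters()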

In this video I explain the method in detail: AdapterHub and HuggingFace's new PEFT library both focus on parameter-efficient fine-tuning of transformer models (LLMs for language, Stable Diffusion for images, Vision Transformers for vision) at a reduced memory size.

I explain one method, Low-Rank Adaptation (LoRA), in detail, including an optimized LoraConfig for adapter-tuning INT8-quantized models, from LLMs to Whisper.
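The LoraConfig knobs that matter most here are sketched below; the values and target module names (q_proj / v_proj, as used by Whisper's attention layers in transformers) are illustrative, not necessarily the optimized settings shown in the video:

from peft import LoraConfig

whisper_lora = LoraConfig(
    r=32,                                 # higher rank = more adapter capacity, more memory
    lora_alpha=64,                        # adapter output is scaled by lora_alpha / r
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # query/value projections of the attention blocks
    bias="none",                          # keep all bias terms frozen
)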

Follow-up video: 4-bit quantization with QLoRA explained, with a Colab notebook:

#ai
#PEFT
#finetuning
#finetune
#naturallanguageprocessing
#datascience
#science
#technology
#machinelearning
Comments

Most underrated channel on YT. Deserves a million subs. Thanks.

Blessed-by-walking-show

Great video that doesn't hand-wave away the mathematical and implementation details. Exactly the kind of content I love. Thank you!

jett_royce

I'm having so many flashbacks to my PCA classes 😅. You explain it much better than my teacher, btw...

Shionita

BEST video on PEFT and LoRA. I was not able to understand the concepts from other videos; after searching more on YT I landed on this video and understood the whole concept. JUST

ad_academy

It doesn't get better than this ❤

sklnow

Outstanding video, the best one I have seen on LoRA!

I have one question about the SVD decomposition procedure:
A full fine-tune of a large model such as LLaMA would require loading all the model tensors onto the GPU and adjusting them by delta(phi) for all the parameters.
In LoRA, delta(phi) is replaced with 2 smaller SVD-style matrices that are trained and then multiplied back to the full size and added to the original parameters.

My question is this.
When you generate the 2 smaller SVD matrices, you still need to load the full-size tensor to decompose it. In PEFT, are the 2 SVD matrices calculated once at the beginning for all the different tensors, before fine-tuning occurs? Also, how is it possible to backpropagate through the 2 smaller matrices without combining them back together on the GPU?

ianmatejka
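A short sketch for readers with the same question (my own simplified re-implementation, not the PEFT source): in LoRA the two small matrices are not obtained by decomposing the full weight at all; A is randomly initialized, B starts at zero, and both are trained directly. The forward pass computes the adapter path as B(Ax), so the full-size product BA is never materialized, and autograd backpropagates through the two small matrices on their own.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank adapter."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # pre-trained weight stays frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # random init
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # zero init: adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path + low-rank path; the adapter only ever touches r-sized intermediate tensors
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling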

This is so well done. You inspired me to create similar content as well. Hats off!

bhavulgauri

Best video on LORA ever! Simply can't get better than this 🏆

suryanshsinghrawat

Thank you for explaining. I previously believed that LoRA was a Stable Diffusion model for generating beautiful images.

changtimwu

Best video and explanation on LoRA, thank you for your efforts!

uraskarg

This is really awesome... you nailed it. Such an explanation can only come from deep understanding. Thank you very much.

SandeepGupta

Superb video. Excellent presentation of all the concepts and easy to understand. You have a great teaching style sir.

ricosrealm

Absolutely brilliantly explained. Love this guy's style of teaching and his casual humour. Do we have to drop to int8 for PEFT?

BradleyKieser

Superb video :) Very clear and concise explanation. Thank you.

ruchaapte

Very clear explanation of low-rank adaptation and LLMs.

steventan

According to the study in the video, does LoRA really achieve better results than full fine-tuning of all parameters of the entire model? Does this mean that it is not only less demanding on computing resources, but also performs better?

Rman

This is Gold. Thanks for this amazing content.

ArgenisLeon

This was an amazing explanation. Thank you.

Philip

What tool are you using for the presentation? I love the smooth transitions.

ko-Daegu

Outstanding video. Brilliantly explained complex topics. I have one question: can we apply LoRA to multimodal architectures like Donut, which is a combination of a Swin Transformer + BART? Any pointers on how to do this?

paturibharath