PEFT LoRA Explained in Detail - Fine-Tune your LLM on your local GPU

Does your GPU not have enough memory to fine-tune your LLM or AI system? Use HuggingFace PEFT: there is a mathematical solution that approximates the weight updates in each layer of your self-attention transformer architecture with a low-rank matrix decomposition (closely related to an eigenvector and eigenvalue decomposition), which keeps the memory requirement on your GPU / TPU to a minimum.
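A minimal numerical sketch of that idea (illustrative only, not the PEFT implementation): instead of updating a full d x d weight matrix, LoRA learns two small factors B (d x r) and A (r x d) whose product approximates the weight update; the hidden size, rank and scaling factor below are assumed example values.

import numpy as np

d, r = 1024, 8                     # hidden size and LoRA rank (assumed values)
W = np.random.randn(d, d)          # frozen pre-trained weight matrix
A = np.random.randn(r, d) * 0.01   # trainable low-rank factor A
B = np.zeros((d, r))               # trainable factor B, zero-initialized so the update starts at 0
alpha = 32                         # LoRA scaling factor

W_adapted = W + (alpha / r) * (B @ A)   # effective weight used in the forward pass

print("full weight parameters:   ", W.size)           # 1,048,576
print("trainable LoRA parameters:", A.size + B.size)  # 16,384, roughly 1.6% of the full matrix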
PEFT stands for parameter-efficient fine-tuning: the HuggingFace PEFT library applies it to transformer models (LLMs for language, Stable Diffusion for images, Vision Transformers for vision) to reduce memory size. One PEFT method is LoRA: Low-Rank Adaptation of LLMs.
Combined with setting the pre-trained weights to non-trainable, and possibly even an 8-bit quantization of the pre-trained model parameters, the reduced memory footprint of adapter-tuned transformer-based LLMs achieves SOTA benchmark results compared to classical full fine-tuning of Large Language Models (like GPT, BLOOM, LLaMA or T5).
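A short sketch of those two steps (the checkpoint name and loading details are assumptions, not the video's exact setup): load a base model in 8-bit via bitsandbytes and freeze its pre-trained weights, so only the LoRA adapters added later are trained.

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",   # assumed small example checkpoint
    load_in_8bit=True,         # 8-bit quantization, requires the bitsandbytes package
    device_map="auto",
)
for param in model.parameters():
    param.requires_grad = False    # pre-trained weights stay non-trainable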
In this video I explain the method in detail: AdapterHub and HuggingFace's new PEFT library focus on parameter-efficient fine-tuning of transformer models (LLMs for language, Stable Diffusion for images, Vision Transformers for vision) for reduced memory size.
One method, Low-Rank Adaptation (LoRA), I explain in detail, including an optimized LoraConfig for adapter-tuning INT8-quantized models, from LLMs to Whisper.
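A hedged LoraConfig example for INT8 adapter-tuning (the rank, scaling, dropout and target module names below are illustrative defaults, not the optimized values discussed in the video; the attention module names differ per architecture):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", load_in_8bit=True, device_map="auto"  # assumed checkpoint
)
model = prepare_model_for_kbit_training(model)   # stabilizes training on quantized weights

lora_config = LoraConfig(
    r=8,                                 # LoRA rank
    lora_alpha=32,                       # scaling factor
    target_modules=["query_key_value"],  # attention projection(s); name depends on the model
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # typically well under 1% of parameters are trainable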
Follow-up video: 4-bit quantization (QLoRA) explained, with a Colab notebook:
#ai
#PEFT
#finetuning
#finetune
#naturallanguageprocessing
#datascience
#science
#technology
#machinelearning