LoRA - Explained!

A parameter-efficient fine-tuning technique that uses a low-rank adapter to (1) reduce the storage required per task by decreasing the number of trainable parameters added to the network, and (2) eliminate inference latency by merging the stored parameters into the existing network weights instead of adding extra layers.
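
As a rough illustration of the storage savings (a hypothetical back-of-the-envelope sketch, not numbers from the video): for a single d×d weight matrix, full fine-tuning stores d² parameters per task, while a rank-r LoRA adapter stores only the two low-rank factors.

```python
# Hypothetical parameter-count comparison for one d x d weight matrix.
# Full fine-tuning stores d*d parameters per task; a rank-r LoRA
# adapter stores only the two factors A (d x r) and B (r x d).

def full_finetune_params(d):
    return d * d

def lora_params(d, r):
    return 2 * d * r  # A has d*r entries, B has r*d entries

d, r = 4096, 8                      # e.g. a hidden size of 4096, rank 8
full = full_finetune_params(d)      # 16,777,216
lora = lora_params(d, r)            # 65,536
print(f"full: {full}, lora: {lora}, ratio: {full // lora}x")  # 256x
```

The sizes d = 4096 and r = 8 are illustrative choices; the ratio d/(2r) grows as the model gets wider or the rank gets smaller.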

RESOURCES

ABOUT ME

PLAYLISTS FROM MY CHANNEL

MATH COURSES (7 day free trial)

OTHER RELATED COURSES (7 day free trial)

CHAPTERS
0:00 Introduction
1:49 Pass 1: Low Rank Matrices
8:00 Quiz 1
8:52 Pass 2: Adapters
16:38 Quiz 2
17:47 Pass 3: Low Rank Adapters
26:37 Quiz 3
27:54 Summary
COMMENTS

I read the paper multiple times, watched some youtube videos and read some articles; you are the only one who explained it this clearly. I really like your multiple passes to make it easier to understand.

mohammadalameen

Your explanations are easy to understand and in-depth at the same time. Thank you for making my life easier.

Mohamed_Shokry

I don't understand why you don't have much more views and engagement. Your videos are some of the best explanations out there. I've sent my students to your channel multiple times.

Not a great timeline where virality reigns over veracity. Amazing work.

JorgeZentrik

That's a really hard subject to explain clearly. Nice job.

timdeboerz

This is the best on LoRA. Easy to understand explanation.

prabhakaranveeramani

I especially liked the part on why rank is necessary, nice explanation

datascienceandaiconcepts

For anyone still struggling to understand, here's a simple mathematical proof (not that complicated) of why this works.

We first pre-train a model without adding any LoRA to it: just create your untrained model, train it, and that's it. Then, when you want to fine-tune it, you can modify the structure of your current model to support the LoRA matrices.

The objective is to fix the current trained weights W_0 and somehow learn a new weight matrix W_1, such that the model is fine-tuned on a downstream task. Remember, W_0 and W_1 both describe the model: they are matrices of the same size, holding the parameters of the model. It's just that W_1 holds the new weights of your model.

Now, suppose W_1 is in fact the new weight matrix. We may be interested in the difference between the new weight matrix and the old one:

W_1-W_0

and we can call this difference Delta(W).

Thus Delta(W) = W_1 - W_0

Here's the fun part: we can write Delta(W) as the product of two smaller matrices (just as described in this video, and in the original LoRA paper).

Let AB = Delta(W)

Now the question is: how exactly do we perform the forward pass when it comes time to actually fine-tune your model? How exactly should the input "x" and output "h" (hidden state) be mapped?

Because when we did not use LoRA during the pre-training stage, we were able to write the hidden state in the forward pass of any layer as: h = x*M
where M was the weight matrix of that specific layer.

Now that we are introducing these new matrices A and B, and we would like their product A*B to represent the change that the old matrix W_0 must undergo to become the new matrix W_1, how exactly should we compute h so that this behaviour holds? Here's the mathematical proof:

if we know that:
Delta(W) = W_1 - W_0

and we multiply any input x (a row vector in this case) on both sides of this equation, we will get:

x*Delta(W) = xW_1 - xW_0

now, if we rearrange:

xW_1=xW_0 + x*Delta(W)

The term (xW_1) is also h_1 (hidden state we would have gotten if we had the fine-tuned weight matrix W_1).

The term (xW_0) is also h_0 (hidden state we would have with our old weight matrix, aka our old model).

and the term x*Delta(W) is also equivalent to x*AB.

thus, during fine-tuning, the forward pass can be written as:

h_1 = xW_0 + x*AB

and now in your model you can freeze weight updates for W_0, and only update A and B. Then you can rest assured that after fine-tuning your model, A*B will indeed represent the difference that your old weight matrix must undergo to get the new weight matrix W_1. This is what's going on!
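
The identity above can be checked numerically. Here's a minimal plain-Python sketch with toy 2×2 matrices (the values of W0, A, B, and x are arbitrary, chosen only for illustration):

```python
# Check that x*W0 + x*(A*B) equals x*W1 when W1 = W0 + A*B.
# Toy sizes: d = 2, rank r = 1; plain-Python matrix helpers.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]

W0 = [[1.0, 2.0], [3.0, 4.0]]   # frozen pre-trained weights (d x d)
A  = [[0.5], [1.0]]             # d x r, trainable
B  = [[2.0, -1.0]]              # r x d, trainable
x  = [[1.0, 1.0]]               # input row vector

delta = matmul(A, B)            # Delta(W) = A*B
W1 = matadd(W0, delta)          # fine-tuned weights W1 = W0 + Delta(W)

h_lora   = matadd(matmul(x, W0), matmul(x, delta))  # x*W0 + x*AB
h_merged = matmul(x, W1)                            # x*W1
print(h_lora == h_merged)       # True
```

Both paths produce the same hidden state, which is exactly the equivalence the derivation establishes.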

Edit:
And after you've trained and fine-tuned the model using the LoRA forward-pass formula (as described above), you can simplify the forward pass by combining matrix W_0 and matrix AB like this:

W_0 + AB

and this sum will in fact be the fine-tuned matrix W_1, because by definition:

Delta(W) = W_1-W_0

which we can rewrite as:

W_1=Delta(W) + W_0

and as we've defined above Delta(W) = AB

Thus: W_1 = AB + W_0

Thus here's the final proof of why adding the matrices W_0 and AB indeed yields the new matrix W_1.

Once we can be mathematically assured that W_1 is indeed the new fine-tuned matrix, in the inference stage (after all training and fine-tuning has finally finished) we will write all of the forward passes in any Linear layer as: h=xW_1

And remember, you apply LoRA across all of your layers that do matrix multiplication. So your model should be coded such that it is easy to switch between LoRA mode and non-LoRA mode.
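
That mode switch can be sketched as a toy linear layer (a hypothetical plain-Python illustration, not any real library's API): in LoRA mode the forward pass adds x*A*B, and a merge step folds A*B into W0 so inference becomes a single matmul.

```python
# Toy linear layer showing "LoRA mode" vs. merged inference mode.
# Hypothetical sketch; W0 is frozen, A and B would be the trainable parts.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

class LoRALinear:
    def __init__(self, W0, A, B):
        self.W0, self.A, self.B = W0, A, B
        self.merged = False

    def merge(self):
        # Fold A*B into W0 once, so inference needs no extra matmuls.
        AB = matmul(self.A, self.B)
        self.W0 = [[w + d for w, d in zip(rw, rd)]
                   for rw, rd in zip(self.W0, AB)]
        self.merged = True

    def forward(self, x):
        h = matmul(x, self.W0)
        if not self.merged:              # LoRA mode: add x*A*B on the side
            xAB = matmul(matmul(x, self.A), self.B)
            h = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(h, xAB)]
        return h

layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]], [[0.0, 2.0]])
x = [[3.0, 4.0]]
before = layer.forward(x)   # LoRA mode
layer.merge()
after = layer.forward(x)    # merged mode, same output
print(before == after)      # True
```

Going the other direction (subtracting A*B back out of W0) is how you'd restore non-LoRA mode, e.g. to swap in a different task's adapter.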

weekipi

Thank you for explaining. I've watched a couple of videos on your channel and they're all really cool! You help me understand the intuitions of each topic.

hegzom

In fine-tuning of LLMs we have 2 options:

1) Change the parameters of the actual base model. But this requires a lot of resources and time.
2) Add new layers and change the architecture of the model. In fine-tuning, only the weights of these additional layers change and the base model remains frozen. In inference we use both the base model and the additional layers.

LoRA helps us shrink these additional layers by using low-rank matrices.

This is my understanding. Please react to it so I can verify my knowledge!😊

KhushPatel-xn

Great explanation! Thanks for breaking it down so clearly!

humayunahmadrajib

The quizzes aren't well connected to the content. Heck, if you could add a note after each quiz like "if you got this wrong, check out this timestamp", that would be helpful.

pauljones

Thank you for the helpful video, I like it a lot

hainguyenthien

I like simple yet extremely effective methods

shisoy

I learned so much that I subscribed and turned on the notification bell

adekoyasamuel

Awesome explanation! I have a few questions though:
1) At 24:00, you said we can do some matrix multiplication and addition to update the value of Wq so that the fine-tuned information gets kind of infused into Wq, which in turn allowed us to have faster inference time. But won't that hurt the performance in comparison to the case where we don't update Wq and keep A and B? Are we just trading performance for inference speed?
2) What if we do the same 'update Wq' step with additive adapters? Would that also speed up their inference time?

harshsharma

Quiz 1: answer A - rank 0,
since all the rows or columns are linearly dependent.
But I think it's B: rank 1.

balubalaji

Custom GPTs or Gemini Gems are pretty spot on after you get good at making them. I would play around with these before building an AI agent with LangChain and vector embeddings.

canygard

LoRAs are the biggest thing to come out of AI since the transformer

isaiahcastillo

Cursor with claude 3.5 or o1 mini is great. Use their shortcuts to save time. Still struggles with new languages and frameworks though

pauljones

Amazing, thank you. Can you do one for latent diffusion?

programming-short-videos