QLoRA—How to Fine-tune an LLM on a Single GPU (w/ Python Code)


In this video, I discuss how to fine-tune an LLM using QLoRA (Quantized Low-Rank Adaptation). Example code is provided for training a custom YouTube comment responder using Mistral-7b-Instruct.
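The overall recipe from the video can be sketched roughly as follows. This is a hedged sketch, not the video's exact code: it assumes the Hugging Face transformers, peft, and bitsandbytes libraries, a CUDA GPU, and an illustrative model name and hyperparameters.

```python
# Sketch of a QLoRA fine-tuning setup (model name, rank, and target modules
# are illustrative assumptions, not necessarily the video's exact choices).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "mistralai/Mistral-7B-Instruct-v0.2"

# Load the base model with 4-bit NF4 quantization (ingredients 1 and 2).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Attach small trainable low-rank adapters (ingredient 4: LoRA);
# the 4-bit base weights stay frozen.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable
```

Training then proceeds with a standard Trainer; the paged optimizer (ingredient 3) is selected via the optimizer name in the training arguments.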

More Resources:

--

Socials

The Data Entrepreneurs

Support ❤️

Intro - 0:00
Fine-tuning (recap) - 0:45
LLMs are (computationally) expensive - 1:22
What is Quantization? - 4:49
4 Ingredients of QLoRA - 7:10
Ingredient 1: 4-bit NormalFloat - 7:28
Ingredient 2: Double Quantization - 9:54
Ingredient 3: Paged Optimizer - 13:45
Ingredient 4: LoRA - 15:40
Bringing it all together - 18:24
Example code: Fine-tuning Mistral-7b-Instruct for YT Comments - 20:35
What's Next? - 35:22
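The core idea behind the "What is Quantization?" chapter can be shown with a tiny absmax round-trip: store weights as small integers plus one scale factor, then reconstruct approximations on the fly. This is an illustrative plain-Python sketch, not the video's code; QLoRA's 4-bit NormalFloat uses non-uniformly spaced levels tuned to normally distributed weights, but the round-trip idea is the same.

```python
# Toy absmax quantization: map floats onto signed 8-bit integers and back.

def quantize_absmax(values, bits=8):
    qmax = 2 ** (bits - 1) - 1              # 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.12, -0.98, 0.45, 0.03]         # pretend these are model weights
q, scale = quantize_absmax(weights)         # small integers + one scale factor
restored = dequantize(q, scale)             # approximate originals
print(q)          # integers in [-127, 127]
print(restored)   # each value reconstructed to within half a quantization step
```

Double quantization (ingredient 2) then quantizes the scale factors themselves, since one 32-bit scale per block of weights adds up across billions of parameters.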
Comments

Your explanations are amazing and the content is great. This is the best playlist on LLMs on YouTube.

manyagupta

Amazing work Shaw - complex concepts broken down into 'bit-sized bytes' for humans. Appreciate your time & efforts :)

chris_zazzman

This is the best explanation that I've ever heard, thanks for all the work!!

MrCancerbero

Wow, you are a genius at explaining super hard math concepts in layman's terms with good visual representations. Keep it coming.

soonheng

Thank you Shaw for yet another awesome video succinctly explaining complex topics!

africanbuffalo

Thank you for this amazing video, great explanations, very clear and easy to understand!

liubovnesterenko

So far the best explanation on YouTube about this topic

Ali-metv

Exactly what I was looking for! Thanks for the video. Keep going!

RohitJain-lsov

Amazing video! You are the best, man! Thank you so much.

bim-techs

Great video and your slides are very well organized!

el_artmaga_

Learned a lot. Great video and very accessible. Well Done!


Loved this, very informative and clear!

aldotanca

Amazing explanation!!! Thank you Shaw!

aisme

Thank you for sharing this knowledge; we need more videos like this

younespiro

At first I thought, omg, this video is horrible, but it's actually excellent! (I wanted a fast, practical way to get my LLM fine-tuned using my own data, but found it really isn't that easy.) After this I understood a lot better what is going on in the background.

operitivo

Dear Shaw, I've listened to the video many times, and aside from it being extremely well done (I learned so much), you should emphasize (or even make a dedicated video on) the fact that the key to fine-tuning with "one" GPU is using the "quantized" Mistral model. Overall, I'm sure many users would like to know more about these models; I'm sure not many know how to use the most important quantized LLMs in their own Colab, or as the base of their own application... :)

FrancescoFiamingo

Thank you for this great video! If you find a way to get this working on Apple silicon machines, we would love to see a video about it!

trsd

Thank you for sharing this fantastic video! Would it be worthwhile to explore a similar approach using unsupervised learning?

Eliot-nrzq

Beautifully explained, thanks!!!
When you said, for PEFT "we augment the model with additional parameters that are trainable", how do we add these parameters exactly? Do we add a new layer?
Also, when we say "%trainable parameters out of total parameters", doesn't that mean that we are updating a certain % of original parameters?

pawan
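On the question above about how PEFT "augments" the model: LoRA adds new low-rank matrices alongside the frozen weights rather than updating a subset of the originals, so the "% trainable" counts only the new adapter parameters. A toy plain-Python sketch (dimensions and initialization are illustrative assumptions):

```python
# Toy illustration: the original weights W stay frozen, and a new low-rank
# product B @ A (the LoRA adapter) is added on top. The trainable parameters
# are the entries of A and B, not a subset of W.
import random

d, r = 4, 1  # weight dimension and adapter rank (r << d)

W = [[random.random() for _ in range(d)] for _ in range(d)]  # frozen original
A = [[0.1] * d for _ in range(r)]   # trainable: r x d
B = [[0.0] * r for _ in range(d)]   # trainable: d x r (zero init, so BA = 0 at start)

def effective_weight(i, j):
    # W'[i][j] = W[i][j] + (B @ A)[i][j] — the model the forward pass sees
    return W[i][j] + sum(B[i][k] * A[k][j] for k in range(r))

# Trainable fraction: 2*r*d new adapter parameters vs d*d frozen ones.
trainable = 2 * r * d
total = d * d + trainable
print(f"trainable: {trainable}/{total} = {100 * trainable / total:.1f}%")
```

At 7B scale with small r, that fraction drops well below 1%, which is what the "% trainable parameters out of total parameters" printout reports.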

Great video and explanation! Thanks a lot. For the code, have you tried using:

from transformers import BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

and then passing that as the quantization config when loading the model? This would include the other aspects from the QLoRA paper, no?

ahmadalhineidi
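For reference on the question above: the paper's full recipe also sets a compute dtype in that config, while the paged optimizer (ingredient 3) is selected separately via the optimizer name in the training arguments. A hedged sketch, assuming a recent transformers/bitsandbytes install (output directory and hyperparameters are illustrative):

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# Full QLoRA quantization config: NF4 storage + double quantization,
# with bfloat16 as the de-quantized compute dtype.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Ingredient 3 (paged optimizer) lives in the training arguments,
# not in the quantization config.
training_args = TrainingArguments(
    output_dir="qlora-out",
    optim="paged_adamw_32bit",
)
```

So the config above covers the NF4 and double-quantization ingredients, and the optimizer choice covers the rest.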