QLoRA—How to Fine-tune an LLM on a Single GPU (w/ Python Code)


In this video, I discuss how to fine-tune an LLM using QLoRA (Quantized Low-Rank Adaptation). Example code is provided for training a custom YouTube comment responder using Mistral-7b-Instruct.
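The overall recipe from the video can be sketched roughly as follows. This is a hedged sketch, not the video's exact code: it assumes the Hugging Face transformers, peft, and bitsandbytes libraries, a CUDA GPU, and an illustrative model name and hyperparameters.

```python
# Sketch of a QLoRA fine-tuning setup (model name, rank, and target modules
# are illustrative assumptions, not necessarily the video's exact choices).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "mistralai/Mistral-7B-Instruct-v0.2"

# Load the base model with 4-bit NF4 quantization (ingredients 1 and 2).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Attach small trainable low-rank adapters (ingredient 4: LoRA);
# the 4-bit base weights stay frozen.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable
```

Training then proceeds with a standard Trainer; the paged optimizer (ingredient 3) is selected via the optimizer name in the training arguments.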

More Resources:

--

Socials

The Data Entrepreneurs

Support ❤️

Intro - 0:00
Fine-tuning (recap) - 0:45
LLMs are (computationally) expensive - 1:22
What is Quantization? - 4:49
4 Ingredients of QLoRA - 7:10
Ingredient 1: 4-bit NormalFloat - 7:28
Ingredient 2: Double Quantization - 9:54
Ingredient 3: Paged Optimizer - 13:45
Ingredient 4: LoRA - 15:40
Bringing it all together - 18:24
Example code: Fine-tuning Mistral-7b-Instruct for YT Comments - 20:35
What's Next? - 35:22
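The core idea behind the "What is Quantization?" chapter can be shown with a tiny absmax round-trip: store weights as small integers plus one scale factor, then reconstruct approximations on the fly. This is an illustrative plain-Python sketch, not the video's code; QLoRA's 4-bit NormalFloat uses non-uniformly spaced levels tuned to normally distributed weights, but the round-trip idea is the same.

```python
# Toy absmax quantization: map floats onto signed 8-bit integers and back.

def quantize_absmax(values, bits=8):
    qmax = 2 ** (bits - 1) - 1              # 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.12, -0.98, 0.45, 0.03]         # pretend these are model weights
q, scale = quantize_absmax(weights)         # small integers + one scale factor
restored = dequantize(q, scale)             # approximate originals
print(q)          # integers in [-127, 127]
print(restored)   # each value reconstructed to within half a quantization step
```

Double quantization (ingredient 2) then quantizes the scale factors themselves, since one 32-bit scale per block of weights adds up across billions of parameters.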
Comments

Your explanations are amazing and the content is great. This is the best playlist on LLMs on YouTube.

manyagupta

Amazing work Shaw - complex concepts broken down into 'bit-sized bytes' for humans. Appreciate your time & efforts :)

chris_zazzman

This is the best explanation that I've ever heard, thanks for all the work!!

MrCancerbero

Wow, you are a genius at explaining super hard math concepts in layman's terms with good visual representations. Keep it coming.

soonheng

Thank you Shaw for yet another awesome video succinctly explaining complex topics!

africanbuffalo

Thank you for this amazing video, great explanations, very clear and easy to understand!

liubovnesterenko

So far the best explanation on YouTube about this topic

Ali-metv

Exactly what I was looking for! Thanks for the video. Keep going!

RohitJain-lsov

Amazing video! You are the best, man! Thank you so much.

bim-techs

Great video and your slides are very well organized!

el_artmaga_

Learned a lot. Great video and very accessible. Well Done!


Loved this, very informative and clear!

aldotanca

Amazing explanation!!! Thank you Shaw!

aisme

Thank you for sharing this knowledge; we need more videos like this

younespiro

At first I thought, omg, this video is horrible, but it's actually excellent! (I wanted a fast, practical way to get my LLM fine-tuned using my own data, but found it really isn't that easy.) After this I understood a lot better what is going on in the background.

operitivo

Dear Shaw, I've listened to the video many times, and aside from it being extremely well done (I learned so much), you should emphasize (or even make a dedicated video on) the fact that the key to fine-tuning with "one" GPU is using the "quantized" Mistral model. Overall, I'm sure many users would like to know more about these models; I'm sure not many know how to use the most important quantized LLMs in their own Colab, or as the base of their own application... :)

FrancescoFiamingo

Thank you for this great video! If you find a way to get this working on Apple silicon machines, we would love to see a video about it!

trsd

Thank you for sharing this fantastic video! Would it be worthwhile to explore a similar approach using unsupervised learning?

Eliot-nrzq

Beautifully explained, thanks!!!
When you said, for PEFT "we augment the model with additional parameters that are trainable", how do we add these parameters exactly? Do we add a new layer?
Also, when we say "%trainable parameters out of total parameters", doesn't that mean that we are updating a certain % of original parameters?

pawan
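On the question above about how PEFT "augments" the model: LoRA adds new low-rank matrices alongside the frozen weights rather than updating a subset of the originals, so the "% trainable" counts only the new adapter parameters. A toy plain-Python sketch (dimensions and initialization are illustrative assumptions):

```python
# Toy illustration: the original weights W stay frozen, and a new low-rank
# product B @ A (the LoRA adapter) is added on top. The trainable parameters
# are the entries of A and B, not a subset of W.
import random

d, r = 4, 1  # weight dimension and adapter rank (r << d)

W = [[random.random() for _ in range(d)] for _ in range(d)]  # frozen original
A = [[0.1] * d for _ in range(r)]   # trainable: r x d
B = [[0.0] * r for _ in range(d)]   # trainable: d x r (zero init, so BA = 0 at start)

def effective_weight(i, j):
    # W'[i][j] = W[i][j] + (B @ A)[i][j] — the model the forward pass sees
    return W[i][j] + sum(B[i][k] * A[k][j] for k in range(r))

# Trainable fraction: 2*r*d new adapter parameters vs d*d frozen ones.
trainable = 2 * r * d
total = d * d + trainable
print(f"trainable: {trainable}/{total} = {100 * trainable / total:.1f}%")
```

At 7B scale with small r, that fraction drops well below 1%, which is what the "% trainable parameters out of total parameters" printout reports.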

Great video and explanation! Thanks a lot. For the code, have you tried using:

from transformers import BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

and then passing that as the quantization config when loading the model? This would include the other aspects from the QLoRA paper, no?

ahmadalhineidi
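For reference on the question above: the paper's full recipe also sets a compute dtype in that config, while the paged optimizer (ingredient 3) is selected separately via the optimizer name in the training arguments. A hedged sketch, assuming a recent transformers/bitsandbytes install (output directory and hyperparameters are illustrative):

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# Full QLoRA quantization config: NF4 storage + double quantization,
# with bfloat16 as the de-quantized compute dtype.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Ingredient 3 (paged optimizer) lives in the training arguments,
# not in the quantization config.
training_args = TrainingArguments(
    output_dir="qlora-out",
    optim="paged_adamw_32bit",
)
```

So the config above covers the NF4 and double-quantization ingredients, and the optimizer choice covers the rest.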