Fine-tuning LLM with QLoRA on Single GPU: Training Falcon-7b on ChatBot Support FAQ Dataset


In this video, you'll learn how to fine-tune the Falcon-7b LLM (the 40b version is #1 on the Open LLM Leaderboard) on a custom dataset using QLoRA. The Falcon model is free for research and commercial use. We'll use a dataset of chatbot customer support FAQs from an e-commerce website.

Throughout the video, we'll cover the steps of loading the model, implementing a LoRA adapter, and conducting the fine-tuning process. We'll also monitor the training progress using TensorBoard. To conclude, we'll compare the performance of the untrained and trained models by evaluating their responses to various prompts.
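To get a feel for why a LoRA adapter makes single-GPU fine-tuning feasible, it helps to count parameters: the base weights stay frozen, and only two small low-rank matrices per target weight are trained. A back-of-the-envelope sketch (the rank, the choice of one square projection per layer, and treating all of it as directly trainable are illustrative assumptions, not the exact configuration from the video; 4544 is Falcon-7b's hidden size and 32 its layer count):

```python
# LoRA replaces a trainable update to a frozen weight of shape (d_out, d_in)
# with two small matrices: A of shape (r, d_in) and B of shape (d_out, r).

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds for one target weight matrix."""
    return r * (d_in + d_out)

# Illustrative numbers: rank-16 adapters on one 4544x4544 projection
# in each of 32 decoder layers.
hidden = 4544
rank = 16
layers = 32

full = hidden * hidden * layers                     # training the matrices directly
adapted = lora_params(hidden, hidden, rank) * layers  # training only the adapters

print(f"full: {full:,}  adapter: {adapted:,}  ratio: {adapted / full:.4%}")
```

The adapter trains well under 1% of the parameters those matrices would otherwise need, which is what keeps optimizer state and gradients small enough for a single GPU once the frozen base is quantized to 4 bits.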

00:00 - Introduction
01:43 - Falcon LLM
04:18 - Google Colab Setup
05:32 - Dataset
08:15 - Load Falcon 7b and QLoRA Adapter
12:20 - Try the Model Before Training
14:40 - HuggingFace Dataset
15:58 - Training
20:38 - Save the Trained Model
21:34 - Load the Trained Model
23:19 - Evaluation
28:53 - Conclusion

#chatgpt #gpt4 #llms #artificialintelligence #promptengineering #chatbot #transformers #python #pytorch
Comments

Wow, finally a working guide on how to fine-tune LLMs. Thank you very much 🙏

sithlordi

I just subscribed!! Your tutorials are straightforward and to the point. Love your content. Keep up with the amazing content! 🙌 ✨✨✨

thevitorialima

Hello Venelin, can you please provide the Colab notebook if possible?

LifeTravelerAmmu

For the tokenizer, I think we should set padding_side="left", because it is a causal LLM. What do you think?
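For context on the left-padding point above: a decoder-only model continues generating from the last position of each sequence, so right padding would make it continue from pad tokens. A minimal plain-Python sketch of the difference (the pad id of 0 is a hypothetical choice for illustration):

```python
def pad_batch(sequences, pad_id=0, side="left"):
    """Pad a batch of token-id lists to equal length on the given side."""
    max_len = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        pad = [pad_id] * (max_len - len(s))
        padded.append(pad + s if side == "left" else s + pad)
    return padded

batch = [[11, 12, 13], [21, 22]]
print(pad_batch(batch, side="left"))   # [[11, 12, 13], [0, 21, 22]]
print(pad_batch(batch, side="right"))  # [[11, 12, 13], [21, 22, 0]]
# With left padding, the last position of every row is a real token,
# which is the position generation continues from.
```

In transformers this corresponds to setting tokenizer.padding_side = "left" before batched generation.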

doguhws

Great video. Would the response times be faster with a better GPU?

ikjb

Nice video Venelin Valkov! I wanted to ask: if I have an input size of 4k+ tokens, can I train it on a single GPU?

amnasherafal

Please make a video on how to increase inference speed; that is the major problem everyone is facing.

dataflex

Excellent video! I need to configure and train a local GPT to chat with a SQL database. Which one is the better option for fine-tuning with a single GPU for that?

mariocuezzo

Is this way of fine-tuning for Falcon only, or for any open-source model? Also, is it possible to fine-tune a model to pick up a new language? Like, if it was never trained on French, can it then answer French questions?

ko-Daegu

I watch all of your videos; they are wonderful. This one is BY FAR my favorite. I know it must have taken a lot of time, but THANK YOU so much for doing it! It is so thorough. Can we do the same thing with MPT-7B?

ewldcyd

When adding new special tokens like <human> and <assistant>, shouldn't you add those tokens to the tokenizer, resize the embedding layer of the model, and fine-tune it? I think this should help the model during training, but it would also increase the number of trainable parameters.
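On the parameter-count side of the question above: each added token contributes one new row of size hidden_size to the input embedding, and one more row to the output projection if the LM head is untied from the embedding. A quick arithmetic sketch using Falcon-7b's hidden size of 4544 (the tied/untied distinction is shown as an assumption, not a claim about Falcon's architecture):

```python
def added_embedding_params(n_new_tokens: int, hidden_size: int,
                           tied_lm_head: bool = True) -> int:
    """Extra parameters from resizing the embeddings for new tokens."""
    rows = n_new_tokens * hidden_size          # new rows in the input embedding
    return rows if tied_lm_head else 2 * rows  # an untied head adds matching output rows

# Adding <human> and <assistant> as two new special tokens:
hidden = 4544
print(added_embedding_params(2, hidden))                       # 9088
print(added_embedding_params(2, hidden, tied_lm_head=False))   # 18176
```

In transformers this corresponds to tokenizer.add_special_tokens(...) followed by model.resize_token_embeddings(len(tokenizer)); the extra rows are tiny compared to the model, but they only get trained if the embedding layer is included among the trainable modules.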

maidacundo

It generates the answer and then adds more questions and answers until the max token limit is reached. What am I doing wrong? How does the model know when to stop? I checked the generation config and both padding and eos are set.
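On the stopping question above: greedy decoding ends either when the model emits the eos token id or when the max_new_tokens budget runs out. If the fine-tuned model never learned to emit eos right after an answer, it keeps producing more Q&A pairs until the budget is exhausted. A toy sketch of that stopping rule (the stub "model" below is a stand-in callable, not the real Falcon forward pass):

```python
def generate(next_token_fn, prompt_ids, eos_token_id, max_new_tokens):
    """Append tokens until eos is produced or the budget runs out."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token_fn(ids)
        ids.append(tok)
        if tok == eos_token_id:
            break
    return ids

# Stub model that answers with three tokens and then eos (id 2):
answer = iter([101, 102, 103, 2])
stops = generate(lambda ids: next(answer), [7, 8], eos_token_id=2, max_new_tokens=50)
print(stops)    # [7, 8, 101, 102, 103, 2] - stopped at eos, well under the budget

# Stub model that never emits eos: generation runs to max_new_tokens.
runs_on = generate(lambda ids: 55, [7, 8], eos_token_id=2, max_new_tokens=5)
print(runs_on)  # [7, 8, 55, 55, 55, 55, 55]
```

A common fix in practice is to append the tokenizer's eos token to every training example, so the model learns to produce it at the end of each answer.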

tyfuoru

Hi, thank you for the video! What if I want a small model like Falcon-7b, or another model like T5, to make bots for QA or FAQ, but I need to use and tune it for my own language, e.g. Portuguese or Spanish? What's your suggestion? Because I don't think I need a large multilingual model for this 😅

joaoalmeida

Great video, and very interesting if you want to fine-tune with your own dataset 👍 A pity that the response took a long time… any idea how to make it faster?

henkhbit

I'm facing this error: mat1 and mat2 shapes cannot be multiplied (26x4544 and 1x10614784) while running this code block:

with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )

Does anyone have any ideas how I could solve this? Not sure if the problem was caused because I'm using instead of, since I got an error of 'cannot import name from 'peft'' even on the latest version of the peft library.

LinPure

Why is the inference consistently slow? Do we know how to speed it up?

sumitmamoria

A question a little outside the context of the video... are deep learning models used as much as classical machine learning models on tabular data?

alyssonmach

Thanks for the great video. Can we merge the adapter.bin back into its original model? Can you make a video on it?

cgeetmv

Does anyone know how to fine-tune a QLoRA over another LoRA on a specific model? There is a LoRA that fine-tunes the original Llama model with a translated and cleaned version of the Alpaca dataset for Brazilian Portuguese. I would like to fine-tune another LoRA on top of that.

ggximenez

Hello, great video so far. Let me ask some questions here:
1. What should I do if my training loss does not decrease consistently (sometimes up, sometimes down)?
2. How do I use multiple GPUs? I always get OOM if I use Falcon-40B, so I rented 2 GPUs from a cloud provider. Unfortunately, it ran on just 1 GPU.

IchSan-jxeg