Finetune LLAMA2 on custom dataset efficiently with QLoRA | Detailed Explanation | LLM | Karndeep Singh

This video gives a detailed, step-by-step explanation of how to fine-tune Llama 2 on a single GPU efficiently with QLoRA. The following topics are covered in the video:
1. Overview of supervised fine-tuning and RLHF
2. Why is fine-tuning required?
3. What is LoRA and how is it helpful?
4. How to prepare an instruction dataset for fine-tuning generative models?
5. How to train the LLAMA 2 model using the instruction dataset?
6. Inference using the fine-tuned LLAMA 2 model.

Connect with me on:

Background Music:
Creative Commons Attribution 3.0 Unported License

#llms #llama2 #finetune #qlora #huggingface
Comments

Best video on fine-tuning. I was stuck for the whole day... until I got this 👍

HamzaKhan-zjdn

Nice explanation, dude. I was looking for something like this.

devanshumishra

Please don't use music in the background while teaching. The video is awesome though; it's just that the music sometimes breaks concentration.

avikpathak

This is the best video on fine-tuning. Thank you so much, you saved me so much headache.

ballerzhighlights

Thank you very much for sharing. I will follow your video to try fine-tuning with my own dataset.

pillargauss

Great video with a great explanation. Thanks for the quality content. Keep making these kinds of videos and help us learn more about LLMs.

vignesh

🎯 Key Takeaways for quick navigation:

00:00 The video covers fine-tuning the Llama 2 model and introduces QLoRA for efficient fine-tuning on a single GPU.
01:08 Fine-tuning is crucial for generative models like Llama 2, especially in specialized domains such as medicine, where pre-trained models may lack specific knowledge.
02:17 In-context learning involves interacting with a model through prompts to extract domain-specific information from its knowledge base.
03:23 Fine-tuning is necessary when the domain poses challenges for a model; for instance, medical domains may require adapting models to understand technical terms.
04:44 The tutorial outlines two main steps for fine-tuning: Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF).
05:54 The initial step involves pre-training a model on general language data before fine-tuning it for a specific domain.
07:30 In SFT, domain-specific data is used to instruct the model with context-response pairs, adapting it to understand and generate specific content.
08:51 The tutorial focuses on SFT, providing a comprehensive understanding of how to construct an instruction dataset and fine-tune a Llama 2 model.
09:30 Dependencies for the tutorial include Hugging Face Datasets, Accelerate, and the TRL library, along with PEFT and bitsandbytes for efficient QLoRA fine-tuning (see the setup sketch after this comment).
11:21 The tutorial uses a dialogue dataset from Hugging Face, preparing it into an instruction dataset for fine-tuning Llama 2.
14:18 The data is processed: 500 rows are selected for training, and 50 samples each are taken for testing and validation (see the dataset sketch below).
15:10 The tutorial introduces a quantized version of the Llama 2 base model using bitsandbytes to reduce the model's memory footprint for efficient training.
18:10 Before fine-tuning, a zero-shot inference test is performed on the base model, revealing its limitations in generating relevant content without fine-tuning.
20:57 LoRA (Low-Rank Adaptation) is introduced as a method for fine-tuning a small set of added parameters instead of modifying all 7 billion parameters, allowing efficient adaptation to new tasks.
22:47 The low-rank adaptation process involves creating an "adapter" matrix, fine-tuning it, and merging the changes with the original matrix to achieve task-specific adaptation.
23:01 LoRA involves creating new matrices for specific parameters in a model, fine-tuning them, and merging them with the original weights during training.
25:17 To prepare a model for LoRA-based fine-tuning, enable gradient checkpointing, prepare it for k-bit training, and use a helper function to inspect which parameters are trainable (see the LoRA sketch below).
27:22 LoRA configuration involves specifying the rank, LoRA alpha (the scaling factor for the decomposition), and the target modules (e.g., the query/key/value projections) for fine-tuning.
29:27 Use the LoRA config to create additional matrices for specific target modules (e.g., query/key/value) and fine-tune them before merging with the original weights.
35:27 Set up training with specific arguments, use a specialized Adam optimizer, and employ a cosine learning-rate schedule. Utilize the TRL library's SFTTrainer for efficient training (see the training sketch below).
38:46 When saving the model after training, only the additional adapter weights created by LoRA are saved, not the entire base model.
39:25 For inference, import the PEFT model, LoRA config, and tokenizer, and use them to merge the adapter weights with the base model for generating text (see the merge sketch below).
42:12 LoRA allows training different adapters for various tasks (e.g., summarization, translation) and efficiently merging them with a single base model, optimizing resource usage.
44:04 After inference, the trained model with merged adapters can be pushed to a repository or the Hugging Face Hub for sharing and deployment.

Made with HARPA AI

goldenhomerealestate
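Editor's note: to make the takeaways above easier to follow, the next few blocks are minimal sketches of each step, not the video's exact code. First, the setup and the 4-bit quantized base model; the model ID is an assumption (Llama 2 weights require access approval on the Hugging Face Hub).

```python
# Sketch: load Llama 2 7B in 4-bit with bitsandbytes (QLoRA-style quantization).
# pip install transformers datasets accelerate peft bitsandbytes trl
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # assumed base model ID

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the QLoRA data type
    bnb_4bit_compute_dtype=torch.float16,  # do the matmuls in fp16
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # place the layers on the single GPU
)
```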
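Next, preparing the instruction dataset with the 500/50/50 split mentioned at 14:18. The dataset ID and prompt template are placeholders standing in for whatever the video actually uses; a dialogue-summarization set such as DialogSum fits the description.

```python
# Sketch: turn dialogue/summary pairs into instruction-style training text.
from datasets import load_dataset

dataset = load_dataset("knkarthick/dialogsum")  # placeholder dialogue dataset ID

def to_instruction(example):
    # Hypothetical prompt template; adapt to your own task and data columns.
    example["text"] = (
        "Instruction: Summarize the following conversation.\n\n"
        f"{example['dialogue']}\n\nSummary:\n{example['summary']}"
    )
    return example

train_dataset = dataset["train"].select(range(500)).map(to_instruction)
val_dataset = dataset["validation"].select(range(50)).map(to_instruction)
test_dataset = dataset["test"].select(range(50)).map(to_instruction)
```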
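Then the LoRA preparation from 25:17 to 29:27: enable gradient checkpointing, prepare the quantized model for k-bit training, and attach low-rank adapters to the attention projections. The rank, alpha, and target-module names below are typical choices, not necessarily the video's exact values.

```python
# Sketch: wrap the 4-bit model with LoRA adapters via PEFT.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model.gradient_checkpointing_enable()           # trade compute for activation memory
model = prepare_model_for_kbit_training(model)  # stabilize k-bit training (casts norms, etc.)

lora_config = LoraConfig(
    r=16,                                           # rank of the A/B decomposition
    lora_alpha=32,                                  # scaling factor for the update
    target_modules=["q_proj", "k_proj", "v_proj"],  # query/key/value projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the 7B parameters
```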
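Training as described at 35:27, with a paged 8-bit AdamW optimizer and a cosine schedule via TRL's SFTTrainer. The hyperparameters are illustrative, and the argument names follow the trl 0.7-era API (newer releases move several of them into SFTConfig).

```python
# Sketch: supervised fine-tuning of the LoRA-wrapped model with TRL.
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="llama2-qlora",     # placeholder output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",      # paged Adam variant from bitsandbytes
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1,
    logging_steps=25,
)

trainer = SFTTrainer(
    model=model,                   # the LoRA-wrapped model from the sketch above
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    dataset_text_field="text",     # column created by to_instruction()
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)

trainer.train()
trainer.model.save_pretrained("llama2-qlora-adapter")  # saves only the adapter weights
```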
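Finally, the merge-and-inference step from 38:46 onward: reload the base model in half precision, fold the saved adapter into it with merge_and_unload, generate, and push to the Hub. Repo names are placeholders.

```python
# Sketch: reload the base model, merge the LoRA adapter, generate, and push.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

model = PeftModel.from_pretrained(base, "llama2-qlora-adapter")
model = model.merge_and_unload()  # fold the adapter deltas into the base weights

prompt = "Instruction: Summarize the following conversation.\n\n"  # plus a dialogue
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))

model.push_to_hub("your-username/llama2-qlora-merged")      # placeholder repo ID
tokenizer.push_to_hub("your-username/llama2-qlora-merged")  # push the tokenizer too
```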

🎯 Key Takeaways for quick navigation:

00:00 [🚀] Explanation of fine-tuning the Llama 2 model efficiently with QLoRA on a single GPU.
00:41 [💡] The importance of fine-tuning generative models in domains such as medicine, and of in-context learning for improving the model's understanding.
02:17 [🧠] Explanation of the concept of in-context learning and the use of varied prompts to extract information.
03:11 [🔄] The importance of fine-tuning for updating the model's knowledge so it understands the language of a specific domain.
04:05 [🔍] Overview of the initial steps for effective fine-tuning using SFT and RLHF.
06:22 [🎓] Explanation of Supervised Fine-Tuning (SFT) and of preparing the dataset with human feedback (RLHF).
08:51 [⚙️] The main focus is on the supervised fine-tuning approach, without the details of RLHF.
13:11 [📊] Overview of preparing the dataset for training the generative model.
16:05 [💽] Relying on the 7-billion-parameter Llama 2 model with PEFT and the LoRA technique for partial fine-tuning and fewer memory problems.
17:56 [📏] Running a zero-shot test to understand the model's performance before fine-tuning with LoRA.
23:01 [🚗] Explanation of fine-tuning the Llama 2 model efficiently on a GPU using QLoRA.
23:42 [💡] Creating a new matrix for specific parameters, fine-tuning it, and merging it with the original weights.
29:27 [⚙️] Configuring the target modules for QLoRA using LoraConfig.
35:54 [🛠️] Reviewing the training settings and using TRL's SFTTrainer to simplify training.
39:00 [🔮] Using the trained model for inference and merging the QLoRA weights with the base model.
43:50 [🌐] QLoRA-trained models can be used in production applications for different tasks without reloading the base model.

Made with HARPA AI

goldenhomerealestate

This video should get many more views and likes.

shrutiiyyer

All is great, but what was the need for the background music?

AmitKumar-fnpx

Amazing!
Can you please help me with text generation instead of summarization?

qbkilpe

Hey Karndeep, I think there is a mistake. You have already obtained a PEFT model, and then you are again passing a PEFT model with a LoRA config to the SFT trainer. It's like LoRA on LoRA.
Am I right? You need to pass the base model to the SFTTrainer, not the PEFT model. (See the sketch after this comment.)

karthikdatta
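Editor's note: a minimal sketch of the pattern the comment above suggests: pass the plain base model together with a LoraConfig via peft_config, so the trainer applies the adapter exactly once (whether TRL re-wraps an already-wrapped PeftModel depends on the TRL version). Names are placeholders.

```python
# Sketch: let SFTTrainer apply LoRA itself instead of wrapping the model twice.
from trl import SFTTrainer
from peft import LoraConfig

lora_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"],
    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=base_model,             # the quantized base model, NOT get_peft_model(...)
    train_dataset=train_dataset,  # placeholder dataset with a "text" column
    peft_config=lora_config,      # SFTTrainer wraps the model with this config
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_args,           # placeholder TrainingArguments
)
```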

Hi, thank you for the detailed explanations. I have a question: how can we push the tokenizer model after fine-tuning?

manirajan__

Awesome video! I have a question: can you take a QLoRA fine-tuned llama2-7b model and then quantize it using llama.cpp to run locally or on one GPU? I wonder whether quantization would eliminate the delta weights we learned using QLoRA (as if you were just using the llama2-7b base)?

parisapouya

If we want to fine-tune mistral:latest instead of llama2 here, what should we use in the model section? I have downloaded Ollama on my system; how do I fine-tune mistral:latest with that?

THE-AI_INSIDER

Hi buddy, I followed your video "OCR Text from PDFs and Image Documents using docTR | Better than Tesseract OCR | Text Extraction" and got a JSON file of the text present in my images. Can you tell me how to get that text into a TXT or DOCX file (or any other format you suggest) while keeping the same structure the text had in the image? How do I do that? I tried every way I could think of, but they all failed. Can you help me get out of this problem? Please, it's related to my FYP.

mushafmughal

Super useful video on llama2 fine-tuning. I have one doubt: does llama2 + SageMaker cost less than Azure OpenAI?

VenkatesanVenkat-fdhg

One doubt: I have finance data, a customer's credit history. Can I make prompts (question and answer) where I also show the raw data and the answers the agent should give? Can I train Llama-type LLM models on this?

rishabhmishra

Awesome video and underrated content. By the way, have you seen Microsoft's 1b model "phi-1_5"? If so, do the same operations as for llama 2 work on it? I tried, and it's not working. Could you check it out?

gowthamyarlagadda

I'm facing this error: OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 832.00 KiB is free. Process 2009320 has 14.73 GiB memory in use. Of the allocated memory 13.60 GiB is allocated by PyTorch, and 115.48 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It happens when doing this step: "Merge Trained LoRA Adapter With BASE MODEL and Push Model to Hub".
Please help. (See the note after this comment.)

nosxr
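Editor's note: a hedged sketch of one common way around an OOM at the merge step: free the GPU memory held by the training objects first, then reload the base model in half precision on CPU (where system RAM is the limit) and merge there. Variable names, paths, and IDs are placeholders; merge_and_unload and push_to_hub are the standard PEFT/Transformers calls.

```python
# Sketch: merge the LoRA adapter without holding two full models on the GPU.
# Assumes the adapter was saved to "llama2-qlora-adapter" (placeholder path).
import gc
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# 1. Free the GPU memory still held by the training objects.
del trainer, model          # placeholder names from the training notebook
gc.collect()
torch.cuda.empty_cache()

# 2. Reload the base model in fp16 on CPU instead of the full GPU.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder base model ID
    torch_dtype=torch.float16,
    device_map={"": "cpu"},
    low_cpu_mem_usage=True,
)

# 3. Attach the adapter and fold its weights into the base model.
merged = PeftModel.from_pretrained(base, "llama2-qlora-adapter").merge_and_unload()
merged.push_to_hub("your-username/llama2-qlora-merged")  # placeholder repo ID
```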