Fine-Tuning Llama 2 70B on Consumer Hardware (QLoRA): A Step-by-Step Guide

In this video, I take you through a detailed tutorial on the recent update to the FineTune LLMs repo. The tutorial covers fine-tuning Llama 2 70B on consumer-grade hardware, and in particular the vital role of recent innovations like QLoRA and FlashAttention 2 in making such fine-tuning possible.
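To make that concrete, here is a minimal sketch (not the repo's exact code) of what 4-bit QLoRA loading with FlashAttention 2 typically looks like using Hugging Face Transformers, PEFT, and bitsandbytes. The model ID, LoRA hyperparameters, and target modules are illustrative assumptions, not the settings used in the video.

```python
# Sketch: load Llama 2 70B in 4-bit (QLoRA-style) with FlashAttention 2,
# then attach LoRA adapters. All hyperparameters here are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-70b-hf"  # assumes access to the gated repo

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: weights stored as 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # needs flash-attn 2 and a recent transformers
    device_map="auto",                        # spread layers across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
```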

The tutorial also addresses the challenge of choosing a pad token ID when fine-tuning LLMs, and I present a neat trick using rare, unused tokens.
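As an illustration of that trick (the exact token used in the video may differ), the idea is to reuse a token that already exists in Llama's vocabulary but essentially never appears in real text, so the embedding matrix does not need to be resized:

```python
# Sketch of the pad-token trick: Llama 2's tokenizer ships without a pad token.
# Rather than adding a new token (which would force resizing the embeddings),
# reuse a rare token already in the vocabulary. The token chosen below is only
# an example assumption, not necessarily the one used in the video.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")

rare_token = "<0x00>"  # an unused byte-level token in the Llama vocabulary (assumption)
tokenizer.pad_token = rare_token
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(rare_token)

print(tokenizer.pad_token, tokenizer.pad_token_id)
```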

Finally, I demonstrate some runs using the trained model to answer prompts, showing that the fine-tuning was successful.
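For anyone who wants to try this themselves, a hedged sketch of loading the base model plus the saved LoRA adapter and answering a prompt could look like the following; the adapter path and prompt are placeholders, not the exact ones from the video.

```python
# Sketch: run prompts against the fine-tuned model by loading the base model
# in 4-bit and attaching the saved LoRA adapter. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-2-70b-hf"
adapter_dir = "./output/checkpoint-final"  # wherever the adapter .bin/.json files were saved

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_dir)  # attach the trained adapter
tokenizer = AutoTokenizer.from_pretrained(base_id)

prompt = "Explain QLoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```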

Watch the complete video for insights into how I fine-tune LLMs, and be sure to check out my other videos on the topic. Remember to subscribe, share, and click the notification bell to stay updated!

#FineTuneLLMs #LLAMA70B #FineTuning #SoftwareTutorial #CodeTutorial #ProgrammingTutorial #Python #QLoRA #FlashAttention2 #MachineLearning #DataScience #ComputerScience #AI #LanguageModel #NLP

Timestamps:

00:00 - Intro
00:56 - Summary Of QLoRA and FlashAttention
02:02 - Setting Up Software
05:14 - Getting A Dataset
05:47 - Examining The Software
12:37 - Running The Software
13:58 - Software Performance Analysis
15:13 - Training Results And Shared Model
16:16 - Running Instructions On Model
17:30 - Custom Datasets And Models
17:56 - Outro
Comments

Thanks for an interesting concept! Did you try to improve math reasoning for the 70B Llama 2? I have a small cluster with 8 NVIDIA A100 40GB GPUs and am trying to find a dataset to improve the base model.

iforels

I see that after fine-tuning the model I get a .json and .bin adapter file; how would I run my model using these? Can I use them with llama.cpp, or how else do I go about using the fine-tuned weights? I guess my goal is a chat that uses my fine-tuned model.

itzslyr

You need 2 x 3090s or 48GB of VRAM to fine-tune the 70B model? So for the 13B model, I should be able to do the same with one 3090 card? I hope to get more hardware-requirement details so that I can determine whether this procedure is useful to me. Thanks!

woongda

Two 3090s and NVLink seem like the lowest entry point for Llama 2 70B. Two used 3090s are about the price of a single 4090. Still too expensive for my wallet, but at least something I can dream about.

robertfontaine

Hi, I was trying to replicate the Llama 2 70B fine-tuning with 2 x 4090s; even with the --split_model flag, the model is loaded onto one GPU only before going OOM. I tried with the 7B model, which is loaded onto one GPU and then replicated to the other. It seems it's running data parallel, not model parallel. Is NVLink required for it to work correctly?

chris_zhp

What is the training time of this model on a 3090?

TV-chql

Cool! How about 8 x V100 16GB, 128GB of VRAM in total?

yongtao