Fine-Tuning Llama 2 70B on Consumer Hardware (QLoRA): A Step-by-Step Guide

In this video, I take you through a detailed tutorial on the recent update to the FineTune LLMs repo. The tutorial covers fine-tuning Llama 2 70B on consumer-grade hardware, and in particular the vital role of recent innovations like QLoRA and FlashAttention 2 in making such fine-tuning possible.
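To make that concrete, here is a minimal sketch (not the repo's exact code) of what 4-bit QLoRA loading with FlashAttention 2 typically looks like using Hugging Face Transformers, PEFT, and bitsandbytes. The model ID, LoRA hyperparameters, and target modules are illustrative assumptions, not the settings used in the video.

```python
# Sketch: load Llama 2 70B in 4-bit (QLoRA-style) with FlashAttention 2,
# then attach LoRA adapters. All hyperparameters here are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-70b-hf"  # assumes access to the gated repo

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: weights stored as 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # needs flash-attn 2 and a recent transformers
    device_map="auto",                        # spread layers across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
```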

The tutorial also addresses the challenge of choosing a pad token ID when fine-tuning LLMs, and I present a neat trick using rare, unused tokens.
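As an illustration of that trick (the exact token used in the video may differ), the idea is to reuse a token that already exists in Llama's vocabulary but essentially never appears in real text, so the embedding matrix does not need to be resized:

```python
# Sketch of the pad-token trick: Llama 2's tokenizer ships without a pad token.
# Rather than adding a new token (which would force resizing the embeddings),
# reuse a rare token already in the vocabulary. The token chosen below is only
# an example assumption, not necessarily the one used in the video.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")

rare_token = "<0x00>"  # an unused byte-level token in the Llama vocabulary (assumption)
tokenizer.pad_token = rare_token
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(rare_token)

print(tokenizer.pad_token, tokenizer.pad_token_id)
```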

Finally, I demonstrate some runs using the trained model to answer prompts, showing that the fine-tuning was successful.
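For anyone who wants to try this themselves, a hedged sketch of loading the base model plus the saved LoRA adapter and answering a prompt could look like the following; the adapter path and prompt are placeholders, not the exact ones from the video.

```python
# Sketch: run prompts against the fine-tuned model by loading the base model
# in 4-bit and attaching the saved LoRA adapter. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-2-70b-hf"
adapter_dir = "./output/checkpoint-final"  # wherever the adapter .bin/.json files were saved

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_dir)  # attach the trained adapter
tokenizer = AutoTokenizer.from_pretrained(base_id)

prompt = "Explain QLoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```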

Watch the complete video for insights into how I fine-tune LLMs, and be sure to check out my other videos on the topic. Remember to subscribe, share, and click the notification bell to stay updated!

#FineTuneLLMs #LLAMA70B #FineTuning #SoftwareTutorial #CodeTutorial #ProgrammingTutorial #Python #QLoRA #FlashAttention2 #MachineLearning #DataScience #ComputerScience #AI #LanguageModel #NLP

Timestamps:

00:00 - Intro
00:56 - Summary Of QLoRA and FlashAttention
02:02 - Setting Up Software
05:14 - Getting A Dataset
05:47 - Examining The Software
12:37 - Running The Software
13:58 - Software Performance Analysis
15:13 - Training Results And Shared Model
16:16 - Running Instructions On Model
17:30 - Custom Datasets And Models
17:56 - Outro
Comments

Thanks for an interesting concept! Did you try to improve math reasoning for the 70B Llama 2? I have a small cluster with 8 NVIDIA A100 40GB GPUs and am trying to find a dataset to improve the base model.

iforels

I see that after fine-tuning the model I get a .json and .bin adapter file; how would I run my model using these? Can I use them with llama.cpp, or how else do I go about using the fine-tuned weights? I guess my goal is a chat that uses my fine-tuned model.

itzslyr

You need 2 x 3090s or 48GB of VRAM to fine-tune the 70B model? So for the 13B model, I should be able to do the same with one 3090 card? I hope to get more hardware-requirement details so that I can determine whether this procedure is useful to me. Thanks!

woongda

Two 3090s and NVLink seem like the lowest entry point for Llama 2 70B. Two used 3090s are about the price of a single 4090. Still too expensive for my wallet, but at least something I can dream about.

robertfontaine

Hi, I was trying to replicate the Llama 2 70B fine-tuning with 2 x 4090s; even with the --split_model flag, the model is loaded onto one GPU only before going OOM. I tried with the 7B model, which is loaded onto one GPU and then replicated to the other. It seems it's running data parallel, not model parallel. Is NVLink required for it to work correctly?

chris_zhp

What is the training time of this model on a 3090?

TV-chql

Cool! How about 8 x V100 16GB, 128GB of VRAM in total?

yongtao