Efficient Fine-Tuning for Llama-v2-7b on a Single GPU

The first problem you're likely to encounter when fine-tuning an LLM is the "out of memory" error, and it is especially acute with the 7B-parameter Llama-2 model, which requires that much more memory. In this talk, Piero Molino and Travis Addair from the open-source Ludwig project show you how to tackle this problem.

In this hands-on workshop, we'll discuss the unique challenges of fine-tuning LLMs and show you how you can tackle these challenges with open-source tools through a demo.

By the end of this session, attendees will understand:
- How to fine-tune LLMs like Llama-2-7b on a single GPU
- Techniques like parameter-efficient fine-tuning and quantization, and how they can help
- How to train a 7B-parameter model on a single T4 GPU with QLoRA (see the sketch below)
- How to deploy tuned models like Llama-2 to production
- Continued training with RLHF
- How to use RAG to do question answering with trained LLMs
This session will equip ML engineers to unlock the capabilities of LLMs like Llama-2 for their own projects.
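
As a taste of the QLoRA recipe covered in the session, here is a minimal sketch using Ludwig's Python API. It assumes Ludwig 0.8+ with LLM fine-tuning support; the dataset path is a placeholder, and the hyperparameter values are illustrative rather than the exact settings from the workshop notebook:

```python
# Minimal QLoRA fine-tuning sketch (assumes ludwig>=0.8 and a CUDA GPU).
# The dataset path and hyperparameter values are illustrative placeholders.
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    # Load the frozen base weights in 4-bit so the 7B model fits in T4 VRAM
    "quantization": {"bits": 4},
    # Train small low-rank adapters (LoRA) instead of all 7B parameters
    "adapter": {"type": "lora"},
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "trainer": {
        "type": "finetune",
        "epochs": 3,
        "batch_size": 1,
        "gradient_accumulation_steps": 16,  # effective batch size of 16
        "learning_rate": 1e-4,
    },
}

model = LudwigModel(config=config)
# Expects a dataset with "instruction" and "output" columns (placeholder path)
results = model.train(dataset="my_instruction_data.jsonl")
```

The point of the combination: the 4-bit base weights (roughly 3.5 GB for 7B parameters) stay frozen, and gradients flow only through the small adapter matrices, which is what makes a single 16 GB T4 sufficient.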

This event is inspired by DeepLearning.AI’s GenAI short courses, created in collaboration with AI companies across the globe. Our courses help you learn new skills, tools, and concepts efficiently within 1 hour.

Here is the link to the notebook used in the workshop:

Speakers:

Piero Molino, Co-founder and CEO of Predibase


Travis Addair, Co-founder and CTO of Predibase

Comments

Very helpful! Already trained llama-2 with custom classifications using the cookbook. Thanks!

thelinuxkid

Very informative. Direct and to-the-point content in an easily understandable presentation.

dinupavithran

Well this was simply excellent, thank you 🙏🏻

thedelicatecook

One of the most complete videos. Must watch

andres.yodars

Excellent crystal-clear surgery on GPU VRAM utilization...

ab

🖖 alignment by sectoring hyperparameters in behaviour, nice one

KarimMarbouh

Hey, that was great. Thank you very much!

rgeromegnace

I would like to kindly request @DeepLearningAI to prepare a similar hands-on workshop on fine-tuning source code models.

ggm

Cool video. If I want to fine-tune it on a single specific task (keyword extraction), should I first train an instruction-tuned model and then train that on my specific task, or mix the datasets together?

pickaxe-support

Hello everyone, I would be so happy if the recorded video had captions/subtitles.

ggm

An Nvidia H100 GPU on Lambda Labs is just $2/hr; I have been using one for the past few months, unlike the $12.29/hr on AWS shown in the slide.
I get it, it's still not cheap, but it's worth mentioning here.

zubairdotnet

And I was under the delusion that I would be able to fine-tune the 70B param model on my 4090. Oh well...

TheGargalon

Please can you provide a link to the slides?

nekrot

What's the music in the beginning? Can't shake it off.

ayushyadav-bmto

I ran on a Colab T4 and still got "RuntimeError: CUDA out of memory". Anything else I can do, please?

nminhptnk

@pieromolino_pb - Does Ludwig allow locally downloading and deploying the fine-tuned model?

stalinamirtharaj

How long did the entire training process take?

feysalmustak

At 51:30 he says not to repeat the same prompt in the training data. What if I am fine-tuning the model on a single task but with thousands of different inputs for the same prompt?

PickaxeAI

The config uses epochs=3; since we are fine-tuning, would epochs=1 suffice?

kevinehsani

This seems to make a case for Apple Silicon for training. The M3 Max performs close to an RTX 3080, but with access to up to 192GB of memory.

Neberheim