Efficient Fine-Tuning for Llama-v2-7b on a Single GPU

The first problem you're likely to encounter when fine-tuning an LLM is the "out of memory" error, and it is especially acute with the 7B-parameter Llama-2 model, which requires that much more memory. In this talk, Piero Molino and Travis Addair from the open-source Ludwig project show you how to tackle this problem.

In this hands-on workshop, we'll discuss the unique challenges of fine-tuning LLMs and show you how you can tackle these challenges with open-source tools through a demo.

By the end of this session, attendees will understand:
- How to fine-tune LLMs like Llama-2-7b on a single GPU
- Techniques like parameter-efficient fine-tuning and quantization, and how they can help
- How to train a 7B-parameter model on a single T4 GPU with QLoRA (see the sketch below)
- How to deploy tuned models like Llama-2 to production
- Continued training with RLHF
- How to use RAG to do question answering with trained LLMs
This session will equip ML engineers to unlock the capabilities of LLMs like Llama-2 for their own projects.
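
As a taste of the QLoRA recipe covered in the session, here is a minimal sketch using Ludwig's Python API. It assumes Ludwig 0.8+ with LLM fine-tuning support; the dataset path is a placeholder, and the hyperparameter values are illustrative rather than the exact settings from the workshop notebook:

```python
# Minimal QLoRA fine-tuning sketch (assumes ludwig>=0.8 and a CUDA GPU).
# The dataset path and hyperparameter values are illustrative placeholders.
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    # Load the frozen base weights in 4-bit so the 7B model fits in T4 VRAM
    "quantization": {"bits": 4},
    # Train small low-rank adapters (LoRA) instead of all 7B parameters
    "adapter": {"type": "lora"},
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "trainer": {
        "type": "finetune",
        "epochs": 3,
        "batch_size": 1,
        "gradient_accumulation_steps": 16,  # effective batch size of 16
        "learning_rate": 1e-4,
    },
}

model = LudwigModel(config=config)
# Expects a dataset with "instruction" and "output" columns (placeholder path)
results = model.train(dataset="my_instruction_data.jsonl")
```

The point of the combination: the 4-bit base weights (roughly 3.5 GB for 7B parameters) stay frozen, and gradients flow only through the small adapter matrices, which is what makes a single 16 GB T4 sufficient.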

This event is inspired by DeepLearning.AI’s GenAI short courses, created in collaboration with AI companies across the globe. Our courses help you learn new skills, tools, and concepts efficiently within 1 hour.

Here is the link to the notebook used in the workshop:

Speakers:

Piero Molino, Co-founder and CEO of Predibase


Travis Addair, Co-founder and CTO of Predibase

Comments

Very helpful! Already trained llama-2 with custom classifications using the cookbook. Thanks!

thelinuxkid

Very informative. Direct and to-the-point content in an easily understandable presentation.

dinupavithran

Well this was simply excellent, thank you 🙏🏻

thedelicatecook

One of the most complete videos. Must watch

andres.yodars

Excellent crystal-clear surgery on GPU VRAM utilization...

ab

🖖 alignment by sectoring hyperparameters in behaviour, nice one

KarimMarbouh

Hey, that was great. Thank you very much!

rgeromegnace

I would like to kindly request @DeepLearningAI to prepare a similar hands-on workshop on fine-tuning source code models.

ggm

Cool video. If I want to fine-tune it on a single specific task (keyword extraction), should I first train an instruction-tuned model and then train that on my specific task, or mix the datasets together?

pickaxe-support

Hello everyone, I would be so happy if the recorded video had captions/subtitles.

ggm

An Nvidia H100 GPU on Lambda Labs is just $2/hr; I have been using one for the past few months, unlike the $12.29/hr on AWS shown in the slide.
I get it, it's still not cheap, but it's worth mentioning here.

zubairdotnet

And I was under the delusion that I would be able to fine-tune the 70B param model on my 4090. Oh well...

TheGargalon

Please can you provide a link to the slides?

nekrot

What's the music in the beginning? Can't shake it off.

ayushyadav-bmto

I ran on a Colab T4 and still got "RuntimeError: CUDA out of memory". Anything else I can do, please?

nminhptnk

@pieromolino_pb - Does Ludwig allow locally downloading and deploying the fine-tuned model?

stalinamirtharaj

How long did the entire training process take?

feysalmustak

At 51:30 he says not to repeat the same prompt in the training data. What if I am fine-tuning the model on a single task but with thousands of different inputs for the same prompt?

PickaxeAI

The config uses epochs=3; since we are fine-tuning, would epochs=1 suffice?

kevinehsani

This seems to make a case for Apple Silicon for training. The M3 Max performs close to an RTX 3080, but with access to up to 192GB of memory.

Neberheim