Full Fine-tuning with Fewer GPUs - GaLore, Optimizer Tricks, Adafactor


VIDEO RESOURCES:

TIMESTAMPS:
0:00 LLM Full fine-tuning with lower VRAM
0:37 Video Overview
4:02 Understanding Optimizers
6:17 Stochastic Gradient Descent (SGD)
7:53 AdamW Optimizer and VRAM requirements
9:31 AdamW 8-bit optimizer
11:03 Adafactor optimizer and memory requirements
14:28 GaLore - reducing gradient and optimizer VRAM
19:10 LoRA versus GaLore
19:49 Better and Faster GaLore via Subspace Descent
22:59 Layerwise gradient updates
26:17 Training Scripts
27:10 How gradient checkpointing works to reduce memory
40:30 AdamW Performance
41:14 AdamW 8-bit Performance
42:45 Adafactor with manual learning rate and schedule
44:10 Adafactor with default/auto learning rate
45:47 GaLore AdamW
50:22 GaLore AdamW with Subspace Descent
52:25 Using AdamW 8-bit and Adafactor with GaLore
53:14 Notebook demo of layerwise gradient updates
55:28 Running with LoRA
58:36 Inferencing and Pushing Models to Hub
1:00:00 Single GPU Recommendations
1:25:00 Multi-GPU Recommendations
1:03:25 Resources
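
The timestamps above cover GaLore and the various optimizer options; for a rough sense of how they fit together, here is a minimal sketch of full fine-tuning with GaLore via the Hugging Face Trainer. It assumes a recent transformers release with built-in GaLore support plus the `galore-torch` package; the model name, dataset and hyperparameters are placeholders rather than the ones used in the video.

```python
# Minimal sketch: full fine-tuning with the GaLore optimizer via the Hugging Face Trainer.
# Assumes a recent transformers release with GaLore support and `pip install galore-torch`.
# Model, dataset and hyperparameters are placeholders, not those from the video.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token            # Llama-style tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("imdb", split="train[:1%]")   # placeholder dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="galore-full-ft",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,           # recompute activations to save memory
    optim="galore_adamw",                  # or "galore_adamw_layerwise" for layer-wise updates
    optim_target_modules=["attn", "mlp"],  # project these modules' gradients onto a low-rank subspace
    learning_rate=1e-5,
    max_steps=100,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```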
Comments

Very up to date! Includes GaLore, etc.

andpoul

Can you implement a few papers in PyTorch, like Grad-TTS and more?

imranullah

Hey Trelis! Can you help me set up a **multi-node, multi-GPU** training infra using RunPod? I figured this out using the community cloud option, where I can set a public IP for my pods and expose the TCP ports with the same internal and external port numbers. However, I'm not able to add a shared disk across my community pods to save checkpoints in case of node failure. I totally failed to set up communication between two different pods when I launched them in the secure cloud, but the secure cloud allows a network volume that can be shared across different pods.

Can you help me set up infra for a multi-node, multi-GPU setup in the secure cloud? In Paperspace this was easy, but I am not able to figure this out using RunPod. Any suggestions are welcome.

padmasrivaddiparthi
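
The networking question above is RunPod-specific, but as a generic sketch of what each node runs once connectivity and a shared volume are in place: a script launched on every node with torchrun, which initialises torch.distributed from the environment variables torchrun sets and writes checkpoints to shared storage. The model, paths and launch flags below are illustrative assumptions, not a tested RunPod recipe.

```python
# Generic multi-node sketch (not RunPod-specific). Launch on every node with torchrun, e.g.
#   torchrun --nnodes=2 --node_rank=<0 or 1> --nproc_per_node=<gpus per node> \
#            --master_addr=<node 0 IP> --master_port=29500 train.py
# The shared checkpoint path is a placeholder for a volume that all nodes can mount.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # reads RANK/WORLD_SIZE/MASTER_ADDR set by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)   # stand-in for the real model
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-5)

    # ... training loop over a DistributedSampler-backed DataLoader goes here ...

    # Rank 0 writes checkpoints to shared storage so training can resume after a node failure.
    if dist.get_rank() == 0:
        os.makedirs("/shared/checkpoints", exist_ok=True)
        torch.save(ddp_model.module.state_dict(), "/shared/checkpoints/latest.pt")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```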

Can we convert a full fine-tuned model to LoRA (SVD on the delta weights)?

VijayEranti
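
On the question above: the usual idea is to subtract the base weights from the fine-tuned weights and factor each delta with a truncated SVD to get LoRA-style A and B matrices. A minimal sketch, where the model names, rank and choice of layers are placeholder assumptions (this is not something covered in the video):

```python
# Sketch: extract LoRA-style factors from a full fine-tune via truncated SVD on the weight deltas.
# Model names and the rank are placeholders; only 2-D weight matrices are factored.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model")          # placeholder
tuned = AutoModelForCausalLM.from_pretrained("fully-tuned-model")  # placeholder, same architecture
rank = 16

lora_factors = {}
for (name, w_base), (_, w_tuned) in zip(base.named_parameters(), tuned.named_parameters()):
    if w_base.ndim != 2:                       # skip biases, norms, etc.
        continue
    delta = (w_tuned - w_base).float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    # Keep the top-`rank` singular directions so that delta ≈ B @ A
    B = U[:, :rank] * S[:rank]                 # (out_features, rank)
    A = Vh[:rank, :]                           # (rank, in_features)
    lora_factors[name] = (A, B)
```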

Hi. Will this work for continued pretraining on textbooks for domain-specific adaptive learning? All I see on the internet are LoRA videos. I have seen your video on FFT and that's what I want for my use case.

mdrafatsiddiqui
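
Regarding continued pretraining: the training setup is the same full fine-tune, and the difference is mostly the data, which is raw domain text tokenized and packed into fixed-length blocks for the standard causal-LM objective. A rough sketch of that packing step, with a placeholder tokenizer and file paths:

```python
# Sketch of preparing raw domain text (e.g. textbooks) for continued pretraining:
# tokenize everything, concatenate, and split into fixed-length blocks with labels equal
# to input_ids (standard causal-LM objective). All names and paths are placeholders.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder
raw = load_dataset("text", data_files={"train": "textbooks/*.txt"})["train"]      # placeholder corpus
block_size = 2048

def tokenize(batch):
    return tokenizer(batch["text"])

def group_texts(examples):
    # Concatenate all token ids, then chop into block_size chunks, dropping the remainder.
    concatenated = sum(examples["input_ids"], [])
    total = (len(concatenated) // block_size) * block_size
    chunks = [concatenated[i:i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": chunks, "labels": [list(c) for c in chunks]}

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
lm_dataset = tokenized.map(group_texts, batched=True, remove_columns=tokenized.column_names)
# `lm_dataset` can then be passed as train_dataset to a Trainer setup like the one sketched above.
```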