filmov
tv
Fine tuning Pixtral - Multi-modal Vision and Text Model
Показать описание
VIDEO RESOURCES:
TIMESTAMPS:
0:00 How to fine-tune Pixtral.
0:43 Video Overview
1:27 Pixtral architecture and design choices
3:51 Mistral’s custom image encoder - trained from scratch
8:35 Fine-tuning Pixtral in a Jupyter notebook
9:33 GPU setup for notebook fine-tuning and VRAM requirements
12:23 Getting a “transformers” version of Pixtral for fine-tuning
15:00 Loading Pixtral
16:21 Dataset loading and preparation
18:08 Chat templating (somewhat advanced, but recommended)
23:33 Inspecting and evaluating baseline performance on the custom data
26:34 Setting up data collation (including for multi-turn training).
31:09 Training on completions only (tricky but improves performance)
35:08 Setting up LoRA fine-tuning
41:04 Setting up training arguments (batch size, learning rate, gradient checkpointing)
43:36 Setting up tensor board
46:48 Evaluating the trained model
47:46 Merging LoRA adapters and pushing the model to hub
49:07 Measuring performance on OCR (optical character recognition)
50:28 Inferencing Pixtral with vLLM, setting up an API endpoint
55:17 Video resources
Комментарии