Fine-tune Mixtral 8x7B (MoE) on Custom Data - Step by Step Guide

In this tutorial, we walk step by step through fine-tuning Mixtral 8x7B, the mixture-of-experts (MoE) model from Mistral AI, on your own dataset.

LINKS:
@AI-Makerspace

Timestamps:
[00:00] Introduction
[00:57] Prerequisites and Tools
[01:52] Understanding the Dataset
[03:35] Data Formatting and Preparation
[06:16] Loading the Base Model
[09:55] Setting Up the Training Configuration
[13:22] Fine-Tuning the Model
[16:28] Evaluating the Model Performance


Comments

3:37 Prompt format

4:15 The instruct version follows a different format

4:26 A special token indicates the end of the user input

4:33 Another special token indicates the end of the model response

4:39 You need to provide your data in this format

5:08 def create_prompt

5:31 System message

6:16 Load the base model
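
For reference, a minimal sketch of the instruct prompt format these notes point at, assuming the standard Mistral/Mixtral instruct markers (the example text is made up):

```python
# Mistral/Mixtral instruct template: the user turn sits between [INST] and
# [/INST]; the trailing </s> token marks the end of the model's response.
example = (
    "<s>[INST] Summarize the following text.\n\nSome input text... [/INST] "
    "A short summary.</s>"
)
```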

薇季芬

🎯 Key Takeaways for quick navigation:

00:00 🚀 *Introduction to Fine-Tuning the Mixtral 8x7B Model*
- Overview of the video's purpose: fine-tuning the Mixtral 8x7B model from Mistral AI on a custom dataset.
- Mention of the popularity and potential of Mixtral 8x7B as a mixture-of-experts model.
- Emphasis on practical considerations for fine-tuning, such as VRAM requirements and dataset details.
01:28 🛠️ *Installing Required Packages and Dataset Overview*
- Installation of the necessary packages: transformers, trl, accelerate, peft, torch, and bitsandbytes.
- Discussion of using the MosaicML instruct-v3 dataset for fine-tuning (see the sketch below).
- Overview of the dataset's structure, splits, and sources.
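
As a rough sketch of that setup, assuming the packages above and the mosaicml/instruct-v3 dataset on the Hugging Face Hub:

```python
# Setup sketch: the packages named in the video, then the dataset pull.
# In a notebook: !pip install -q transformers trl accelerate peft torch bitsandbytes datasets
from datasets import load_dataset

# The MosaicML instruct dataset referenced in the video.
dataset = load_dataset("mosaicml/instruct-v3")
print(dataset)  # inspect splits (train/test) and columns (prompt, response, source)
```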
03:45 📝 *Formatting Data for Fine-Tuning Mixtral 8x7B*
- Explanation of the prompt template for fine-tuning, specific to the Mixtral 8x7B Instruct version.
- Discussion of rearranging the data to make the task more challenging by creating instructions from the provided text.
- Demonstration of a function that reformats the initial data into the desired prompt template (sketched below).
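
A hedged sketch of what that reformatting function might look like, assuming the dataset's prompt/response field names and the Mistral instruct template; the system-message wording is an assumption, not a quote from the video:

```python
def create_prompt(sample):
    # Sketch of the reformatting function described around 5:08 (wording assumed).
    # Note the deliberate swap: the dataset's response becomes the input text,
    # and the dataset's prompt becomes the instruction the model learns to write.
    system_message = (
        "Given the following text, write an instruction that could have "
        "been used to generate it."
    )
    return (
        f"<s>[INST] {system_message}\n\n{sample['response']} [/INST] "
        f"{sample['prompt']}</s>"
    )
```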
06:28 🧩 *Loading the Base Model and Configuring for Fine-Tuning*
- Acknowledgment of the source of the notebook and clarification that the base (non-instruct) version is used.
- Setting configurations and loading the model and tokenizer, along with using Flash Attention (see the sketch below).
- Explanation of the importance of setting up configurations for a smooth fine-tuning process.
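
A possible version of that loading step, assuming 4-bit quantization via bitsandbytes; the exact quantization settings are guesses, not confirmed by the video:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"  # base (non-instruct) version, per the video

# 4-bit quantization so the model fits in less VRAM; settings are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    attn_implementation="flash_attention_2",  # Flash Attention, as mentioned
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mixtral ships without a pad token
```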
08:18 🔄 *Checking Base Model Responses Before Fine-Tuning*
- Use of a helper function to check responses from the base model before any fine-tuning (sketched below).
- Illustration of the base model's behavior when generating responses to a given prompt.
- Recognition that the base model tends to do next-word prediction rather than follow explicit instructions.
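
A minimal sketch of such a helper; the function name and generation settings are assumptions:

```python
def generate_response(prompt, model, tokenizer, max_new_tokens=256):
    # Hypothetical helper: greedy generation for a quick before/after comparison.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```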
10:06 📏 *Determining the Max Sequence Length for Fine-Tuning*
- Explanation of the importance of the max sequence length when fine-tuning Mixtral 8x7B.
- Presentation of a code snippet to analyze the distribution of sequence lengths in the dataset (sketched below).
- Emphasis on selecting a max sequence length that covers the majority of examples.
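
A sketch of what that analysis might look like, reusing the tokenizer and create_prompt from the earlier sketches:

```python
import numpy as np

# Tokenize a sample of formatted examples and inspect the length distribution.
lengths = [
    len(tokenizer(create_prompt(sample)).input_ids)
    for sample in dataset["train"].select(range(1000))  # subsample for speed
]
print(f"mean={np.mean(lengths):.0f}  p95={np.percentile(lengths, 95):.0f}  max={max(lengths)}")
# Pick a max_seq_length that covers most examples, e.g. around the 95th percentile.
```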
12:20 🧠 *Adding Adapters with LoRA for Fine-Tuning*
- Overview of the Mixtral 8x7B architecture, focusing on the linear layers for adding adapters.
- Introduction to the LoRA configuration for attaching adapters to specific layers (sketched below).
- Demonstration of setting hyperparameters and using the TRL package for supervised fine-tuning.
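
A hedged sketch of a LoRA setup along those lines; the rank, alpha, and target modules are illustrative choices, not values quoted from the video:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                 # rank; illustrative, not taken from the video
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Linear layers in Mixtral's attention and expert blocks; adjust as needed.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "w1", "w2", "w3"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```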
14:36 🚥 *Setting Up the Trainer and Initiating Fine-Tuning*
- Verification of multiple GPUs for parallelization during model training.
- Definition of the output directory and selection of training epochs or steps.
- Importance of configuring the trainer, including considerations for the max sequence length (see the sketch below).
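
A sketch of the trainer setup under those assumptions; every hyperparameter here is illustrative, not taken from the video:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

# Add a "text" column holding the formatted prompt for the trainer to consume.
train_data = dataset["train"].map(lambda s: {"text": create_prompt(s)})
eval_data = dataset["test"].map(lambda s: {"text": create_prompt(s)})

training_args = TrainingArguments(
    output_dir="mixtral-instruct-ft",  # hypothetical output directory
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=300,                     # or set num_train_epochs instead
    logging_steps=10,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    dataset_text_field="text",
    max_seq_length=1024,               # chosen from the length analysis above
    tokenizer=tokenizer,
)
trainer.train()
```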
16:50 📈 *Analyzing Fine-Tuning Results and Storing the Model*
- Presentation of training and validation loss graphs, indicating a gradual decrease.
- Acknowledgment that longer training may be needed for better model performance.
- Demonstration of storing the fine-tuned model weights locally and pushing them to a Hugging Face repository (see the sketch below).
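
A minimal sketch of the save-and-push step; the repository name is hypothetical:

```python
# Save the LoRA adapter locally, then push to the Hub (repo name is hypothetical).
trainer.save_model("mixtral-instruct-ft")
model.push_to_hub("your-username/mixtral-instruct-ft")
tokenizer.push_to_hub("your-username/mixtral-instruct-ft")
```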
17:46 🔄 *Testing Fine-Tuned Model Responses*
- Use of the fine-tuned model to generate responses to a given prompt (sketched below).
- Comparison of responses before and after fine-tuning, showcasing improved adherence to instructions.
- Acknowledgment that further training could enhance the model's performance.
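
A sketch of reloading the adapter on top of the base model for that test, reusing names from the earlier sketches (the test prompt is made up):

```python
from peft import PeftModel

# Reload the base model and attach the fine-tuned adapter for a quick test.
base = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
ft_model = PeftModel.from_pretrained(base, "mixtral-instruct-ft")

test_prompt = "<s>[INST] Write an instruction for the following text.\n\nSome text here. [/INST]"
print(generate_response(test_prompt, ft_model, tokenizer))
```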

Made with HARPA AI

jprobichaud

Thanks for the tag @Prompt Engineering! What else is your audience requesting the most these days? Would love to find ways to create some value for them together!

AI-Makerspace

I'm skeptical this is actually effectively training the Mixtral MoE model and not making it worse!

lukeskywalker

Why are you using packing in the SFTTrainer if you just said that you're going to pad the examples?

Tiberiu

Kudos, really simple and direct explanation.

dev_navdeep

Can you also make a video on fine-tuning multimodal models like LLaVA and CogVLM?

varunnegi-vz

Great video! Could you please consider training and deploying it in SageMaker?

divyagarh

Thank you for sharing, this is very helpful! Looking forward to the next videos!

MikewasG

Great video! Just one question: can we use the fine-tuned model as a pickle file?

HarmeetSingh-ryfm

Thanks for the guide!
How do you continue the fine-tuning process in a case like this?
Can you load previous work (LoRA) and carry on, or do you need to restart?

alexxx

That's a great video. Thanks for sharing.
After pushing the model to Hugging Face, how do you host it independently on RunPod using vLLM? When I try to do that, it gives me an error. I've searched a lot of videos and articles, but to no avail so far.

Ai-Marshal

Could you make a tutorial on how to convert a model to GGML format?

joaops

I've noticed that Mixtral 8x7B Instruct (and other Mistral models) constantly repeats part of the system prompt. Have you noticed this / found a fix for it?

Akshatgiri

Can you make this for home computer use, in terms of my personal data, and teach it to use tools on your system and online?

kaio

Thanks for the video 😃 I just have a question: is it possible to use the model through an API and also provide the source files for the data with the response?

ahmedmechergui

At 5:58, why is sample["response"] given as the input and sample["prompt"] given as the response?

IshfaqAhmed-pd

How do you format a prompt that has multiple requests and responses within the same conversation?

shinygoomy

Hi, thanks for this step-by-step guide, but what if we want the LLM to learn something new about our domain (let's say the book The Lord of the Rings) and we later want to ask our model open questions about it (like "Where does Frodo get his sword?")? We definitely cannot prepare the dataset in Q&A form, so it would have to be self-supervised training. But I've never seen examples of this, and I can't imagine how it's supposed to be done. Is it even possible? It looks like we should start from the base model, fine-tune it somehow on our book, and later apply instruct fine-tuning on top of it, right? But in that case someone still has to prepare the Q&A. I'm frustrated.

VerdonTrigance

Thanks, what is the cost to do this? Server cost?

researchforumonline