Fine-tune Mixtral 8x7B (MoE) on Custom Data - Step by Step Guide

In this tutorial, we walk step by step through fine-tuning Mixtral 8x7B, the mixture-of-experts (MoE) model from Mistral AI, on your own dataset.

LINKS:
@AI-Makerspace

Timestamps:
[00:00] Introduction
[00:57] Prerequisites and Tools
[01:52] Understanding the Dataset
[03:35] Data Formatting and Preparation
[06:16] Loading the Base Model
[09:55] Setting Up the Training Configuration
[13:22] Fine-Tuning the Model
[16:28] Evaluating the Model Performance


Comments

3:37 Prompt format

4:15 The instruct version follows a different format

4:26 A special token indicates the end of the user input

4:33 Another special token indicates the end of the model response

4:39 You need to provide your data in this format

5:08 def create_prompt

5:31 System message

6:16 Load the base model
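
For reference, a minimal sketch of the instruct prompt format these notes point at, assuming the standard Mistral/Mixtral instruct markers (the example text is made up):

```python
# Mistral/Mixtral instruct template: the user turn sits between [INST] and
# [/INST]; the trailing </s> token marks the end of the model's response.
example = (
    "<s>[INST] Summarize the following text.\n\nSome input text... [/INST] "
    "A short summary.</s>"
)
```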

薇季芬

🎯 Key Takeaways for quick navigation:

00:00 🚀 *Introduction to Fine-Tuning the Mixtral 8x7B Model*
- Overview of the video's purpose: fine-tuning the Mixtral 8x7B model from Mistral AI on a custom dataset.
- Mention of the popularity and potential of Mixtral 8x7B as a mixture-of-experts model.
- Emphasis on practical considerations for fine-tuning, such as VRAM requirements and dataset details.
01:28 🛠️ *Installing Required Packages and Dataset Overview*
- Installation of the necessary packages: transformers, trl, accelerate, peft, torch, and bitsandbytes.
- Discussion of using the MosaicML instruct-v3 dataset for fine-tuning (see the sketch below).
- Overview of the dataset's structure, splits, and sources.
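
As a rough sketch of that setup, assuming the packages above and the mosaicml/instruct-v3 dataset on the Hugging Face Hub:

```python
# Setup sketch: the packages named in the video, then the dataset pull.
# In a notebook: !pip install -q transformers trl accelerate peft torch bitsandbytes datasets
from datasets import load_dataset

# The MosaicML instruct dataset referenced in the video.
dataset = load_dataset("mosaicml/instruct-v3")
print(dataset)  # inspect splits (train/test) and columns (prompt, response, source)
```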
03:45 📝 *Formatting Data for Fine-Tuning Mixtral 8x7B*
- Explanation of the prompt template for fine-tuning, specific to the Mixtral 8x7B Instruct version.
- Discussion of rearranging the data to make the task more challenging by creating instructions from the provided text.
- Demonstration of a function that reformats the initial data into the desired prompt template (sketched below).
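
A hedged sketch of what that reformatting function might look like, assuming the dataset's prompt/response field names and the Mistral instruct template; the system-message wording is an assumption, not a quote from the video:

```python
def create_prompt(sample):
    # Sketch of the reformatting function described around 5:08 (wording assumed).
    # Note the deliberate swap: the dataset's response becomes the input text,
    # and the dataset's prompt becomes the instruction the model learns to write.
    system_message = (
        "Given the following text, write an instruction that could have "
        "been used to generate it."
    )
    return (
        f"<s>[INST] {system_message}\n\n{sample['response']} [/INST] "
        f"{sample['prompt']}</s>"
    )
```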
06:28 🧩 *Loading the Base Model and Configuring for Fine-Tuning*
- Acknowledgment of the source of the notebook and clarification that the base (non-instruct) version is used.
- Setting configurations and loading the model and tokenizer, along with using Flash Attention (see the sketch below).
- Explanation of the importance of setting up configurations for a smooth fine-tuning process.
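
A possible version of that loading step, assuming 4-bit quantization via bitsandbytes; the exact quantization settings are guesses, not confirmed by the video:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"  # base (non-instruct) version, per the video

# 4-bit quantization so the model fits in less VRAM; settings are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    attn_implementation="flash_attention_2",  # Flash Attention, as mentioned
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mixtral ships without a pad token
```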
08:18 🔄 *Checking Base Model Responses Before Fine-Tuning*
- Use of a helper function to check responses from the base model before any fine-tuning (sketched below).
- Illustration of the base model's behavior when generating responses to a given prompt.
- Recognition that the base model tends to do next-word prediction rather than follow explicit instructions.
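
A minimal sketch of such a helper; the function name and generation settings are assumptions:

```python
def generate_response(prompt, model, tokenizer, max_new_tokens=256):
    # Hypothetical helper: greedy generation for a quick before/after comparison.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```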
10:06 📏 *Determining the Max Sequence Length for Fine-Tuning*
- Explanation of the importance of the max sequence length when fine-tuning Mixtral 8x7B.
- Presentation of a code snippet to analyze the distribution of sequence lengths in the dataset (sketched below).
- Emphasis on selecting a max sequence length that covers the majority of examples.
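
A sketch of what that analysis might look like, reusing the tokenizer and create_prompt from the earlier sketches:

```python
import numpy as np

# Tokenize a sample of formatted examples and inspect the length distribution.
lengths = [
    len(tokenizer(create_prompt(sample)).input_ids)
    for sample in dataset["train"].select(range(1000))  # subsample for speed
]
print(f"mean={np.mean(lengths):.0f}  p95={np.percentile(lengths, 95):.0f}  max={max(lengths)}")
# Pick a max_seq_length that covers most examples, e.g. around the 95th percentile.
```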
12:20 🧠 *Adding Adapters with LoRA for Fine-Tuning*
- Overview of the Mixtral 8x7B architecture, focusing on the linear layers for adding adapters.
- Introduction to the LoRA configuration for attaching adapters to specific layers (sketched below).
- Demonstration of setting hyperparameters and using the TRL package for supervised fine-tuning.
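
A hedged sketch of a LoRA setup along those lines; the rank, alpha, and target modules are illustrative choices, not values quoted from the video:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                 # rank; illustrative, not taken from the video
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Linear layers in Mixtral's attention and expert blocks; adjust as needed.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "w1", "w2", "w3"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```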
14:36 🚥 *Setting Up the Trainer and Initiating Fine-Tuning*
- Verification of multiple GPUs for parallelization during model training.
- Definition of the output directory and selection of training epochs or steps.
- Importance of configuring the trainer, including considerations for the max sequence length (see the sketch below).
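
A sketch of the trainer setup under those assumptions; every hyperparameter here is illustrative, not taken from the video:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

# Add a "text" column holding the formatted prompt for the trainer to consume.
train_data = dataset["train"].map(lambda s: {"text": create_prompt(s)})
eval_data = dataset["test"].map(lambda s: {"text": create_prompt(s)})

training_args = TrainingArguments(
    output_dir="mixtral-instruct-ft",  # hypothetical output directory
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=300,                     # or set num_train_epochs instead
    logging_steps=10,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    dataset_text_field="text",
    max_seq_length=1024,               # chosen from the length analysis above
    tokenizer=tokenizer,
)
trainer.train()
```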
16:50 📈 *Analyzing Fine-Tuning Results and Storing the Model*
- Presentation of training and validation loss graphs, indicating a gradual decrease.
- Acknowledgment that longer training may be needed for better model performance.
- Demonstration of storing the fine-tuned model weights locally and pushing them to a Hugging Face repository (see the sketch below).
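
A minimal sketch of the save-and-push step; the repository name is hypothetical:

```python
# Save the LoRA adapter locally, then push to the Hub (repo name is hypothetical).
trainer.save_model("mixtral-instruct-ft")
model.push_to_hub("your-username/mixtral-instruct-ft")
tokenizer.push_to_hub("your-username/mixtral-instruct-ft")
```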
17:46 🔄 *Testing Fine-Tuned Model Responses*
- Use of the fine-tuned model to generate responses to a given prompt (sketched below).
- Comparison of responses before and after fine-tuning, showcasing improved adherence to instructions.
- Acknowledgment that further training could enhance the model's performance.
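
A sketch of reloading the adapter on top of the base model for that test, reusing names from the earlier sketches (the test prompt is made up):

```python
from peft import PeftModel

# Reload the base model and attach the fine-tuned adapter for a quick test.
base = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
ft_model = PeftModel.from_pretrained(base, "mixtral-instruct-ft")

test_prompt = "<s>[INST] Write an instruction for the following text.\n\nSome text here. [/INST]"
print(generate_response(test_prompt, ft_model, tokenizer))
```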

Made with HARPA AI

jprobichaud

Thanks for the tag @Prompt Engineering! What else is your audience requesting the most these days? Would love to find ways to create some value for them together!

AI-Makerspace

I'm skeptical this is actually effectively training the Mixtral MoE model and not making it worse!

lukeskywalker

Why are you using packing in the SFTTrainer if you just said that you're going to pad the examples?

Tiberiu

Kudos, really simple and direct explanation.

dev_navdeep

Can you also make a video on fine-tuning multimodal models like LLaVA and CogVLM?

varunnegi-vz

Great video! Could you please consider training and deploying it in SageMaker?

divyagarh

Thank you for sharing, this is very helpful! Looking forward to the next videos!

MikewasG

Great video! Just one question: can we use the fine-tuned model as a pickle file?

HarmeetSingh-ryfm

Thanks for the guide!
How do you continue the fine-tuning process in a case like this?
Can you load previous work (LoRA) and carry on, or do you need to restart?

alexxx

That's a great video. Thanks for sharing.
After pushing the model to Hugging Face, how do you host it independently on RunPod using vLLM? When I try to do that, it gives me an error. I've searched a lot of videos and articles, but to no avail so far.

Ai-Marshal

Could you make a tutorial on how to convert a model to GGML format?

joaops

I've noticed that Mixtral 8x7B Instruct (and other Mistral models) constantly repeats part of the system prompt. Have you noticed this / found a fix for it?

Akshatgiri

Can you make this for home computer use, in terms of my personal data, and teach it to use tools on your system and online?

kaio

Thanks for the video 😃 I just have a question: is it possible to use the model through an API and also provide the source files for the data with the response?

ahmedmechergui

At 5:58, why is sample["response"] given as the input and sample["prompt"] given as the response?

IshfaqAhmed-pd

How do you format a prompt that has multiple requests and responses within the same conversation?

shinygoomy

Hi, thanks for this step-by-step guide, but what if we want the LLM to learn something new about our domain (let's say the book The Lord of the Rings) and we later want to ask our model open questions about it (like "Where does Frodo get his sword?")? We definitely cannot prepare the dataset in Q&A form, so it would have to be self-supervised training. But I've never seen examples of this, and I can't imagine how it's supposed to be done. Is it even possible? It looks like we should start from the base model, fine-tune it somehow on our book, and later apply instruct fine-tuning on top of it, right? But in that case someone still has to prepare the Q&A. I'm frustrated.

VerdonTrigance

Thanks, what is the cost to do this? Server cost?

researchforumonline