How to Create Custom Datasets To Train Llama-2

In this video, I will show you how to create a dataset for fine-tuning Llama-2 using the Code Interpreter within GPT-4. We will create a dataset for generating a prompt from a given concept, and structure it in the proper format to fine-tune a Llama-2 7B model using the Hugging Face autotrain-advanced package.
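The dataset step described above can be sketched in a few lines of Python. This is a minimal, hypothetical version: the concept/prompt pairs and the instruction template below are illustrative assumptions, not the exact data or template from the video. autotrain-advanced typically consumes a CSV with a single "text" column, and an instruction/response template like this one is a common choice.

```python
import csv

# Hypothetical concept -> prompt pairs, standing in for the GPT-4-generated
# data described in the video (contents are illustrative only).
examples = [
    {"concept": "a cyberpunk city at night",
     "prompt": "Neon-drenched megacity streets after rain, holographic ads, cinematic lighting"},
    {"concept": "a cozy cabin in winter",
     "prompt": "Snow-covered log cabin at dusk, warm window light, smoke rising from the chimney"},
]

def format_row(concept, prompt):
    # Pack instruction and response into one string for the single "text" column.
    return (f"### Instruction:\nCreate a detailed prompt for the following concept: {concept}\n\n"
            f"### Response:\n{prompt}")

# Write the training CSV; the csv module quotes fields containing newlines.
with open("train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["text"])
    writer.writeheader()
    for ex in examples:
        writer.writerow({"text": format_row(ex["concept"], ex["prompt"])})
```

The resulting train.csv is what you would then point the autotrain CLI at.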

Happy learning :)

#llama2 #finetune #llm

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Timestamps:

Intro: [00:00]
Testing Vanilla Llama-2: [01:20]
Description of Dataset: [02:14]
Code Interpreter: [03:24]
Structure of the Dataset: [04:56]
Using Base model: [06:18]
Fine-tuning Llama2: [07:25]
Logging during training: [10:36]
Inference of the fine-tuned model: [12:44]
Output Examples: [14:36]
Things to Consider: [15:40]
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬


Comments:

Thanks, this gives me exactly what I needed to understand how to create a dataset for fine-tuning. Most of the other videos skip over the details of the formatting and the other parameters that go into creating your own dataset. Thanks again!

chuckwashington

FYI, you're the man. I don't know why it was so hard to find a good training pipeline; I literally went through all the libs and no one mentioned autotrain-advanced, lol.

oliversilverstein

Thank you so much! This gives me a really good basis for starting to fine-tune my own model! In the end, the model will only be as good as its training set.

pareak

Datasets are key for fine-tuning. This is a great video!

SafetyLabsInc_ca

You're an AI champion. Thanks for the fine-tuning lectures 🙏🙏🙏

samcavalera

@Prompt Engineering Wow, exactly what I was looking for. I have another request: can you please make a video on Prompt-Tuning/P-Tuning, which is also a PEFT technique?

abhijitbarman

Thanks, you explained the concept very nicely. It boosts knowledge in an area that people are usually afraid to grasp, but the way you explained it makes it look very easy. Today I gained the ability to fine-tune a model myself. Thanks a lot, Sir. Looking forward to more advanced topics from you.

umeshtiwari

Can I fine-tune Llama 2 to generate question-answer pairs from PDFs?

Phoenix-fric

How could I limit it? For example, if I train it with several relevant paragraphs about The Little Prince, how do I restrict it so that it only answers questions within the context of that novel?

AGAsnow

When I try to run the command in the terminal it gives an error: autotrain <command> [<args>] llm: error: the following arguments are required: --project-name

brunapupoo

I have a question. Why don't we use the conversation format given by Llama-2, which contains <s>[INST], something like that? Thanks.

xiangyao
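On the question above: the official Llama-2 chat models were indeed trained with a specific turn template using [INST] tags and an optional <<SYS>> block. Whether you need it depends on whether you fine-tune the base model (where any consistent template works) or the chat variant. A small sketch of building such a prompt; the system and user strings are made up for illustration:

```python
def llama2_chat_prompt(system, user):
    # Llama-2-chat turn template: <s>[INST] ... [/INST] wraps each user turn,
    # and <<SYS>> ... <</SYS>> wraps the optional system message in the first turn.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

p = llama2_chat_prompt("You are a helpful assistant.",
                       "Write a haiku about datasets.")
```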

How does this differ if I'm looking to fine-tune Llama-2 7B code instruct?

ishaanshettigar

I need help please. I just want to be pointed in the right direction, since I'm new to this and couldn't really find any proper guide summarizing the steps for what I want to accomplish.

I want to integrate a Llama 2 70B chatbot into my website. I have no idea where to start. I looked into setting up the environment on one of my cloud servers (it has to be private). Now I'm looking into training/fine-tuning the chat model using data from our DBs. (It's not clear to me here, but I assume it involves two steps: first, getting the data into CSV format, since that's easier for me; second, formatting it in the Alpaca or OpenAssistant format.) After that, the result should be a deployment-ready model?

Just bullet points; I'd highly appreciate that.

vitocorleon
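The CSV-to-Alpaca step asked about above can be sketched as below. The column names ("question", "answer") and the template wording are assumptions about a database export, not anything prescribed by the video; the key idea is collapsing each row into one Alpaca-style "text" field:

```python
import csv

# Alpaca-style template collapsing instruction and response into one string.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{response}"
)

# Stand-in for rows read from a database export,
# e.g. csv.DictReader(open("db_export.csv")).
rows = [
    {"question": "What are your support hours?",
     "answer": "We are available 24/7."},
]

# Write a single-column training CSV in the format autotrain-advanced consumes.
with open("train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["text"])
    writer.writeheader()
    for row in rows:
        writer.writerow({"text": ALPACA_TEMPLATE.format(
            instruction=row["question"], response=row["answer"])})
```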

Kudos on the excellent video! Your hard work is acknowledged. Could we expect a video about DemoGPT from you?

DemoGPT

Thank you very much for the video. In the case of plain text, how could the dataset be formatted?

haouarino
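For plain text there is no instruction/response pair, so one common approach (an assumption here, not something the video prescribes) is to split the corpus into overlapping chunks and put each chunk in the "text" column for plain causal-LM fine-tuning:

```python
# Stand-in corpus; in practice this would be open("corpus.txt").read().
raw_text = " ".join(f"sentence {i}." for i in range(200))

def chunk(text, size=512, overlap=64):
    # Overlapping character chunks so context isn't cut hard at boundaries;
    # chunking by tokens rather than characters is also common.
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# One row per chunk, ready to write into the single "text" column.
rows = [{"text": c} for c in chunk(raw_text)]
```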

Thank you man, that is exactly what I am looking for.

derejehinsermu

Thanks for the video.
Two things, please:
1. When you use the autotrain package, all the details are hidden and one cannot see what is being done, or in what exact steps. I would suggest a video like that, please, even with the same example.
2. It is not clear to me what the data vs. the label is that gets fed into the model during training, what the loss function is, how it is calculated, etc.

muhannadobeidat
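On the data-vs-label question above: in causal-LM fine-tuning there is no separate label column. The label is the input sequence shifted by one token, and the loss is the average cross-entropy of the next-token predictions. A toy sketch of that bookkeeping, with made-up token IDs and a uniform "model" standing in for real logits:

```python
import math

token_ids = [5, 9, 2, 7]      # a tokenized row from the "text" column
inputs = token_ids[:-1]        # the model sees [5, 9, 2] ...
labels = token_ids[1:]         # ... and must predict [9, 2, 7]

# Toy "model": uniform probability over a 10-token vocabulary.
vocab_size = 10
def log_prob(token_id):
    return math.log(1.0 / vocab_size)

# Loss = average negative log-likelihood of the shifted labels.
loss = -sum(log_prob(t) for t in labels) / len(labels)
```

Frameworks like Hugging Face Transformers do this shift internally when you pass labels equal to the input IDs, which is why the CSV only needs the one "text" column.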

Thanks for the informative video. I am wondering: Is there a way to do this, but with local LLMs?

stickmanland

Very coherent and well explained. Thank you kindly. I'm also curious if you have any advice about creating a dataset that would allow me to fine-tune my model on my database schema. What I'd like to do is run my model locally, ask it to interact with my database, and have it do so in a smooth and natural manner. I'm curious how one would structure a database schema as a dataset for fine-tuning. Any recommendations or advice would be greatly appreciated. Thanks again! Great videos!

vbywrde

If you don’t mind sharing, what’s the performance of a Mac like when fine tuning? I’m quite keen to see how long it takes to fine tune a 7B vs a 13B parameter model on a consumer machine on a small/medium sized dataset. Thanks for the tutorial, very helpful!

lrkx_