Prepare Fine-tuning Datasets with Open Source LLMs

Chapters:
0:00 Preparing data for fine-tuning
0:37 Video overview
1:04 Accessing the GitHub Repo w/ data preparation scripts
2:42 Q&A Dataset preparation using Llama 2 70B and chat-ui
7:29 How to set up a Llama 2 API for 70B
8:45 Using a Llama 2 API to prepare a Q&A dataset for fine-tuning
12:22 Pro tips for preparing fine-tuning datasets

Comments

I purchased full access to your repo because I love and want to support the work you are doing. Some of the clearest and most articulate explanations of embedding, fine-tuning, supervised vs. unsupervised methods, and data prep. Keep it up!

nkhuang

Great video! How are you chunking the videos: by paragraph, sentence, word, character, etc.? Are you using any overlap in the chunks? Have you tested your system with a smaller Llama 2 model? What type of results would one get from maybe a Llama 2 13B, or even a 7B that could possibly be run from home?

unshadowlabs

Hi Ronan. Where is the code relevant to this video as of June 2024? In the Adv. FT repo, there is no trace of it AFAIK. Thanks.

TheLokiGT

Hi, I just paid for access to the repo for this video, but I wasn't aware of the option to buy access to all projects in the repo. Is there any way to pay the difference and upgrade? How can I get in touch with you about that? Love the work, btw!

MarxOrx

Hi, thanks!! A question about a model for which I have more than 2,000 PDFs. Do you recommend improving the handling of vector databases? When do you recommend fine-tuning, and when do you recommend a vector database?

devtest

Is "Context" a keyword that this specific model knows? How would it notice it after the blob of text?

babyfox

You used plain text for the dataset. Is it better than the JSON format? When would you choose one over the other? Thanks for the video!

izmhcdq

On RunPod, how do I get (or amend) the Llama 70B API by TrelisResearch template to work over an exposed TCP port?
The connection is refused both in the terminal and in VS Code (preferred).
Other templates work fine.
Doesn't work: SSH over exposed TCP (supports SCP & SFTP).
Works: the basic SSH terminal (no support for SCP & SFTP).
As far as I know, the basic SSH terminal is not going to work with VS Code.
Perhaps there is a way to edit the templates for these containers so they can work with VS Code?
I'm really looking forward to digging into your tutorials :)

GrahamAndersonis

I want to fine-tune on my own code. Each project has multiple folders and files that I want to fine-tune on. Can this private repo work for that? Basically, I want to fine-tune on my coding projects.

HemangJoshi
