Prepare Fine-tuning Datasets with Open Source LLMs


Chapters:
0:00 Preparing data for fine-tuning
0:37 Video overview
1:04 Accessing the GitHub Repo w/ data preparation scripts
2:42 Q&A Dataset preparation using Llama 2 70B and chat-ui
7:29 How to set up a Llama 2 API for 70B
8:45 Using a Llama 2 API to prepare a Q&A dataset for fine-tuning
12:22 Pro tips for preparing fine-tuning datasets
Comments

I purchased full access to your repo because I love and want to support the work you are doing. These are some of the clearest and most articulate explanations of embeddings, fine-tuning, supervised vs. unsupervised methods, and data prep. Keep it up!

nkhuang

Great video! How are you chunking the videos: by paragraph, sentence, word, character, etc.? Are you using any overlap between the chunks? Have you tested your system with a smaller Llama 2 model? What kind of results would one get from a Llama 2 13B, or even a 7B that could possibly be run from home?

unshadowlabs

Hi Ronan. Where is the code relevant to this video as of June 2024? In the Adv. FT repo, there is no trace of it AFAIK. Thanks.

TheLokiGT

Hi, I just paid for access to the repo for this video, but I wasn't aware of the option to buy access to all projects in the repo. Is there any way to pay the difference and upgrade? How can I get in touch with you about that? Love the work, btw!

MarxOrx

Hi, thanks!! I have a question about a use case with more than 2,000 PDFs. Do you recommend improving the handling of vector databases? When do you recommend fine-tuning, and when do you recommend a vector database?

devtest

Is "Context" a keyword that this specific model knows? How would it notice it after the blob of text?

babyfox

You used plain text for the dataset. Is it better than the JSON format? When should one choose one or the other? Thanks for the video!

izmhcdq

On Runpod, how do I get or amend the Llama 70B API by TrelisResearch template to work with an exposed TCP?
The connection is refused both in the terminal and in VS Code (preferred).
Other templates work fine.
Doesn't work: SSH over exposed TCP (supports SCP & SFTP).
Works: the basic SSH terminal (no support for SCP & SFTP).
To my knowledge, the basic SSH terminal is not going to work with VS Code.
Perhaps there is a way to edit the templates for these containers so they can work with VS Code?
I'm really looking forward to digging into your tutorials :)

GrahamAndersonis

I want to fine-tune on my own code. Each of my projects has multiple folders and files that I want to fine-tune on. Can this private repo work for that? Basically, I want to fine-tune on my coding projects.

HemangJoshi