How To Create Datasets for Finetuning From Multiple Sources! Improving Finetunes With Embeddings.

Today, we delve into the process of setting up datasets for fine-tuning large language models (LLMs). Starting from the initial considerations needed before dataset construction, we work through common pipeline questions, such as whether you need embeddings at all. We discuss how to structure raw text data for fine-tuning, illustrated with real coding and medical-appeals scenarios.
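To make that concrete, here is a minimal sketch of one common way to lay out a fine-tuning dataset: one JSON object per line (JSONL) with instruction, input, and output fields. The field names and the sample appeal text are illustrative assumptions; different training frameworks expect different schemas.

```python
# Minimal sketch of a JSONL fine-tuning dataset (illustrative schema;
# field names vary by training framework).
import json

examples = [
    {
        "instruction": "Summarize the denial reason in this Medicare appeal.",
        "input": "The claim was denied because the prescribed dosage exceeds plan limits.",
        "output": "Denied for exceeding the plan's dosage limit.",
    },
]

# One JSON object per line is the usual JSONL convention.
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```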

We also explore how to leverage embeddings to provide additional context to our models, a crucial step in building more general and robust models. The video then shows how to turn books into structured datasets using LLMs, with 'Twenty Thousand Leagues Under the Sea' converted into a question-and-answer format as the example.
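As a rough sketch of that book-to-dataset step: split the raw text into passages and ask a model to write a question-and-answer pair for each one. The `ask_llm` helper below is a hypothetical stand-in for whichever local model you call, and the chunk size is an arbitrary choice, not the one used in the video.

```python
# Sketch: turn raw book text into Q&A training rows.
# `ask_llm` is a hypothetical callable (prompt string -> completion string).
import json

def chunk_text(text: str, words_per_chunk: int = 300):
    """Yield fixed-size word chunks of the raw text."""
    words = text.split()
    for i in range(0, len(words), words_per_chunk):
        yield " ".join(words[i : i + words_per_chunk])

def build_qa_dataset(book_text: str, ask_llm) -> list:
    prompt = "Write one question and its answer based only on this passage:\n\n{chunk}"
    rows = []
    for chunk in chunk_text(book_text):
        qa = ask_llm(prompt.format(chunk=chunk))  # model writes the Q&A pair
        rows.append({"passage": chunk, "qa": qa})
    return rows

# Usage sketch: dump the rows as JSONL for the fine-tuning step above.
# with open("book_qa.jsonl", "w") as f:
#     for row in build_qa_dataset(open("20k_leagues.txt").read(), ask_llm):
#         f.write(json.dumps(row) + "\n")
```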

In addition, we look at the process of fine-tuning LLMs to write in specific programming languages, showing a practical application with a Cypher query for graph databases. Lastly, we demonstrate how to enhance the performance of a medical application by retrieving embedded information with Superbooga.
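The retrieval idea behind tools like Superbooga reduces to: embed your reference chunks, find the ones closest to the user's question, and prepend them to the prompt. Here is a minimal sketch of that pattern using the sentence-transformers library; the model name and the sample appeal-rule chunks are assumptions for illustration, not what the video itself uses.

```python
# Sketch of embedding-based context retrieval (the general pattern that
# Superbooga automates inside text-generation-webui).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Hypothetical reference chunks for a medical-appeals assistant.
chunks = [
    "Appeals must be filed within 60 days of the denial notice.",
    "A formulary exception requires a supporting statement from the prescriber.",
]
chunk_embeddings = model.encode(chunks, convert_to_tensor=True)

def augment_prompt(question: str, top_k: int = 1) -> str:
    """Prepend the most similar reference chunks to the question."""
    q_emb = model.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, chunk_embeddings)[0]  # cosine similarities
    best = scores.topk(min(top_k, len(chunks))).indices.tolist()
    context = "\n".join(chunks[i] for i in best)
    return f"Context:\n{context}\n\nQuestion: {question}"

print(augment_prompt("How long do I have to file an appeal?"))
```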

Whether you're interested in coding, medical applications, book conversion, or fine-tuning LLMs in general, this video provides comprehensive insights. Tune in to discover how to augment your models with advanced techniques and tools, and join us on the live stream for a deep dive into broadening the context of local models and the results from our book-training and comedy datasets.

0:00 Intro
0:44 Considerations For Finetuning Datasets
2:45 Reviewing Embeddings
5:35 Finetuning With Embeddings
8:31 Creating Datasets From Raw/Books
12:08 Coding Finetuning Example
14:02 Medicare/Medicaid Appeals Example
17:01 Outro

#machinelearning #ArtificialIntelligence #LargeLanguageModels #FineTuning #DataPreprocessing #Embeddings
Comments

This content is top notch among the ML and AI content on YouTube, showing us how it really works!

cesarsantos

Okay, so after a cup of coffee and watching a couple of times: WOW. You helped me so much, thank you. This has been driving me nuts, and you make it look so easy to fix. I wish I was as smart as you. Thank you again. 🎉

timothymaggenti

Comedy dataset update! I have found an approach I think I like for it, though I didn't have time to complete it for this video. So, I will also cover that in today's live stream!

AemonAlgiz

You’re literally a genius! I appreciate you taking the time to share the knowledge with us! Exactly what I was looking for… how to create a dataset and in such a well put together video. Thank you

RAGNetwork

Amazing, thanks a lot for sharing your reflections on your work and experience! It is much appreciated! This is the first time I've quickly browsed something like this and it stuck, without having to review, study, and come back later. I'm able to get a bird's-eye view of the topic, the options available for work, and the underlying purpose. 🥇 Pure gold. Definitely subscribed!

flowers

Dude, seriously, your content is so clear and easy to follow. Keep it up!

HistoryIsAbsurd

Finally, a freaking great tutorial! Practical, straight to the point, and it works!!

fabsync

I would pay a lot of money for this information, thank you.

boogfromopenseason

I very much appreciate that you always have this way of listing the most important bullet points at the beginning.

leont.

Great explanation with the right level of details and depth. Good stuff. Thanks!

rosenangelow

Amazing work... this channel is pure gold: exactly the right amount of concepts, and everything is spot on. Nothing beats teaching from experience like you do.

pelaus

I knew I subscribed here for good reason. This is consistently extremely high-quality information -- not the regurgitated stuff. This is super educational and has immensely improved my understanding.

Please keep going bud, this is great.

smellslikeupdog

Awesome content!! Thank you very much!!👏🏻👏🏻👍🏻

redbaron

Wow, how do you make everything look so easy? Nice, thanks. So East Coast, man, you're an early bird.

timothymaggenti

Great explanations! Thanks a lot for your efforts making this great content!

babyfox

That's awesome! And you can even save the new appeal to create more data!

Hypersniper

The appeal has been processed by the approval AI... And it passed! The prescription will now be covered. 😊
(Thank you for the video! I think datasets and installing dependencies are ML's greatest pain points at the moment.)

jonmichaelgalindo

How would building a training set on a codebase look? Is there a good example of automating the generation of a Q&A training set based on code? How do you chunk it to fit in the context window - break it up by functions and classes? Where would the extraneous stuff go, like requirements, imports, etc...? Thanks for the great content!

kenfink

This video was awesome! I'm finally starting to wrap my head around this stuff. At the same time, I'm realising the power that is being unleashed onto the world!
BTW, did you see this new paper: SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression? Looks like it's right up your alley!

arinco

Hey man, thanks for your videos; they are instructive. I am new to LLMs, and I think there is a significant gap in YouTube content on the new LLMs. I know there are videos on fine-tuning GPT-3, but I can't find anything like a walkthrough of fine-tuning a larger, new open-source model like Falcon-40B Instruct. If there were a playlist going through the process - Q&A fine-tune data definition, synthetic data production, fine-tuning, and testing - I am sure others like myself would be very keen followers.

danielmz