BEST Datasets for LLMs | Plus: Create Your Own

How do you choose the best dataset to fine-tune a specific LLM, like MPT-30B-Chat? And which datasets are best suited to pre-training an AI model or an LLM (Large Language Model)?

The easiest place to start is Hugging Face. Find the perfect dataset, discover new datasets, and see what their structure and content look like before you create your own dataset for fine-tuning your LLM.
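
As a quick illustration, here is a minimal sketch of loading and inspecting a Hub dataset with the Hugging Face `datasets` library; the dataset ID "databricks/databricks-dolly-15k" is only an example, not one prescribed in the video:

```python
# pip install datasets
from datasets import load_dataset

# Load an instruction-tuning dataset from the Hugging Face Hub.
# The dataset ID below is just an example; swap in any Hub dataset.
ds = load_dataset("databricks/databricks-dolly-15k", split="train")

# Inspect structure and content before creating your own dataset.
print(ds)           # number of rows and column names
print(ds.features)  # column types
print(ds[0])        # one example record
```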

#finetune
#finetuning
#datasets
#dataset
#ai

Comments

Thanks for this!!! However, this could have been even better with an actual notebook demo of how you would approach and modify a dataset, or of creating a dataset with the Evol-Instruct technique or the Orca approach. Still great content though.

earltan
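
For readers wondering what the dataset-creation step mentioned above might look like in code, here is a minimal sketch using the Hugging Face `datasets` library; the column names and example rows are purely illustrative assumptions, not the actual Evol-Instruct or Orca pipelines:

```python
# pip install datasets
from datasets import Dataset

# Hypothetical instruction/response pairs. In practice these would come
# from a generation pipeline such as Evol-Instruct or an Orca-style process.
records = {
    "instruction": [
        "Explain what a tokenizer does in an LLM.",
        "Summarize the difference between pre-training and fine-tuning.",
    ],
    "response": [
        "A tokenizer splits raw text into tokens the model can process.",
        "Pre-training learns general language patterns from large corpora; "
        "fine-tuning adapts the model to a narrower task or style.",
    ],
}

ds = Dataset.from_dict(records)
print(ds)  # 2 rows, columns: instruction, response

# Save locally, or push to the Hugging Face Hub for later fine-tuning.
ds.save_to_disk("my_finetune_dataset")
# ds.push_to_hub("your-username/my-finetune-dataset")  # needs a HF token
```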

Hey, have you done or can you recommend any video regarding the hardware to run your on-premise LLM? Or cloud?

danson

Are there open-source datasets for the Arabic language?

walidsalahuddin

Hi, I'm enjoying the videos. Do you have a Discord server or any way to get in contact with you personally? Thank you!

marcussamuel

How can I optimise the dataset content with my personal data?

mohamedmohamoud

Your channel is gold, but unfortunately there is no clear path through the videos for a beginner who has no idea where to start or where to go next.

abdoualgerian

I think the pricing is a bit off; you can get an A100 80GB for a lot less than $5/h.

NeuroScientician

A 4090 Ti is straining under the 4-bit quantized 30B version, though with reasonable token speed. ROCm is coming to the 7900 XTX, a card that is practically half the price of the 4090 (essentially, the price difference covers your electricity consumption for quite some time). Taken all together, you'll draw roughly 600-700 watts while the model is running. All in all you'll have a great AI toy for under $3,000-3,500 (Ryzen 9 with 16 cores / 32 threads and everything combined).

blablabic