Fine-Tuning Meta's Llama 3 8B for IMPRESSIVE Deployment on Edge Devices - OUTSTANDING Results!

This video demonstrates an innovative workflow that combines Meta's open-weight Llama 3 8B model with efficient fine-tuning techniques (LoRA and PEFT) to deploy highly capable AI on resource-constrained devices.
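The core LoRA idea behind this workflow can be sketched in a few lines: instead of updating a full weight matrix during fine-tuning, training only touches two small low-rank factors, and the effective weight is the frozen base plus a scaled low-rank product. The shapes and values below are illustrative toy numbers, not taken from the video or from any particular library.

```python
# Toy sketch of a LoRA update: W_eff = W + (alpha / r) * B @ A.
# W is frozen; only the small factors A (r x d_in) and B (d_out x r)
# would be trained. All numbers here are made up for illustration.

def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

d_out, d_in, r, alpha = 4, 4, 1, 2

# Frozen base weight (identity, just to keep the arithmetic readable).
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]

# Trainable low-rank factors (in practice B starts at zero so the
# adapter is a no-op before training; here B is nonzero to show effect).
A = [[0.5, -0.5, 0.5, -0.5]]        # r x d_in
B = [[0.0], [2.0], [0.0], [0.0]]    # d_out x r

scale = alpha / r
delta = matmul(B, A)                # d_out x d_in low-rank update
W_eff = [[W[i][j] + scale * delta[i][j] for j in range(d_in)]
         for i in range(d_out)]

full_params = d_out * d_in          # 16 parameters if W were trained directly
lora_params = r * d_in + d_out * r  # 8 trainable parameters at r=1
print(lora_params, full_params)
```

At realistic dimensions (e.g. 4096 × 4096 attention projections with r = 16) the trainable-parameter saving is what makes fine-tuning an 8B model feasible on a single GPU.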

We start with a 4-bit quantized version of the Llama 3 8B model and fine-tune it on a custom dataset. The fine-tuned model is then exported in the GGUF format, optimized for efficient deployment and inference on edge devices via the GGML/llama.cpp library.
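A toy version of the 4-bit quantization step looks like this: map each float weight to a signed 4-bit integer with a shared scale, then dequantize and check the round-trip error. Real GGUF quantization types (Q4_0, Q4_K_M, etc.) use per-block scales and more elaborate schemes; this sketch assumes a single symmetric per-tensor scale purely to show the core idea.

```python
# Toy symmetric 4-bit quantization: codes live in [-8, 7] with one
# shared scale. Illustrative values only, not a real GGUF scheme.

weights = [0.12, -0.5, 0.33, 0.9, -0.77, 0.05]

scale = max(abs(w) for w in weights) / 7   # 7 = largest positive 4-bit code

def quantize(w):
    q = round(w / scale)
    return max(-8, min(7, q))              # clamp to the signed 4-bit range

codes = [quantize(w) for w in weights]     # ints, 4 bits of storage each
restored = [q * scale for q in codes]      # dequantized approximation

max_err = max(abs(w - r) for w, r in zip(weights, restored))
assert all(-8 <= q <= 7 for q in codes)
print(codes, round(max_err, 4))
```

The worst-case error stays within half a quantization step, which is why a 4-bit model can shrink memory roughly 4× versus fp16 while keeping generation quality close to the original.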

Impressively, the fine-tuned Llama 3 8B model accurately recalls and generates responses based on our custom dataset when run locally on a MacBook. This demo highlights the effectiveness of combining quantization, efficient fine-tuning, and optimized inference formats to deploy advanced language AI on everyday devices.

Join us as we explore the potential of fine-tuning and efficiently deploying the Llama 3 8B model on edge devices, making AI more accessible and opening up new possibilities for natural language processing applications.

Be sure to subscribe to stay up-to-date on the latest advances in AI.

Comments

So I never post comments, but the way you explained this was by far the best I have seen online. I wish I had found your channel 8 months ago :) Please keep posting videos; your explanations are very well thought out and put together.

israelcohen

Absolutely fantastic! Really appreciate the detailed, clear breakdown of concrete steps that let us drive value, rather than the clickbait hype train that everyone else is on.

ratsock

Big thanks for the detailed walkthrough—really learned a lot from your video!

williammcguire

Thank you! You have a talent for explaining and planning a workshop! Thank you for your work!

petroff_ss

I like the thumbnails, the topics, the way methods are explained, and the presenter.
Nice channel, very valuable info ❤

talatala

You are amazing! This is the best explanation of this topic. I liked it and just subscribed. Thank you very much!!!

gustavomarquez

Thank you so much for sharing that fantastic clip! It was really informative. I'm currently looking into fine-tuning a model with my ERP system, which handles some pretty complex data. Right now, I'm creating dataframes and using panda-ai for analytics. Could you guide me on how to train and make inferences with this row/column data? I really appreciate your time and help!

RameshBaburbabu

Amazing video, thanks for the best explanation I've ever seen on YouTube. Could you also please make a video on how to fine-tune the Phi-3 model? 🙏

SilentEcho-dq

Nice video. I have a question: at 8:10, is there any reason why you set add_special_tokens=False in the .encode_plus method? I thought special tokens are added during training, so wouldn't it make more sense to set add_special_tokens=True if we want to know how large the biggest training example will be?

hellohey

Did you play Chef Slowik in the movie "The Menu"?

Hotboy-qn

I think calling the .for_inference method before training will interfere with training, so it seems like a bad idea. The training in the notebook converges without a problem for me using a T4 GPU if just skipping that step.

hellohey

Is the output from Ollama on your MacBook in real time, or did you speed it up in the video? On my 2014 iMac it is significantly slower. It's about time for a new one. What are the technical specifications of your Mac?

SilentEcho-dq

I'm able to get as far as inference; once the model is trained I get an error: name 'FastLanguageModel' is not defined.

But thank you for the tutorial!

lorenzoplaatjies

What are the parameters I need to update if I'm using 1000 question-and-answer pairs in a CSV, and what values should those parameters take?

ganeshkumara

Hi sir. First I started with limited data and it worked fine. After I added another 20 rows to the CSV, it stopped answering correctly and gives some other answer. Why?

ganeshkumara

Do you have any experience with fine-tuning this model on non-English data? Any suggestions for good multilingual open-source models? 🙏

andrew.derevo

Rather than using Google Colab + compute for training, what are your thoughts on using a local machine + GPU?

madhudson

Hi, I want to fine-tune Llama 3 for English-to-Urdu machine translation. Can you guide me on this? The dataset is OPUS-100.

azkarathore

"The A100 works well" You don't say lol -- bruh this is a $50K GPU which costs $2-$3K/month to run.

jonassteinberg

Thank you for this wonderful video, very educative. I have a question: if I have a dataset with questions and answers, but the answers are not written in proper English grammar, what is the best way to make a model return an answer that is grammatically correct?

ronaldmatovu