Multi GPU Fine Tuning of LLM using DeepSpeed and Accelerate

Welcome to my latest tutorial on Multi GPU Fine Tuning of Large Language Models (LLMs) using DeepSpeed and Accelerate! In this video, I'll guide you through the entire process of setting up and fine-tuning large language models across multiple GPUs to achieve optimal performance and efficiency.

You'll learn:

✅ How to install and configure DeepSpeed and Accelerate.
✅ The key features and benefits of using DeepSpeed for memory optimization and parallelism.
✅ Step-by-step instructions on setting up multi-GPU fine-tuning.
✅ Best practices for achieving faster training times and handling larger models.
Whether you're a data scientist, machine learning engineer, or AI enthusiast, this tutorial will provide you with valuable insights and practical knowledge to enhance your deep learning projects.
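
For readers who want a concrete starting point, below is a minimal sketch of the kind of setup this tutorial covers: fine-tuning a causal language model with the Hugging Face Trainer and a DeepSpeed ZeRO config. The model name, example dataset, and the ds_config.json path are illustrative assumptions, not the exact code from the video.

# train.py -- minimal multi-GPU fine-tuning sketch (illustrative, not the video's exact code)
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "gpt2"  # placeholder model; swap in the LLM you actually fine-tune
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Small public dataset used purely as an example corpus.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty lines

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    fp16=True,
    deepspeed="ds_config.json",  # DeepSpeed ZeRO settings live in this (assumed) config file
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

The script would then be launched across GPUs with something like "deepspeed --num_gpus=4 train.py", or with "accelerate launch train.py" after running "accelerate config" and selecting DeepSpeed; ds_config.json would hold the ZeRO options, for example a "zero_optimization" block with "stage": 2.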

Don't forget to like, comment, and subscribe to my channel for more tutorials and updates on cutting-edge AI technologies. Hit the bell icon to stay notified about my latest videos. Your support helps me create more content like this!

Happy fine-tuning!

Join this channel to get access to perks:

To further support the channel, you can contribute via the following methods:

Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
#deepspeed #finetuning #llm
Comments

Thank you, and keep making tutorials.

👏👏👏

MrCdofitas

Bhai, your videos are good; they'll help other devs very soon. Keep going. 🎉

_.Light._

Thank you for the video. Just a small request: if possible, could you also make a video on how to build a production-ready RAG chatbot using Llama 3 for a company's confidential data?

gowithgaurav

Your video is helpful. Can you please make the repo public?

ChloePeng-rj

Can you make a video on using Kaggle's multi-GPU setup? I have tried but did not succeed.

CodewithRiz

Hi, thank you for the video; you do great work!

Can I ask whether multi-GPU training is possible when the whole model doesn't fit on a single GPU once loaded? For example, I'm training Llama 3.1 8B in full precision with the Hugging Face Trainer on 4 GPUs with 16 GB of VRAM each. The model takes about 32 GB when loaded, so each card holds roughly 8 GB of it (4 × 8 GB).
When I run the training, the number of steps equals (dataset length) × (number of epochs) / (batch size). If the run were distributed, that would additionally be divided by the number of GPUs.
So is this even possible with this much compute, and if so, is there a way to set it up with the Hugging Face Trainer? I'm writing the code in a Jupyter notebook.

Rman
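
For context on the question above: DeepSpeed ZeRO stage 3 shards parameters, gradients, and optimizer states across the GPUs instead of replicating the whole model on each card, and the Hugging Face Trainer accepts a DeepSpeed config through TrainingArguments. The sketch below is a minimal, assumed configuration, not a verified recipe for that exact hardware.

# Illustrative ZeRO stage 3 settings passed to the Trainer as a dict
# (the stage/offload choices here are assumptions, not settings from the video).
from transformers import TrainingArguments

ds_zero3 = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},  # optional: move optimizer state to CPU RAM
        "offload_param": {"device": "cpu"},      # optional: move idle parameter shards to CPU RAM
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="out-zero3",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    deepspeed=ds_zero3,  # a dict works here as an alternative to a JSON file path
)
# Multi-GPU DeepSpeed runs need a distributed launcher (e.g. "deepspeed --num_gpus=4 train.py"
# or "accelerate launch"); a plain Jupyter kernel runs a single process.

Under data-parallel training, the optimizer-step count per epoch is roughly dataset_length / (per_device_batch_size × number_of_GPUs × gradient_accumulation_steps), so if the GPU count never shows up in the step count, the job is most likely running as a single process rather than a distributed one.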

How do we determine the number of GPUs required for fine-tuning each model?

muhammedaslama
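
As a rough way to reason about that question, a common back-of-the-envelope estimate for full fine-tuning with Adam in mixed precision is sketched below; the roughly 16 bytes of state per parameter (fp16 weights and gradients plus fp32 master weights and Adam moments) and the activation margin are rule-of-thumb assumptions, not figures from the video.

# Rough lower-bound estimate of GPUs needed when model + optimizer state is fully sharded (ZeRO-3).
import math

def estimate_min_gpus(params_billion: float, gpu_vram_gb: float,
                      bytes_per_param: int = 16, activation_margin: float = 1.2) -> int:
    """Ceil(total training state / per-GPU VRAM); activations and overhead folded into the margin."""
    total_gb = params_billion * bytes_per_param * activation_margin  # 1e9 params * bytes ~= GB
    return math.ceil(total_gb / gpu_vram_gb)

print(estimate_min_gpus(8, 16))  # an 8B model on 16 GB cards -> about 10 GPUs by this estimate
print(estimate_min_gpus(8, 80))  # the same model on 80 GB cards -> about 2

Parameter-efficient methods such as LoRA or QLoRA cut these requirements sharply, since full optimizer state is kept only for the small set of trainable parameters.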

Thanks for sharing!
Any chance you can share the code?

chien