Coding Llama 2 from scratch in PyTorch - Part 3

In this video series, you will learn how to train and fine-tune the Llama 2 model from scratch.

The goal is to code LLaMA 2 from scratch in PyTorch to create models with 100M, 250M, and 500M parameters. In this third video, you'll learn about the KV cache, RoPE (rotary position embeddings), and the Hugging Face Trainer in detail.

📋 KV cache:
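
A minimal sketch of the idea (illustrative only, not the exact code from the video): during autoregressive generation, the key/value projections of past tokens are cached, so each new token only computes its own keys/values and attends over the stored tensors. The class and variable names below are assumptions for the sketch.

import torch

# Toy KV cache for a single attention layer (illustrative names and shapes).
class KVCache:
    def __init__(self):
        self.keys = None    # (batch, n_heads, seq_len, head_dim)
        self.values = None

    def update(self, k_new, v_new):
        # Append the current step's key/value along the sequence dimension.
        if self.keys is None:
            self.keys, self.values = k_new, v_new
        else:
            self.keys = torch.cat([self.keys, k_new], dim=2)
            self.values = torch.cat([self.values, v_new], dim=2)
        return self.keys, self.values

cache = KVCache()
for step in range(3):
    k = torch.randn(1, 8, 1, 64)  # one new token's key per step
    v = torch.randn(1, 8, 1, 64)
    keys, values = cache.update(k, v)
print(keys.shape)  # torch.Size([1, 8, 3, 64])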

🪢 RoPE:
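
As a rough reference, one common formulation of RoPE rotates pairs of query/key channels by position-dependent angles. The sketch below uses the split-in-half layout (GPT-NeoX/Llama style); the function name and shapes are illustrative assumptions, not the video's exact implementation.

import torch

def apply_rope(x, base=10000.0):
    # x: (batch, seq_len, dim) with dim even; each channel pair (x1_i, x2_i)
    # is rotated by an angle that grows with the token position.
    batch, seq_len, dim = x.shape
    half = dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]  # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(1, 4, 8)    # (batch, seq_len, dim)
print(apply_rope(q).shape)  # torch.Size([1, 4, 8])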

🤗 Trainer:
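
For orientation, a minimal, self-contained Hugging Face Trainer run on a toy causal LM might look like the sketch below. The tiny config, random data, and output directory are placeholders, not the models or dataset used in the video.

import torch
from datasets import Dataset
from transformers import LlamaConfig, LlamaForCausalLM, Trainer, TrainingArguments

# Toy Llama config so the snippet runs quickly; sizes are placeholders,
# not the 100M/250M/500M models built in this series.
config = LlamaConfig(vocab_size=100, hidden_size=32, intermediate_size=64,
                     num_hidden_layers=2, num_attention_heads=2,
                     max_position_embeddings=16)
model = LlamaForCausalLM(config)

# A handful of random token sequences; for causal LM training, labels = input_ids.
data = [{"input_ids": ids, "labels": ids}
        for ids in torch.randint(0, 100, (8, 16)).tolist()]
train_dataset = Dataset.from_list(data)

args = TrainingArguments(output_dir="tiny-llama-demo",  # placeholder directory
                         per_device_train_batch_size=4,
                         num_train_epochs=1,
                         logging_steps=1,
                         report_to="none")
Trainer(model=model, args=args, train_dataset=train_dataset).train()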

Sebastian Raschka:

💻 To follow along, you can use this Colab notebook:

🎥 Coding Llama 2 from scratch video series
Comments

So in this series, you don't use any pre-trained weights? You build and train the model from scratch on a custom dataset?

sharjeel_mazhar

Now how do you scale it? For example, how do you run it on multiple GPUs, or on multiple nodes with multiple GPUs each?

tharunbhaskar

First time watching your video. Keep going bro 💪, it's your friend Afzal

Bebetter

@user-vd7im8gc2w

Why do you need position ids?

You use them to map each input id to its position in the sequence.

Example:

input_ids = [100, 20, 4, 50]
position_ids = list(range(len(input_ids)))

print(position_ids)
>> [0, 1, 2, 3]

princecanuma