Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.

In this video I teach how to code a Transformer model from scratch using PyTorch. I highly recommend watching my previous video to understand the underlying concepts, but I will also review them in this video while coding. All of the code is mine, except for the attention visualization function that plots the chart, which I found online on Harvard University's website.

It also includes a Colab Notebook so you can train the model directly on Colab.

Chapters
00:00:00 - Introduction
00:01:20 - Input Embeddings
00:04:56 - Positional Encodings
00:13:30 - Layer Normalization
00:18:12 - Feed Forward
00:21:43 - Multi-Head Attention
00:42:41 - Residual Connection
00:44:50 - Encoder
00:51:52 - Decoder
00:59:20 - Linear Layer
01:01:25 - Transformer
01:17:00 - Task overview
01:18:42 - Tokenizer
01:31:35 - Dataset
01:55:25 - Training loop
02:20:05 - Validation loop
02:41:30 - Attention visualization

Recommended on this topic

Comments

Personally, I find that seeing someone actually code something from scratch is the best way to get a basic understanding.

comedyman

It also includes a Colab Notebook so you can train the model directly on Colab.

Of course nobody reinvents the wheel, so I went through many resources about the Transformer to learn how to code it. All of the code is written by me from scratch, except for the code to visualize the attention, which I took from the Harvard NLP group's article about the Transformer.

I highly recommend that all of you do the same: watch my video and try to code your own version of the Transformer... that's the best way to learn it.
Another suggestion I can give is to download my git repo, run it on your computer while debugging the training and inference line by line, while trying to guess the tensor size at each step. This will make sure you understand all the operations. Plus, if some operation was not clear to you, you can just watch the variables in real time to understand the shapes involved.

Have a wonderful day!

umarjamilai
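
As a rough illustration of the shape-guessing exercise suggested in the comment above, here is a minimal sketch in plain Python that lists the tensor shapes one would expect at each step of a standard multi-head self-attention pass. It is not taken from the video's repository: the function name `attention_shapes`, the step labels, and the example sizes are assumptions chosen for illustration, and the layout follows the usual (batch, sequence, features) convention.

```python
def attention_shapes(batch, seq_len, d_model, heads):
    """Return (step, shape) pairs for a standard multi-head self-attention pass.

    Hypothetical helper for checking your own guesses in a debugger;
    it does no tensor computation, only shape arithmetic.
    """
    if d_model % heads != 0:
        raise ValueError("d_model must be divisible by heads")
    d_k = d_model // heads  # features per head
    return [
        ("input x", (batch, seq_len, d_model)),
        ("q/k/v after linear projection", (batch, seq_len, d_model)),
        ("split into heads", (batch, seq_len, heads, d_k)),
        ("move heads before sequence", (batch, heads, seq_len, d_k)),
        ("scores = q @ k^T / sqrt(d_k)", (batch, heads, seq_len, seq_len)),
        ("softmax(scores) @ v", (batch, heads, seq_len, d_k)),
        ("merge heads", (batch, seq_len, d_model)),
        ("output projection", (batch, seq_len, d_model)),
    ]


# Example sizes are arbitrary; substitute the values from your own config.
for step, shape in attention_shapes(batch=8, seq_len=350, d_model=512, heads=8):
    print(f"{step:32s} {shape}")
```

While stepping through a real implementation, you can compare each entry with the `.shape` of the corresponding tensor; a mismatch usually points to a transposed dimension or a missing reshape.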

I have browsed YouTube for the perfect set of videos on the Transformer, and your set of videos (the video explanation you did on the Transformer architecture and this one) is by far the best!! Take a bow, brother, you have contributed to your viewers more than you can imagine. Really appreciate this!!!

ArslanmZahid

Greetings from China! I am a PhD student focused on AI. Your video really helped me a lot. Thank you so much and hope you enjoy your life in China.

yangrichard

Thank you Umar for your extraordinarily excellent work! Best Transformer tutorial I have ever seen!

aiden

One of the best tutorials to understand and implement the Transformer model... Thank you for making such a wonderful video.

faiyazahmad

This video is incredible, never understood it like this before. I will watch your next videos for sure, thank you so much!

maxmustermann

Thanks a lot for such a detailed video. Your videos on the Transformer are the best.

shresthsomya

Keep doing what you are doing. I really appreciate you taking so much time to spread such knowledge for free. I have been studying transformers for a long time, but never have I understood them so well. The theoretical explanation in the other video combined with this practical implementation is just splendid. I will be going through your other tutorials as well. I know how time-consuming it is to produce such high-level content, and all I can really say is that I am grateful for what you are doing and hope that you continue doing it. Wish you a great day!

abdullahahsan

Best video I have ever seen on all of YouTube on the Transformer model. Thank you so much, sir!

raviparihar

Thanks for making it so easy to understand. I definitely learned a lot and gained much more confidence from this!

shakewingo

Thank God, it's not one of those 'ML in 5 lines of Python code' or 'learn AI in 5 minutes'. Thank you. I cannot imagine how much time you must have spent on making this tutorial. Thank you so much. I have watched it three times already and wrote the code while watching the second time (with a lot of typos :D).

MuhammadArshad

Dear Umar, your video is full of knowledge; thanks for sharing.

abdulkarimasif

Hey there! I enjoyed watching that video, you did a wonderful job explaining everything, and I found it super easy to follow along. Overall, it was a really great experience!

lyte

Dear Umar - thank you so much for this amazing and very clear explanation. It has deeply helped me and many others in understanding the theoretical and practical implementation of transformers! Take a bow!

SaiManojPrakhya-mpoe

WOW WOW WOW, though it was a bit tough for me to understand, I was able to understand around 80% of the code. Beautiful. Thank you so much.

manishsharma

This is such great work. I don't really know how to thank you, but this is an amazing explanation of an advanced topic such as the Transformer.

VishnuVardhan-sxbq

Thanks for your detailed tutorial. Learned a lot!

goldentime

Hi Umar. I am a first-year student at MIT who wants to do AI startups. Your explanation and comments during coding were really helpful. After spending about 10 hours on the video, I walk away with great learnings and great inspiration. Thank you so much, you are an amazing teacher!

physicswithbilalasmatullah

Really great explanation for understanding the Transformer, many thanks to you.

dbnbwzz

Pytorch Transformers from Scratch (Attention is all you need)

Let's build GPT: from scratch, in code, spelled out.

Coding a ChatGPT Like Transformer From Scratch in PyTorch

[ 100k Special ] Transformers: Zero to Hero

NLP Demystified 15: Transformers From Scratch + Pre-training and Transfer Learning With BERT/GPT

Lecture 21 - Transformer Implementation

Vision Transformer Quick Guide - Theory and Code in (almost) 15 min

Transformers Explained: Build a Transformer End-to-End!

TensorFlow Transformer model from Scratch (Attention is all you need)

Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Pytorch Transformers for Machine Translation

Create a Large Language Model from Scratch with Python – Tutorial

Building a neural network FROM SCRATCH (no Tensorflow/Pytorch, just numpy & math)

Lecture 1: Swin Transformer from Scratch in PyTorch - Hierarchic Structure and Shifted Windows Ideas

PyTorch Paper Replicating (building a vision transformer with PyTorch)

Vision Transformer from Scratch and Training Implementation

Building a Transformer Model from Scratch: Explained in Detail

Vision Transformer in PyTorch

Transformer: Concepts, Building Blocks, Attention, Sample Implementation in PyTorch

Build a Custom Transformer Tokenizer - Transformers From Scratch #2

Transformers, explained: Understand the model behind GPT, BERT, and T5

Generative Python Transformer p.1 - Acquiring Raw Data

Illustrated Guide to Transformers Neural Network: A step by step explanation