Coding the 124 million parameter GPT-2 model

In this lecture, we code the entire 124-million-parameter GPT-2 model class in Python.

This includes the following components:

Token + positional embeddings
Transformer block
Layer normalization
Output layer
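
For orientation, here is a minimal PyTorch sketch of how these components might fit together. The class names (TransformerBlock, GPTModel) and the cfg dictionary keys are illustrative, not necessarily those used in the lecture:

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm GPT-2-style transformer block (illustrative sketch)."""
    def __init__(self, cfg):
        super().__init__()
        self.norm1 = nn.LayerNorm(cfg["emb_dim"])
        self.attn = nn.MultiheadAttention(cfg["emb_dim"], cfg["n_heads"],
                                          dropout=cfg["drop_rate"], batch_first=True)
        self.norm2 = nn.LayerNorm(cfg["emb_dim"])
        self.ff = nn.Sequential(  # feed-forward: expand 4x, GELU, project back
            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
            nn.GELU(),
            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
        )
        self.drop = nn.Dropout(cfg["drop_rate"])

    def forward(self, x):
        # Causal mask: each position may attend only to itself and earlier positions
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.norm1(x)                          # layer norm before attention
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + self.drop(attn_out)                # dropout + residual connection
        x = x + self.drop(self.ff(self.norm2(x)))  # norm, feed-forward, dropout, residual
        return x

class GPTModel(nn.Module):
    """GPT-2-style model: embeddings -> transformer blocks -> final norm -> output head."""
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])      # token embedding
        self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"])  # positional embedding
        self.drop = nn.Dropout(cfg["drop_rate"])
        self.blocks = nn.Sequential(*[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
        self.final_norm = nn.LayerNorm(cfg["emb_dim"])                      # post-transformer layer norm
        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)  # output layer

    def forward(self, idx):  # idx: (batch, seq_len) integer token IDs
        T = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        x = self.drop(x)          # input embedding = token + positional, then dropout
        x = self.blocks(x)
        x = self.final_norm(x)
        return self.out_head(x)   # logits: (batch, seq_len, vocab_size)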

We cover the theory and the mathematical intuition, and then code the entire implementation.

After this lecture, you will have a firm understanding of how the entire GPT-2 architecture works.

0:00 Bird's-eye view of the GPT-2 architecture
7:45 Token, positional and input embeddings
17:29 Dropout layer
20:47 The 8 steps of the transformer block
32:37 Post-transformer layer normalization
33:36 Output layer
40:20 Coding the entire GPT-2 architecture in Python
51:42 Testing the GPT model class on a simple example
53:51 Parameter and memory calculations
57:56 Conclusion and summary
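
As a quick smoke test of the sketch above, the model can be instantiated with the published GPT-2 small hyperparameters and run on dummy token IDs (the dictionary keys remain illustrative):

GPT_CONFIG_124M = {
    "vocab_size": 50257,     # BPE vocabulary size
    "context_length": 1024,  # maximum sequence length
    "emb_dim": 768,          # embedding dimension
    "n_heads": 12,           # attention heads per block
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout probability
}

model = GPTModel(GPT_CONFIG_124M)
batch = torch.randint(0, 50257, (2, 4))  # 2 sequences of 4 token IDs each
logits = model(batch)
print(logits.shape)  # torch.Size([2, 4, 50257])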

=================================================
Vizuara philosophy:

As we work through AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also point out which subjects contain open areas of research, so interested students can start their research journey there.

If you are confused or stuck in your ML journey, perhaps courses and offline videos are not inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.

No cost. No hidden charges. Pure old-school teaching and learning.

=================================================

🌟 Meet Our Team: 🌟

🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)

🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)

🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)

🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)

🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)

🎓 Sourav Jana (Software Developer at Vizuara)
Comments

Please include the playlist link in the description of every video; it can get you traction, and it also helps us navigate easily. Thank you for your great work.

vinaypattanashetti

A super fine-tuned class to build an understanding of LLMs from scratch, with bits and bytes of the "Attention Is All You Need" paper. Thank you, team, for all your efforts.

bganeshgulhane

Thank you!!! One of the best overviews of the inner workings of the transformer architecture!!!

rbrowne

Can you also make videos on how to build multi-modal LLMs from scratch, ones which can handle images and videos as well? Brilliant lecture, and very clearly understandable, as has been the case for every single video in this series. Thank you!!

RishavBhattacharjee-wj

Awesome, amazing, just a marvellous lecture :D

Omunamantech

I would like to join the LLMs course, but I see it's fully booked. Could you please let me know if there's a waitlist or any other option for enrollment? Also, could you suggest whether this course is better preparation for an LLM or a Generative AI interview?

ImranKhan-gyp

Wow, truly a magical lecture.
Would it be possible to make the visual flow map available for us to download? 👍

tripchowdhry

Can you provide any details on how you calculated the size of the model (124M parameters)?

rbrowne
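
For readers wondering the same thing, a back-of-envelope count under the standard GPT-2 small hyperparameters, with bias terms on all linear layers, goes roughly as follows. Note that the ~124M figure appears only when the output layer shares (ties) its weights with the token embedding, as the original GPT-2 does; exact totals shift slightly depending on which biases an implementation includes.

V, T, D, L = 50257, 1024, 768, 12        # vocab, context, embedding dim, layers

tok_emb = V * D                          # token embedding: 38,597,376
pos_emb = T * D                          # positional embedding: 786,432

attn  = 4 * (D * D + D)                  # Q, K, V and output projections, with biases
ffn   = (D * 4*D + 4*D) + (4*D * D + D)  # two linear layers with 4x expansion
norms = 2 * 2 * D                        # two LayerNorms, each with scale + shift
block = attn + ffn + norms               # 7,087,872 parameters per block

final_norm = 2 * D
out_head   = V * D                       # untied output layer (no bias)

total_untied = tok_emb + pos_emb + L * block + final_norm + out_head
total_tied   = total_untied - out_head   # weight tying reuses the token embedding

print(f"{total_untied:,}")               # 163,037,184
print(f"{total_tied:,}")                 # 124,439,808 -> the '124M' figure
print(f"{total_untied * 4 / 1024**2:.0f} MB")  # ~622 MB of weights in float32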

Is this 50257-sized vector showing the context's relation to all other words in the vocabulary?

pendekantimaheshbabu
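
For readers with the same question: not quite. The 50257-dimensional vector produced at each token position holds unnormalized scores (logits), one per vocabulary entry; applying softmax turns them into a probability distribution over the next token, rather than a measure of relations between words. Continuing from the test example above:

probs = torch.softmax(logits[:, -1, :], dim=-1)  # probability distribution over 50257 tokens
next_token = torch.argmax(probs, dim=-1)         # greedy choice of the most likely next token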