Coding the 124 million parameter GPT-2 model
In this lecture, we code the entire 124-million-parameter GPT-2 model class in Python.
This includes the following components:
Token + Positional embedding
Transformer block
Layer normalization
Output layer
We cover the theory and mathematical intuition, and then code the entire implementation (a minimal sketch of the resulting model class is included below).
After this lecture, you will have a firm understanding of how the entire GPT-2 architecture works.
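As a preview of what the lecture builds, here is a minimal sketch of such a model class in PyTorch. The hyperparameters (vocabulary size 50257, context length 1024, embedding dimension 768, 12 layers, 12 heads) are the published GPT-2 small settings; the class and configuration names below (GPTModel, TransformerBlock, GPT2_CONFIG_124M) are illustrative and may not match the lecture's exact code.

```python
import torch
import torch.nn as nn

# Standard GPT-2 "small" (124M) hyperparameters
GPT2_CONFIG_124M = {
    "vocab_size": 50257,      # BPE vocabulary size
    "context_length": 1024,   # maximum sequence length
    "emb_dim": 768,           # embedding dimension
    "n_layers": 12,           # number of transformer blocks
    "n_heads": 12,            # attention heads per block
    "drop_rate": 0.1,         # dropout probability
}

class TransformerBlock(nn.Module):
    """Simplified pre-norm transformer block: masked attention + feed-forward, each with a residual."""
    def __init__(self, cfg):
        super().__init__()
        self.norm1 = nn.LayerNorm(cfg["emb_dim"])
        self.attn = nn.MultiheadAttention(
            cfg["emb_dim"], cfg["n_heads"], dropout=cfg["drop_rate"], batch_first=True
        )
        self.norm2 = nn.LayerNorm(cfg["emb_dim"])
        self.mlp = nn.Sequential(
            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
            nn.GELU(),
            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
        )
        self.drop = nn.Dropout(cfg["drop_rate"])

    def forward(self, x):
        # Causal mask: each position may only attend to itself and earlier positions
        seq_len = x.shape[1]
        mask = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + self.drop(attn_out)                  # residual around attention
        x = x + self.drop(self.mlp(self.norm2(x)))   # residual around feed-forward
        return x

class GPTModel(nn.Module):
    """Embeddings -> dropout -> transformer blocks -> final layer norm -> output head."""
    def __init__(self, cfg):
        super().__init__()
        # Token + positional embeddings
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
        self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"])
        self.drop = nn.Dropout(cfg["drop_rate"])
        # Stack of transformer blocks
        self.blocks = nn.Sequential(*[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
        # Post-transformer layer normalization and projection to vocabulary logits
        self.final_norm = nn.LayerNorm(cfg["emb_dim"])
        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)

    def forward(self, token_ids):
        _, seq_len = token_ids.shape
        tok = self.tok_emb(token_ids)
        pos = self.pos_emb(torch.arange(seq_len, device=token_ids.device))
        x = self.drop(tok + pos)       # input embeddings
        x = self.blocks(x)             # transformer blocks
        x = self.final_norm(x)         # final layer norm
        return self.out_head(x)        # logits over the vocabulary

# Quick smoke test on a toy batch (2 sequences of 8 token ids)
model = GPTModel(GPT2_CONFIG_124M)
logits = model(torch.randint(0, GPT2_CONFIG_124M["vocab_size"], (2, 8)))
print(logits.shape)  # expected: torch.Size([2, 8, 50257])
```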
0:00 Bird's-eye view of GPT-2 architecture
7:45 Token, positional and input embeddings
17:29 Dropout layer
20:47 The 8 steps of the transformer block
32:37 Post-transformer layer normalization
33:36 Output layer
40:20 Coding the entire GPT-2 architecture in Python
51:42 Testing the GPT model class on a simple example
53:51 Parameter and memory calculations (see the worked example below)
57:56 Conclusion and summary
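To complement the parameter and memory calculations chapter, here is a back-of-the-envelope count for the 124M configuration. It assumes the standard GPT-2 small hyperparameters and weight tying between the token embedding and the output layer; the lecture's own accounting may differ slightly.

```python
# Back-of-the-envelope parameter count for GPT-2 "small" (assumes weight tying)
vocab_size, context_length, emb_dim, n_layers = 50257, 1024, 768, 12

tok_emb = vocab_size * emb_dim                    # token embedding: 38,597,376
pos_emb = context_length * emb_dim                # positional embedding: 786,432

attn = 4 * (emb_dim * emb_dim + emb_dim)          # Q, K, V and output projections (+ biases)
mlp = (emb_dim * 4 * emb_dim + 4 * emb_dim) + (4 * emb_dim * emb_dim + emb_dim)  # two linear layers (+ biases)
norms = 2 * 2 * emb_dim                           # two LayerNorms (scale + shift each)
per_block = attn + mlp + norms                    # ~7.09M per transformer block

final_norm = 2 * emb_dim
total = tok_emb + pos_emb + n_layers * per_block + final_norm
# With weight tying, the output layer reuses the token embedding matrix and adds no new parameters.
print(f"{total:,} parameters")                    # 124,439,808  (~124.4M)
print(f"{total * 4 / 1024**2:.0f} MB in float32") # ~475 MB of weights
```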
=================================================
=================================================
Vizuara philosophy:
As we work through AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also point out which topics contain open areas of research, so interested students can start their research journey there.
If you are confused or stuck in your ML journey, perhaps courses and offline videos are not inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.
No cost. No hidden charges. Pure old school teaching and learning.
=================================================
🌟 Meet Our Team: 🌟
🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)
🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)
🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)
🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)
🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)
🎓 Sourav Jana (Software Developer at Vizuara)