Coding the 124 million parameter GPT-2 model
In this lecture, we code the entire 124-million-parameter GPT-2 model class in Python.
This includes the following components:
Token + Positional embedding
Transformer block
Layer normalization
Output layer
We cover the theory and mathematical intuition, and then code the entire implementation (a minimal sketch of the resulting model class is included below).
After this lecture, you will have a firm understanding of how the entire GPT-2 architecture works.
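As a preview of what the lecture builds, here is a minimal sketch of such a model class in PyTorch. The hyperparameters (vocabulary size 50257, context length 1024, embedding dimension 768, 12 layers, 12 heads) are the published GPT-2 small settings; the class and configuration names below (GPTModel, TransformerBlock, GPT2_CONFIG_124M) are illustrative and may not match the lecture's exact code.

```python
import torch
import torch.nn as nn

# Standard GPT-2 "small" (124M) hyperparameters
GPT2_CONFIG_124M = {
    "vocab_size": 50257,      # BPE vocabulary size
    "context_length": 1024,   # maximum sequence length
    "emb_dim": 768,           # embedding dimension
    "n_layers": 12,           # number of transformer blocks
    "n_heads": 12,            # attention heads per block
    "drop_rate": 0.1,         # dropout probability
}

class TransformerBlock(nn.Module):
    """Simplified pre-norm transformer block: masked attention + feed-forward, each with a residual."""
    def __init__(self, cfg):
        super().__init__()
        self.norm1 = nn.LayerNorm(cfg["emb_dim"])
        self.attn = nn.MultiheadAttention(
            cfg["emb_dim"], cfg["n_heads"], dropout=cfg["drop_rate"], batch_first=True
        )
        self.norm2 = nn.LayerNorm(cfg["emb_dim"])
        self.mlp = nn.Sequential(
            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
            nn.GELU(),
            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
        )
        self.drop = nn.Dropout(cfg["drop_rate"])

    def forward(self, x):
        # Causal mask: each position may only attend to itself and earlier positions
        seq_len = x.shape[1]
        mask = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + self.drop(attn_out)                  # residual around attention
        x = x + self.drop(self.mlp(self.norm2(x)))   # residual around feed-forward
        return x

class GPTModel(nn.Module):
    """Embeddings -> dropout -> transformer blocks -> final layer norm -> output head."""
    def __init__(self, cfg):
        super().__init__()
        # Token + positional embeddings
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
        self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"])
        self.drop = nn.Dropout(cfg["drop_rate"])
        # Stack of transformer blocks
        self.blocks = nn.Sequential(*[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
        # Post-transformer layer normalization and projection to vocabulary logits
        self.final_norm = nn.LayerNorm(cfg["emb_dim"])
        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)

    def forward(self, token_ids):
        _, seq_len = token_ids.shape
        tok = self.tok_emb(token_ids)
        pos = self.pos_emb(torch.arange(seq_len, device=token_ids.device))
        x = self.drop(tok + pos)       # input embeddings
        x = self.blocks(x)             # transformer blocks
        x = self.final_norm(x)         # final layer norm
        return self.out_head(x)        # logits over the vocabulary

# Quick smoke test on a toy batch (2 sequences of 8 token ids)
model = GPTModel(GPT2_CONFIG_124M)
logits = model(torch.randint(0, GPT2_CONFIG_124M["vocab_size"], (2, 8)))
print(logits.shape)  # expected: torch.Size([2, 8, 50257])
```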
0:00 Bird's-eye view of GPT-2 architecture
7:45 Token, positional and input embeddings
17:29 Dropout layer
20:47 The 8 steps of the transformer block
32:37 Post-transformer layer normalization
33:36 Output layer
40:20 Coding the entire GPT-2 architecture in Python
51:42 Testing the GPT model class on a simple example
53:51 Parameter and memory calculations (see the worked example below)
57:56 Conclusion and summary
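To complement the parameter and memory calculations chapter, here is a back-of-the-envelope count for the 124M configuration. It assumes the standard GPT-2 small hyperparameters and weight tying between the token embedding and the output layer; the lecture's own accounting may differ slightly.

```python
# Back-of-the-envelope parameter count for GPT-2 "small" (assumes weight tying)
vocab_size, context_length, emb_dim, n_layers = 50257, 1024, 768, 12

tok_emb = vocab_size * emb_dim                    # token embedding: 38,597,376
pos_emb = context_length * emb_dim                # positional embedding: 786,432

attn = 4 * (emb_dim * emb_dim + emb_dim)          # Q, K, V and output projections (+ biases)
mlp = (emb_dim * 4 * emb_dim + 4 * emb_dim) + (4 * emb_dim * emb_dim + emb_dim)  # two linear layers (+ biases)
norms = 2 * 2 * emb_dim                           # two LayerNorms (scale + shift each)
per_block = attn + mlp + norms                    # ~7.09M per transformer block

final_norm = 2 * emb_dim
total = tok_emb + pos_emb + n_layers * per_block + final_norm
# With weight tying, the output layer reuses the token embedding matrix and adds no new parameters.
print(f"{total:,} parameters")                    # 124,439,808  (~124.4M)
print(f"{total * 4 / 1024**2:.0f} MB in float32") # ~475 MB of weights
```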
=================================================
=================================================
Vizuara philosophy:
As we work through AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also point out which topics contain open areas of research, so interested students can start their research journey there.
If you are confused or stuck in your ML journey, perhaps courses and offline videos are not inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.
No cost. No hidden charges. Pure old school teaching and learning.
=================================================
🌟 Meet Our Team: 🌟
🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)
🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)
🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)
🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)
🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)
🎓 Sourav Jana (Software Developer at Vizuara)