Transformer Encoder in 100 lines of code!

ABOUT ME

RESOURCES

PLAYLISTS FROM MY CHANNEL

MATH COURSES (7 day free trial)

OTHER RELATED COURSES (7 day free trial)

TIMESTAMPS
0:00 What we will cover
0:53 Introducing Colab
1:24 Word Embeddings and d_model
3:00 What are Attention heads?
3:59 What is Dropout?
4:59 Why batch data?
7:46 How do sentences go into the transformer?
9:03 Why feed forward layers in the transformer?
9:44 Why repeat Encoder layers?
11:00 The “Encoder” Class, nn.Module, nn.Sequential
14:38 The “EncoderLayer” Class
17:45 What is Attention: Query, Key, Value vectors
20:03 What is Attention: Matrix Transpose in PyTorch
21:17 What is Attention: Scaling
23:09 What is Attention: Masking
24:53 What is Attention: Softmax
25:42 What is Attention: Value Tensors
26:22 CRUX OF VIDEO: “MultiHeadAttention” Class
36:27 Returning the flow back to “EncoderLayer” Class
37:12 Layer Normalization
43:17 Returning the flow back to “EncoderLayer” Class
43:44 Feed Forward Layers
44:24 Why Activation Functions?
46:03 Finish the Flow of Encoder
48:03 Conclusion & Decoder for next video
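
Before the comments, here is a rough orientation for the flow the timestamps walk through: self-attention, add & norm, feed forward, add & norm, repeated N times. This is only a minimal sketch; names like EncoderLayer, d_model, and ffn_hidden are illustrative, and it uses PyTorch's built-in nn.MultiheadAttention rather than the hand-written class built in the video.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Minimal sketch of one encoder layer: attention -> add & norm -> FFN -> add & norm."""
    def __init__(self, d_model=512, num_heads=8, ffn_hidden=2048, drop_prob=0.1):
        super().__init__()
        # built-in multi-head attention stands in for the hand-written class in the video
        self.attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.dropout1 = nn.Dropout(drop_prob)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ffn_hidden),
            nn.ReLU(),
            nn.Dropout(drop_prob),
            nn.Linear(ffn_hidden, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout2 = nn.Dropout(drop_prob)

    def forward(self, x):
        residual = x
        x, _ = self.attention(x, x, x)                 # self-attention: query = key = value = x
        x = self.norm1(residual + self.dropout1(x))    # add & norm
        residual = x
        x = self.ffn(x)                                # position-wise feed forward
        x = self.norm2(residual + self.dropout2(x))    # add & norm
        return x

# Stacking identical layers gives the full encoder
encoder = nn.Sequential(*[EncoderLayer() for _ in range(5)])
out = encoder(torch.randn(30, 200, 512))   # (batch, max_sequence_length, d_model)
print(out.shape)                           # torch.Size([30, 200, 512])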
Comments

If you think I deserve it, please consider hitting the like button and subscribe for more content like this :)

CodeEmporium

Best video out there for encoders, especially for beginners!

michellekelly-eejj

Next level video *especially* because of the dimensions laid out and giving intuition for things like k.transpose(-1, -2). Likely the best resource out right now!! Thanks for all your work!

sushantmehta
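
To make the k.transpose(-1, -2) intuition from the comment above concrete, here is a tiny shape walk-through of scaled dot-product attention. The dimensions (batch 30, heads 8, sequence length 200, head size 64) follow the video, but the snippet itself is just an illustrative sketch.

import math
import torch
import torch.nn.functional as F

q = torch.randn(30, 8, 200, 64)   # (batch, heads, seq_len, head_dim)
k = torch.randn(30, 8, 200, 64)
v = torch.randn(30, 8, 200, 64)

# transpose only the last two dims, so k becomes (30, 8, 64, 200)
scores = q @ k.transpose(-1, -2) / math.sqrt(q.size(-1))   # (30, 8, 200, 200)
attention = F.softmax(scores, dim=-1)                       # each row sums to 1
out = attention @ v                                          # (30, 8, 200, 64)
print(scores.shape, out.shape)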

Best video on encoders. Backtracking through the encoder concept like a top-down approach is really amazing and makes it easy to understand.

shubhamgattani

This is the best explanation I have gone through

surajgorai

This is the most detailed Transformer video, THANK YOU!
I have one question: the values tensor is [30, 8, 200, 64]; before we reshape it, shouldn't we permute it first? Like:
values = values.permute(0, 2, 1, 3).reshape(batch_size, max_sequence_length, self.num_heads * self.head_dim)

AnthonyY-oq
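
On the permute question above: merging the heads back together does usually involve a permute before the reshape, so that each token's 8 head outputs sit next to each other before being flattened into a 512-dim vector. A hedged sketch of that step (variable names are illustrative):

import torch

batch_size, num_heads, max_sequence_length, head_dim = 30, 8, 200, 64
values = torch.randn(batch_size, num_heads, max_sequence_length, head_dim)

# bring the sequence dimension before the head dimension: (30, 200, 8, 64)
values = values.permute(0, 2, 1, 3).contiguous()
# now the 8 * 64 = 512 head outputs of each token are adjacent and can be merged
values = values.reshape(batch_size, max_sequence_length, num_heads * head_dim)
print(values.shape)   # torch.Size([30, 200, 512])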

Superb and so love these classes! Will watch all of them one by one

jingcheng

It's really helpful that you are going through all the sizes of the various vectors and matrices.

wryltxw

Immense amount of effort put into video. Really appreciate the explanation especially keeping in mind the PyTorch aspect for beginners. Showing details like tensor dimensions throughout the code is just next level. Keep these videos coming.

aamirbadershah

bro... i love how u dive deep into explanations. You're a very good teacher holy shit

moseslee

I watched the entire series and it gave me a deeper understanding of how all of this works. Very well done!!!! Takes a real master to take a complex topic and break it down in such a consumable way. I do have one question: what is the point of the permute? Can we not specify the shape we want in the reshape call?

danielbrooks
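
Regarding the question above about skipping the permute: reshape alone can produce the target shape, but it only regroups elements in memory order, so without the permute each token's concatenated vector would mix values from different sequence positions instead of stacking that token's heads. A small illustration (not from the video):

import torch

x = torch.arange(2 * 2 * 3).reshape(1, 2, 2, 3)     # (batch=1, heads=2, seq_len=2, head_dim=3)

wrong = x.reshape(1, 2, 6)                           # just regroups memory order
right = x.permute(0, 2, 1, 3).reshape(1, 2, 6)       # concatenates both heads per token

print(wrong[0, 0])   # tensor([0, 1, 2, 3, 4, 5])  -> head 0's outputs for tokens 0 and 1 mixed together
print(right[0, 0])   # tensor([0, 1, 2, 6, 7, 8])  -> head 0 and head 1 outputs for token 0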

You are awesome. The way you teach is incredible.

ulmwfue

This video was really informative. Thank you for all the detailed explanations!

seyedmatintavakoliafshari

@CodeEmporium
The transformer series is awesome!
It is very informative.
I have one comment: it is usually recommended to perform dropout before normalization layers. This is because normalization layers may undo dropout effects by re-scaling the input. By performing dropout before normalization, we ensure that the inputs to the normalization layer are still diverse and have different scales.

salemibrahim
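
For concreteness, the ordering suggested in the comment above (dropout applied to the sub-layer output, then add & norm, which is also what the original Transformer paper describes) looks roughly like this. A minimal sketch, not the video's exact code:

import torch
import torch.nn as nn

d_model = 512
dropout = nn.Dropout(p=0.1)
norm = nn.LayerNorm(d_model)

def sublayer_connection(x, sublayer):
    # dropout hits the sub-layer output *before* the residual add and the LayerNorm
    return norm(x + dropout(sublayer(x)))

x = torch.randn(30, 200, d_model)
out = sublayer_connection(x, nn.Linear(d_model, d_model))
print(out.shape)   # torch.Size([30, 200, 512])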

Thank you, I am going through all your videos. Great work!

pierrelebreton

Very clear, useful and helpful explanation! Thank you!

gigabytechanz

Appreciate your work! As someone else mentioned, hope you can do an implementation of training the network for a few iterations.

KurtGr

Hi Ajay. I think we need to make a small change in the forward() function of the encoder class. We should be doing `x_residual = x.clone() # or x_residual = x[:]` instead of `x_residual = x`. This will ensure that x_residual contains a copy of the original x and is not affected by any changes made to x.

prashantlawhatre
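
A small aside on the clone() suggestion above: whether the copy matters depends on whether x is later modified in place or simply rebound to a new tensor by out-of-place ops. A minimal illustration (not from the video):

import torch

x = torch.ones(3)
residual = x          # residual and x point to the same tensor
x = x + 1             # out-of-place op rebinds x; residual still holds the old values
print(residual)       # tensor([1., 1., 1.])

x = torch.ones(3)
residual = x
x += 1                # in-place op mutates the shared tensor; residual changes too
print(residual)       # tensor([2., 2., 2.]) -- here residual = x.clone() would preserve the original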

Awesome content as always! Are you planning to demonstrate training the encoder in the next video? For example, on a Wikipedia data sample or something like that?

TransalpDave

Thanks for the great series. Would be very helpful if you'd attach the Colab.

chenmargalit