LSTM is dead. Long Live Transformers!

Comments

That's one of the best deep-learning presentations I've seen in a while! It not only introduced transformers but also gave an overview of other NLP strategies, activation functions, and best practices when using optimizers. Thank you!!

FernandoWittmann

Good to see Adam Driver working on transformers 😁

vamseesriharsha

For anyone feeling overwhelmed: that's completely reasonable, as this video is just a 28-minute recap for experienced machine learning practitioners, and a lot of them are just spamming the top comments with "This is by far the best video", "Everything is clear with this single video", and so on.

sanjivgautam

Thank you for this concise and well-rounded talk! The pseudocode example was awesome!

richardosuala

It's hard to overstate just how much this topic has transformed (and is still transforming) the industry. As others have said, understanding it is not easy, because there are a bunch of components that don't seem to align with one another, and overall the architecture is such a departure from the more traditional things you are taught. I myself have wrangled with it for a while, and it's still difficult to fully grasp. Like any hard problem, you have to bang your head against it for a while before it clicks.

ajitkirpekar

Great talk. It's always thrilling to see someone who actually knows the subject they're presenting.

monikathornton

This is like 90% of what I remember from my NLP course with all the uncertainty cleared up, thanks!

lmao

I love this presentation.
It doesn't assume that the audience knows far more than is necessary, it goes through explanations of the relevant parts of Transformers, it notes shortcomings, etc.
Best slideshow I've seen this year, and it's from over 3 years ago.

_RMSG_

Leo is an excellent professor. He explains difficult concepts in an easy-to-understand way.

JagdeepSandhuSJC

Wonderfully clear and precise presentation. One thing that tripped me up, though, is this formula at 4 minutes in:

H_{i+1} = A(H_i, x_i)

Seems this should rather be:

H_{i+1} = A(H_i, x_{i+1})

which might be more intuitively written as:

H_i = A(H_{i-1}, x_i)
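In code, that corrected recurrence is just a loop that folds each new input into the previous hidden state. A minimal sketch (plain NumPy, with made-up weight names rather than the talk's notation):

import numpy as np

def rnn_states(inputs, W_h, W_x, b, h0):
    # Vanilla RNN: H_i = A(H_{i-1}, x_i), here with A = tanh(W_h h + W_x x + b).
    h = h0
    states = []
    for x in inputs:                        # x_1, x_2, ..., x_n in order
        h = np.tanh(W_h @ h + W_x @ x + b)  # new state from previous state and current input
        states.append(h)
    return states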

cliffrosen

12:56 — the review of the pseudocode of the attention mechanism was what finally helped me understand it (specifically the meaning of the Q, K, V vectors), which is what other videos were lacking. In the second outer for loop, I still don't fully understand why it loops over the length of the input sequence. The output can be a different length, no? Maybe this is an error. Also, I think he didn't mention the masking of the remaining output at each step, so the model doesn't "cheat".
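For anyone else stuck there, here is a rough NumPy sketch of scaled dot-product attention with the causal mask mentioned above; it is a generic illustration, not the speaker's exact pseudocode:

import numpy as np

def causal_attention(Q, K, V):
    # Q, K, V: arrays of shape (seq_len, d). Position i may only attend to
    # positions <= i, so the model cannot "cheat" by looking at the future.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)               # query-key similarities, (seq_len, seq_len)
    mask = np.triu(np.ones_like(scores), k=1)   # 1s above the diagonal mark future positions
    scores = np.where(mask == 1, -1e9, scores)  # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax over the keys
    return weights @ V                          # each output row is a weighted mix of values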

Scranny

The world deserves more lectures like this one. I don't need examples of how to tune a U-Net; I need exactly this kind of overview of a huge research space and of the ideas underneath each group of methods.

BartoszBielecki

All I want is his level of humility and knowledge.

BcomingHIM

I was trying to use a similar super-low-frequency sine trick for audio sample classification (to give the network more clues about attack/sustain/release positioning). Never did I know that one can use several of those in different phases. Such a simple and beautiful trick.
The presentation is awesome.
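In case it helps anyone, the several-sines-in-different-phases idea is essentially the transformer's sinusoidal positional encoding. A rough sketch (plain NumPy, assuming an even model dimension):

import numpy as np

def positional_encoding(seq_len, d_model):
    # Sines and cosines at geometrically spaced frequencies; each position gets a
    # unique pattern, and nearby positions get similar ones.
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # even dimension indices
    freqs = 1.0 / (10000 ** (dims / d_model))      # one frequency per sin/cos pair
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions * freqs)        # sine channels
    pe[:, 1::2] = np.cos(positions * freqs)        # cosine channels (90-degree phase shift)
    return pe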

evennot

RIP LSTM (2019); she/he/it/they will be remembered by....

ProfessionalTycoons

You folks need to look into asymptotics and Padé approximant methods; for functions of many variables, as ANNs are, you'd use the generalized Canterbury approximants. There is not yet a rigorous development in information-theoretic terms, but Padé summations (essentially continued-fraction representations) are known to yield rapid convergence to the correct limits for divergent Taylor series in non-converging regions of the complex plane. What this boils down to is that you only need a fairly small number of iterations to get very accurate results if you only require approximations. To my knowledge this sort of method is not being used in deep learning, but it has been used by physicists in perturbation theory. I think you will find it extremely powerful in deep learning. Padé (or Canterbury) summation methods, when generalized, are a way of extracting information from incomplete data. So if you use a neural net to get the first few approximants, and assume they are modelling an analytically continued function, then you have a series (the node activation summation) you can Padé-sum to extract more information than you'd be able to otherwise.
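As a tiny, self-contained illustration of the idea (a made-up numeric example, nothing to do with the talk): for log(1 + x), the two-term Taylor series and the [1/1] Padé approximant are built from the same coefficients, yet the Padé form is much more accurate at the edge of the series' convergence region:

import numpy as np

def taylor_log1p(x):
    return x - x**2 / 2      # truncated Taylor series of log(1 + x)

def pade_log1p(x):
    return x / (1 + x / 2)   # [1/1] Padé approximant built from the same coefficients

x = 1.0
print(np.log1p(x), taylor_log1p(x), pade_log1p(x))
# true value ~0.693, Taylor gives 0.5, Padé gives ~0.667 from the same information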

Achrononmaster

This is hands down the best presentation on LSTMs and Transformers I have ever seen. The speaker is really good. He knows his stuff.

timharris

Best transformer presentation I’ve seen hands down. Nice job!

Johnathanaa

This finally made it clear to me why RNNs were introduced! Thanks for sharing.

ismaila

Thanks for this! It gets to the heart of the matter quickly and in an easy-to-grasp way. Excellent.

briancase