Why Transformer over Recurrent Neural Networks

#transformers #machinelearning #chatgpt #gpt #deeplearning
Comments

That's not the main reason. RNNs keep adding to the embeddings and hence overwrite information that came before, whereas in a Transformer the embeddings are there the whole time and attention can pick out the ones that are important.

IshtiaqueAman
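
A minimal NumPy sketch of the contrast described above (all shapes and names are illustrative, not from the video): an RNN squeezes the whole sequence into one fixed-size hidden state that gets rewritten at every step, while attention keeps every embedding available and merely re-weights them.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                    # embedding size (illustrative)
    x = rng.normal(size=(5, d))              # 5 token embeddings

    # RNN: one hidden vector, overwritten at every step, so early
    # tokens gradually fade from h.
    W_h = rng.normal(size=(d, d)) * 0.5
    W_x = rng.normal(size=(d, d)) * 0.5
    h = np.zeros(d)
    for t in range(5):
        h = np.tanh(W_h @ h + W_x @ x[t])

    # Attention: all embeddings stay available; a query re-weights
    # them, so nothing is overwritten.
    q = rng.normal(size=d)                   # query vector (illustrative)
    scores = x @ q / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax
    context = weights @ x                    # mix of ALL five tokens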

That was a great video!
I find learning about such things generally easier and more interesting when they are compared to other models/ideas that are similar but not identical.

NoahElRhandour

Note that the decoder in a Transformer outputs one vector at a time as well.

untitledc
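
That matches a toy greedy-decoding loop (the model interface here is hypothetical): at inference time a Transformer decoder is autoregressive and emits one token per forward pass, even though each pass re-attends over the whole prefix.

    import torch

    def greedy_decode(model, prompt_ids, max_new=20, eos_id=2):
        # model: any module mapping (1, T) token ids -> (1, T, vocab) logits
        ids = prompt_ids
        for _ in range(max_new):
            logits = model(ids)                      # attends over all of ids
            next_id = logits[:, -1].argmax(dim=-1)   # logits at the last position
            ids = torch.cat([ids, next_id[:, None]], dim=1)
            if next_id.item() == eos_id:
                break
        return ids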

This answered a question I didn't have. Thanks!

schillaci

I think LSTMs are more tuned toward keeping the order, because although Transformers can assemble embeddings from various tokens, they don't know what follows what in a sentence.

But perhaps with relative positional encoding they might be equipped just about well enough to understand the order of sequential input.

IgorAherne
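
On the ordering point: plain attention is permutation-invariant, which is exactly why positional information has to be injected. A sketch of the sinusoidal (absolute) encoding from "Attention Is All You Need"; relative schemes exist as well but are more involved. Assumes an even d_model.

    import numpy as np

    def sinusoidal_positions(seq_len, d_model):
        # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
        # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
        pos = np.arange(seq_len)[:, None]           # (T, 1)
        i = np.arange(0, d_model, 2)[None, :]       # (1, d/2)
        angles = pos / np.power(10000.0, i / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    # Added to the token embeddings before attention:
    # x = token_embeddings + sinusoidal_positions(T, d)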

YouTube recommend me more videos like this plz

brianprzezdziecki

An important caveat is that decoder Transformers like the GPT models are trained autoregressively, with no context from the words that come after.

sandraviknander
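
A minimal sketch of the causal mask that enforces this during training: every position may attend only to itself and earlier positions, so no information from later words leaks in.

    import torch
    import torch.nn.functional as F

    T, d = 5, 8                    # toy sequence length and head dimension
    q, k, v = (torch.randn(T, d) for _ in range(3))

    scores = q @ k.T / d ** 0.5                          # (T, T) pairwise scores
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))  # hide future positions
    out = F.softmax(scores, dim=-1) @ v                  # each row mixes past only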

This was cool, but I'm not sure whether it was explained correctly or I just didn't understand fully. I study transformers, and in the global attention mechanism word prediction compares a word to every other past word in the input. How does that predict future words?

lavishly

This is the best explanation of RNNs vs Transformers I've ever seen. Is there a similar video for self-attention, by any chance? Thank you

kenichisegawa

You should have put LSTMs as a middle step

aron

Does a decoder model share these same advantages? Without the attention mapping, wouldn't it be operating with the same context as an RNN?

jackrayner

The main reason is that RNNs have what we call the exploding and vanishing gradient problem.

free_thinker
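
A quick numeric illustration of that problem: backpropagation through time multiplies the gradient by (roughly) the same recurrent Jacobian at every step, so its norm shrinks or blows up exponentially with sequence length. The diagonal Jacobian below is a deliberate simplification.

    import numpy as np

    rng = np.random.default_rng(0)
    d, steps = 16, 100
    grad = rng.normal(size=d)

    for scale in (0.9, 1.1):           # spectral radius below / above 1
        W = np.eye(d) * scale          # stand-in for the recurrent Jacobian
        g = grad.copy()
        for _ in range(steps):
            g = W.T @ g                # one backprop-through-time step
        print(scale, np.linalg.norm(g))   # ~0.9**100 (vanishes) vs ~1.1**100 (explodes)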

Can you do a Fourier transform replacing the attention head?

jugsma
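
This has been done: FNet (Lee-Thorp et al., 2021) replaces the self-attention sublayer with an unparameterized 2D Fourier transform for token mixing. A sketch of the core idea:

    import torch

    def fourier_mixing(x):
        # FNet-style mixing: 2D FFT over the sequence and hidden
        # dimensions, keeping only the real part. No learned parameters.
        # x: (batch, seq_len, d_model)
        return torch.fft.fft2(x).real

    x = torch.randn(2, 10, 16)
    y = fourier_mixing(x)          # same shape as x, tokens now mixed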

Aren't most of the Transformers in use based on causal self-attention? That doesn't seem to have the bidirectional aspect to it.

drdca

Don't Transformer models generate one token at a time? It's just that they're faster because the calculations can be done in parallel.

alfredwindslow
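
Right, and the parallelism is specifically at training time: one forward pass scores every position at once under teacher forcing, whereas an RNN would need T sequential steps. A sketch with a stand-in model (not a real Transformer):

    import torch
    import torch.nn.functional as F

    vocab, T = 100, 12
    ids = torch.randint(0, vocab, (1, T))
    model = torch.nn.Sequential(          # stand-in for a Transformer stack
        torch.nn.Embedding(vocab, 32),
        torch.nn.Linear(32, vocab),
    )

    logits = model(ids)                   # (1, T, vocab): all T positions at once
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab),
                           ids[:, 1:].reshape(-1))   # next-token prediction

    # Generation remains sequential: one new token per forward pass.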

What if you wanted to train a network to take a sequence of images (like in a video) and generate what comes next? Wouldn't that be a case where RNNs and their variants like LSTMs and GRUs are better, since each image is most closely related to the images coming directly before and after it?

vastabyss

What I'm wondering is: why do all APIs charge credits for input tokens with transformers? To me, it shouldn't make a difference whether a transformer takes 20 tokens or 1,000 as input (as long as it's within its maximum context length). Isn't it the case that a transformer always pads the input to its maximum context length anyway?

Laszer
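
On the padding assumption: a small sketch of why input length still matters. Self-attention forms a T x T score matrix from the actual tokens, so the work grows roughly quadratically with T; implementations do not need to pad every input to the maximum context length.

    import torch

    d = 64
    for T in (20, 1000):
        x = torch.randn(T, d)
        scores = x @ x.T               # (T, T): 400 vs 1,000,000 entries
        print(T, scores.numel())       # attention work scales ~ T**2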

How can we relate this to the masked multi-head attention concept in Transformers? This video seems to conflict with that. Any expert ideas here, please?

sreedharsn-xwyi

But there is also a version of RNNs with attention.

manikantabandla
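
True: attention predates the Transformer, e.g. Bahdanau et al. (2014) added it on top of an RNN encoder-decoder for translation. A minimal sketch of a decoder state attending over RNN encoder states (dot-product scoring for brevity; the original formulation is additive):

    import torch
    import torch.nn.functional as F

    T, d = 7, 16
    encoder = torch.nn.GRU(d, d, batch_first=True)
    enc_out, _ = encoder(torch.randn(1, T, d))      # (1, T, d) RNN states

    dec_state = torch.randn(d)                      # current decoder state
    weights = F.softmax(enc_out[0] @ dec_state / d ** 0.5, dim=-1)  # (T,)
    context = weights @ enc_out[0]                  # (d,) mix of encoder states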