Lecture 12.1 Self-attention

ERRATA:
- In slide 23, the indices are incorrect. The index of the key and value should match (j) and the index of the query should be different (i); see the corrected formulas sketched after this list.
- In slide 25, the diagram illustrating how multi-head self-attention is computed is a slight departure from how it's usually done (the implementation in the subsequent slide is correct, but the two are not quite functionally equivalent). See the slides PDF below for an updated diagram.
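
For reference, here is a sketch of the corrected indexing from the first erratum, written in the usual query/key/value notation rather than copied from the slide: each output vector y_i is a weighted sum over the values v_j, with weights computed from query q_i and key k_j, so the key and value share the index j while the query carries the index i.

```latex
% Raw attention weight between query position i and key position j
w'_{ij} = \mathbf{q}_i^{\top} \mathbf{k}_j
% Normalise over j, the positions being attended to
w_{ij} = \frac{\exp w'_{ij}}{\sum_{j'} \exp w'_{ij'}}
% Output at position i: a weighted sum over the values
\mathbf{y}_i = \sum_j w_{ij}\, \mathbf{v}_j
```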

In this video, we discuss the self-attention mechanism: a very simple and powerful sequence-to-sequence layer that is at the heart of transformer architectures.
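
As a rough, self-contained illustration of the mechanism discussed in the video, here is a minimal NumPy sketch of single-head self-attention with the standard scaled dot-product weights; the function name self_attention and the projection matrices Wq, Wk, Wv are illustrative, not the course's reference implementation.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape (t, d)."""
    Q = X @ Wq                                     # queries, one per position i
    K = X @ Wk                                     # keys, one per position j
    V = X @ Wv                                     # values, one per position j
    scores = Q @ K.T / np.sqrt(X.shape[1])         # raw weights w'_ij = q_i . k_j / sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)    # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over j
    return weights @ V                             # y_i = sum_j w_ij v_j

# Toy usage: a sequence of 4 vectors of dimension 8.
rng = np.random.default_rng(0)
t, d = 4, 8
X = rng.normal(size=(t, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Y = self_attention(X, Wq, Wk, Wv)
print(Y.shape)  # (4, 8): one output vector per input position
```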

Lecturer: Peter Bloem
Comments

Finally an actual _explanation_ of self-attention, particularly of the key, value and query that was bugging me a lot. Thanks so much!

derkontrolleur

Google should rank videos according to likes and the number of previously viewed videos on the same topic: this should go straight to the top for Attention/Transformer searches. I have seen and read plenty, and this is the first time the QKV-as-dictionary vs. RDBMS comparison made sense; that confusion was so bad it literally stopped my thinking every time I had to consider Q, K, or V, and so kept me from grokking the big idea. I now want to watch/read everything by you.

MrOntologue

This is the best explanation of self-attention I have ever seen! Thank you VERY MUCH!

ArashKhoeini

Wow - only 700 views for probably the best explanation of Transformers I came across so far! Really nice work! Keep it up!!! (FYI: I also read the blog post)

constantinfry

A very clear and broken down explanation of self-attention. Definitely deserves much more recognition.

sohaibzahid

Best explanation out there, highly recommended. Thank you!

dhruvjain

Saved lots of hours with this simple but awesome explanation of self-attention, thanks a lot!

tizianofranza

This is a really excellent video. I was finding this a very confusing topic but I found it really clarifying.

Ariel-pxhz

This is the kind of content that deserves the like, subscribe and share promotion. Thank you for your efforts, keep up!

szilike_

Literally the BEST explanation of attention and transformers EVER!! Agree with everyone else about why this is not ranked higher :(
I'm just glad I found it!

workstuff

The best ever video showing how self-attention works.

thcheung

holy shit, been trying to wrap my head around self-attention for a while, but it all finally clicked together with this video.
very well explained, very good video :)

HiHi-iugf

This is the best explanation of multi-head self attention I've seen.

josemariabraga

I think one of the best videos describing self-attention. Thank you for sharing.

farzinhaddadpour

Best explanation I found for self-attention and multi-head attention on the internet, thank you sir

AlirezaAroundItaly

I have gone through 10+ videos on this, but this is the best ...hats off

sathyanarayanankulasekaran

Read the blog post and then found this presentation, what a gift!

maxcrous

I had to leave a comment: the best explanation of Query, Key, Value I have seen!

free_guac

This is the best explanation I have ever heard

davidadewoyin

Finally I have an intuitive view of self-attention. Thank you 😇

Mars.