Stanford CS25: V1 I Transformers United: DL Models that have revolutionized NLP, CV, RL

Since their introduction in 2017, transformers have revolutionized natural language processing (NLP). Now, transformers are finding applications all over deep learning, be it computer vision (CV), reinforcement learning (RL), generative adversarial networks (GANs), speech, or even biology. Among other things, transformers have enabled the creation of powerful language models like GPT-3 and were instrumental in DeepMind's recent AlphaFold2, which tackles protein folding.

In this speaker series, we examine the details of how transformers work and dive deep into the different kinds of transformers and how they're applied in different fields. We do this by inviting people at the forefront of transformer research across different domains to give guest lectures.

0:00 Introduction
2:43 Overview of Transformers
6:03 Attention mechanisms
7:53 Self-attention
11:38 Other necessary ingredients
13:32 Encoder-Decoder Architecture
16:02 Advantages & Disadvantages
18:04 Applications of Transformers

Comments

Thanks for sharing this lecture. I’m looking forward to the other videos!

dailygrowth

Great lecture! Please post the other lectures as well.

rishabhahuja

I am very much looking forward to these amazing lectures on transformers!

zihanwu

Thank you so much for sharing! I hope to learn how transformers can be used in climate modeling. Also, it might be useful to quickly define 'token' for newcomers to the field.

bjornlutjens

With all due respect, this seems more like an end-of-semester project presentation. Teaching a topic is not piling up materials; it is moving from the high-level concept down to the details of the model. You might want to look at Justin Johnson's slides on self-attention for a reference on how to teach these concepts.

SuperAlijoon

Thank you for your work. I hope to build it myself, without Hugging Face and the transformers library. I love low-level work.
I am sure the world will love you guys' work 🌎🌍🌏

jonathansum

The worst 20-minute transformer introduction I have ever seen, lol. But thanks for organizing the seminar; I look forward to the following sessions from the speakers.

stevehan

I was expecting much greater depth and the reasons behind the various constructs of the transformer.

mananshah

What does "attend to a token" mean at 17:20?

andrea-mjce

Is there any chance of getting the different links from the slideshow in the video description?

qilex

You mention several times that a self-attention layer performs only linear operations, which is why the FFNN block is needed. But why is it linear if it contains a softmax on the attention weights, which are themselves a function of the inputs?

arefaref
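
For anyone puzzling over the same question, here is a minimal single-head self-attention sketch in NumPy (the weight names Wq, Wk, Wv and the unmasked, single-head setup are illustrative assumptions, not code from the lecture). The softmax does make the attention weights a nonlinear function of the input; the point presumably being made in the lecture is that, once the weights are computed, each output is just a weighted sum, i.e. a linear combination, of the value vectors, with no elementwise nonlinearity applied to them, which is why the position-wise FFNN block is added.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head, unmasked self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Attention weights: nonlinear in X because of the softmax.
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    # Output: each row is a convex, hence linear, combination of the rows of V.
    return A @ V

# Tiny usage example with random data: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # -> (4, 8)
```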

This is awesome! I'm wondering, will there be any assignments published?

dilyarbuzan

Will you go into applying transformers to non-NLP time-series data?

simonberglund

Do you have practice exercises for this course? Without practice, lectures are of limited use; it seems like I understand, but when someone asks a question or I try to apply it in a project, I struggle.

ppujari

Hello, will the slides be publicly available? Thank you for the very nice content.

dimitrisproios

The lecture was excellent overall. However, it would be even more effective if a single instructor led the entire session for continuity and cohesion.

manoranjansahu

Is it required to have knowledge of LSTMs or attention mechanisms to understand this course series?

andrea-mjce

Usually, when I watch courses taught by professors, they speak slowly and don't assume the viewers know everything.

MohanRadhakrishnan

They should've named "Attention is all you need" as "You just want attention" :D
A missed opportunity.

achyuthakrishnakoneti

Release more of these videos on YouTube.

chyldstudios