Stanford CS25: V4 I Overview of Transformers

April 4, 2024

Comments

Can't believe it, ... Just today we started the part about LSTMs and transformers in my ML course, and here it comes.
Thank you, guys!

ilm_yanfa

Very cool! Thanks for posting this publicly, it's really awesome to be able to audit the course :)

Drazcmd

Awesome, thank you Stanford Online for sharing this amazing video series.

fatemehmousavi

Hello Everyone! Thank you very much for uploading these materials. Cheers

benjaminy.

Amazing stuff! Thank you for publishing this valuable material!

marcinkrupinski

I recently started exploring transformers for time-series classification as opposed to NLP. Very excited about this content!

JJGhostHunters

Great!! Finally, it's time for CS25 V4 🔥

mjavadrajabi

Thanks for sharing this course and these lectures, Stanford. Congratulations! Greetings from Brazil.

lebesguegilmar

it's finally released! hope y'all enjoy(ed) the lecture 😁

styfeng

I want to know more about 'filters.' Are they human or computer processes, or mathematical models? The filters are a reflection I'd like to understand more about. I hope they are not an inflection; that would be an unconscious pathway.

This is a really sweet dip into the currency of knowledge, and these students are to be commended. However, in the common world there is a tendency developing toward a 'tower of Babel.'

Greed may have an influence that we must be wary of. I heard some warnings in the presentation that consider this tendency.

I'm impressed by these students. I hope they aren't influenced by the silo system of capitalism, and that they remain at the front of the generalization and commonality needed to keep bad actors off the playing field.

GeorgeMonsour

Be careful using anthropomorphic language when talking about LLMs, e.g. "thoughts," "ideas," "reasoning." Transformers don't "reason" or have "thoughts" or even "knowledge." They extract existing patterns from the training data and use stochastic distributions to generate outputs.

GerardSans
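
To make the "stochastic distributions" point concrete, here is a minimal sketch (NumPy, with an invented three-word vocabulary and hand-picked logits; none of this comes from the lecture) of how a language model turns scores over a vocabulary into one sampled next token:

import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    # Temperature-scaled softmax, numerically stabilized, then one categorical draw.
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    scaled = scaled - scaled.max()
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

vocab = ["cats", "dogs", "cars"]          # hypothetical vocabulary
logits = np.array([2.0, 1.0, 0.1])        # hypothetical model scores
print(vocab[sample_next_token(logits)])   # usually "cats", but not always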

The future of artificial intelligence... I was into this talk. The probability challenge, and Gemini AI's rapid talking ability, I suppose. It's splendid.

TV

In summary, transformers mean using tons of weight matrices, leading to much better results.

egonkirchof
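
Those "tons of weight matrices" can be made concrete: a single attention head is just a few learned projection matrices plus a softmax. A minimal NumPy sketch, with all sizes and values invented for illustration:

import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 8, 4, 5

# The learned weight matrices: projections for queries, keys, and values.
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

x = rng.normal(size=(seq_len, d_model))   # stand-in token embeddings

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_head)        # scaled dot-product attention
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ V
print(out.shape)                          # (5, 4): one mixed value per token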

It would be great if CS25 V4 got its own YouTube playlist.

Anbu_Sampath

What is said at 13:47 is incorrect.
Large language models like ChatGPT and other state-of-the-art language models do not have only a decoder in their architecture. They employ the standard transformer encoder-decoder architecture, which consists of two main components:
The encoder:
- Encodes the input sequence (prompt, instructions, etc.) into vector representations.
- Uses self-attention mechanisms to capture contextual information within the input sequence.
The decoder:
- Takes in the encoded representations from the encoder.
- Generates the output sequence (text) autoregressively, one token at a time.
- Uses self-attention over the already generated output, as well as cross-attention over the encoder's output, to predict the next token.
So both the encoder and decoder are critical components. The encoder allows the model to understand and represent the input, while the decoder enables powerful sequence generation by predicting one token at a time while attending to the encoder representations and past output.
Having only a decoder without an encoder would mean the model can generate text but not condition on or understand any input instructions/prompts. This would severely limit its capabilities.
The transformer's encoder-decoder design, with each component's self-attention and cross-attention, is what allows large language models to understand inputs flexibly and then generate relevant, coherent, and contextual outputs. Both components are indispensable for their impressive language abilities.

ramsever
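
For context: decoder-only models such as GPT condition on the prompt through the same causal self-attention stack, without a separate encoder, while encoder-decoder models such as T5 work the way this comment describes. A minimal NumPy sketch of the two attention patterns involved, with all dimensions and values invented for illustration:

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block disallowed positions
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d, n_in, n_out = 4, 6, 3
enc = rng.normal(size=(n_in, d))    # encoder states over the input tokens
dec = rng.normal(size=(n_out, d))   # decoder states over generated tokens

# Causal self-attention: each position sees only itself and earlier positions.
causal = np.tril(np.ones((n_out, n_out), dtype=bool))
self_out = attention(dec, dec, dec, mask=causal)

# Cross-attention: every decoder query attends over all encoder states.
cross_out = attention(dec, enc, enc)
print(self_out.shape, cross_out.shape)  # (3, 4) (3, 4)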

I thought it would be an extensively detailed lecture on transformers, teaching people exactly how they work, but this was nothing more than modern AI news and a very high-level explanation of that news. Very disappointing.

ummnine

Stanford's struggles with microphones continue.

laalbujhakkar

This is not what I expected. What a completely terrible explanation. I was expecting a complete history of Transformers: the fall of the Decepticons, or how Optimus Prime came to be. A very misleading title indeed.

si