MAMBA and State Space Models explained | SSM explained

We simply explain and illustrate Mamba, State Space Models (SSMs) and Selective SSMs.
SSMs match the performance of transformers but are faster and more memory-efficient. This is crucial for long sequences!

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael

Outline:
00:00 Mamba to replace Transformers!?
02:04 State Space Models (SSMs) – high level
03:09 State Space Models (SSMs) – more detail
05:45 Discretization step in SSMs
08:14 SSMs are fast! Here is why.
09:55 SSM training: Convolution trick
12:01 Selective SSMs
15:44 MAMBA Architecture
17:57 Mamba results
20:15 Building on Mamba
21:00 Do RNNs have a comeback?
21:42 AICoffeeBreak Merch

Great resources to learn about Mamba:

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Join this channel to get access to perks:
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Scientific advising by Mara Popescu
Video editing: Nils Trost
Music 🎵 : Sunny Days – Anno Domini Beats
Comments

I have a question. Given that SSMs are entirely linear, how do they conform to the universal approximation theorem? I mean, the lack of a non-linear activation should imply that they are particularly bad at approximating functions, but they are not.
Am I missing something?

Also really loved the video!

drummatick
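
For context on the question above: the discretized SSM recurrence on its own is indeed linear in the input,

h_t = A_bar·h_{t-1} + B_bar·x_t,    y_t = C·h_t,

but a Mamba block is not purely linear: it wraps the SSM with SiLU/Swish activations and a multiplicative gating branch, and in the selective SSM the parameters B, C and the step size Δ are themselves computed from the input x_t. A stack of such blocks therefore contains the non-linearities needed for expressive function approximation.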

Thanks! Looking forward to a Hyena video :)

partywen

I have to give a presentation on Mamba next week and I've been waiting for this video so I could finally learn what the hell I need to talk about

ShadowHarborer

Thank you for the shoutout to my repo!
I later realized it was an application of a known idea, the "heisen sequence", which is a pretty cool way to do certain associative scan operations via cumsum.

peabrane
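
As a generic illustration of that idea (not the code from the repo mentioned above): a first-order linear recurrence h_t = a_t·h_{t-1} + b_t can be evaluated for all t at once with cumulative products and sums instead of a sequential loop. A minimal NumPy sketch, assuming positive decay factors a_t:

import numpy as np

def scan_sequential(a, b, h0=0.0):
    # Plain loop: h_t = a_t * h_{t-1} + b_t
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)

def scan_cumulative(a, b, h0=0.0):
    # Closed form: h_t = A_t * (h0 + sum_{j<=t} b_j / A_j) with A_t = prod_{i<=t} a_i,
    # computed with cumprod/cumsum only, i.e. without a step-by-step dependency.
    A = np.cumprod(a)
    return A * (h0 + np.cumsum(b / A))

a = np.random.uniform(0.5, 1.0, size=8)
b = np.random.randn(8)
assert np.allclose(scan_sequential(a, b), scan_cumulative(a, b))

In practice this is done in log space (cumulative sums of logarithms) for numerical stability over long sequences, which is the kind of cumsum-based associative-scan trick the comment refers to.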

This is exactly the level of detail I needed right now. Thank you so much!

jamescunningham

Thank you very much for this thorough, well-curated, and comprehensive review of MAMBA.

OlgaIvina

Hats off to you for this amazing video! Best explanation of Mamba I have seen.

cosmic_reef_

I was waiting for exactly this topic! Thanks so much!

DerPylz

Nice video and a good overview, which is exactly what I was searching for.

faysoufox

A big thanks for a comprehensive explanation of the Mamba Architecture & computations, @AICoffeeBreak!

ruchiradhar

Nice T-shirt! So excited to hear about new models!

harumambaru

Awesome video! I especially like the simple explanation and the visuals.

Emresessa

Thank you! This is by far the easiest-to-understand and most concise video that teaches the concepts of SSMs

李洛克-mu

This explanation was excellent. Thank you very much :)

hannes

Thank you so much!! You really simplified it so that any beginner-level deep learner can understand.

kumarivin

Thanks for the MAMBA video!

I always appreciate your insight on these new, influential papers! Your thoughts always pair well with a good cup of coffee. 😁☕️

MaJetiGizzle

Great.
There are a lot of failed explanations and completely wrong approaches to SSMs and Mamba on the internet, but I finally found exactly what I wanted.
Thank you for the video.

고준성-mg

@AICoffeeBreak, thank you for the awesome video. One very small pet peeve that had me re-check all the math: at 11:20, the explanation would be much easier to follow if you kept x 0-indexed, since that is the notation you had been using from the beginning. Also, maybe make it explicit that you're taking t = L, although this is kind of obvious. This was an awesome lecture, thank you again.

rodrigomeireles
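
For reference, with 0-indexed inputs x_0, ..., x_L, a zero initial state, and discretized parameters A_bar, B_bar, unrolling h_t = A_bar·h_{t-1} + B_bar·x_t up to the last step t = L gives

y_L = C·A_bar^L·B_bar·x_0 + C·A_bar^(L-1)·B_bar·x_1 + ... + C·B_bar·x_L,

which is presumably the sum being discussed around 11:20.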

Thank you for the great Mamba explanation

serta

I'm not entirely sure how SSMs differ from RNNs, especially regarding how attention is being used. There's still the bottleneck of h_t to h_{t+1} between time steps, which was one of the motivations for the attention layer: so that information in one part of the sequence doesn't have to be squeezed through that bottleneck before being combined with information from another part of the sequence.
Is the main innovation from RNN to SSM the fixed delta, A, B, C formulation, such that training can be done in parallel for all time steps?

darkswordsmith
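
To make the parallel-training point concrete: with fixed (non-selective) delta, A, B, C, the same outputs can be computed either step by step like an RNN or, equivalently, as one convolution over the input, so training does not need a sequential loop. A minimal NumPy sketch with a scalar state, scalar parameters, and zero initial state (a simplification for illustration):

import numpy as np

L_seq = 6
A_bar, B_bar, C = 0.9, 0.5, 2.0   # fixed, discretized scalar parameters
x = np.random.randn(L_seq)

# Recurrent view (sequential): h_t = A_bar * h_{t-1} + B_bar * x_t,  y_t = C * h_t
h, y_rec = 0.0, []
for t in range(L_seq):
    h = A_bar * h + B_bar * x[t]
    y_rec.append(C * h)

# Convolutional view (parallelizable): y = x * K with kernel K_k = C * A_bar**k * B_bar
K = C * (A_bar ** np.arange(L_seq)) * B_bar
y_conv = [np.dot(K[:t + 1][::-1], x[:t + 1]) for t in range(L_seq)]

assert np.allclose(y_rec, y_conv)

The fixed-size state bottleneck between steps is indeed still there; that is exactly why Mamba makes delta, B, C input-dependent (selective), so the model can decide what to keep in the state. Selectivity breaks the fixed convolution kernel above, so Mamba trains with a hardware-aware parallel (associative) scan instead, but the contrast with a classic RNN remains: the state update is linear in h_t, with no non-linearity between time steps, and that linearity is what makes these parallel formulations possible.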