Mamba - a replacement for Transformers?

Mamba is a new neural network architecture proposed by Albert Gu and Tri Dao.

Timestamps:
00:00 - Mamba - a replacement for Transformers?
00:19 - The Long Range Arena benchmark
01:20 - Legendre Memory Units
02:07 - HiPPO: Recurrent Memory with Optimal Polynomial Projections
02:38 - Combining Recurrent, Convolutional and Continuous-time Models with Linear State-Space Layers
03:28 - Efficiently Modeling Long Sequences with Structured State Spaces (S4)
05:46 - The Annotated S4
06:13 - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
07:42 - Motivation: Why selection is needed
09:59 - S5
12:00 - Empirical evaluation

Topics: #mamba #foundation

References for papers mentioned in the video can be found at

For related content:
Comments

Stanford labs are thriving right now. To think all this work is made OPEN-SOURCE in a period of hostile and fierce competition among the big tech companies.

shiholololo

Insane, I loved the way you went through multiple important prior papers before talking about Mamba!

qwerasdliop

I really, really like the build-up of ideas through papers; it's a great way to introduce the idea while giving references that we can look up and trace ourselves. For someone coming onto the scene with no context of the last few years of research, it provides a neat overview.

adamshaw

Thank you for such a good survey of the prior work! Your effort is noted and appreciated!

rabbit-hole-research

Hope the open source community builds on this

MeanGeneHacks

Always appreciate your excellent video explanations of cutting edge papers, thanks!

BradNeuberg

The technique of solving long-term memory problems using polynomial projection is somewhat similar to using the FFT for multiplication. Essentially, both methods use highly efficient representations with nearly orthogonal channels to encode the original information.
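A rough numerical illustration of that intuition (my own sketch, not from the video): project a long signal onto a small number of Legendre polynomial coefficients, the orthogonal basis HiPPO builds on, and reconstruct it from them, much as Fourier coefficients compactly encode a signal.

# Compress a 1000-sample signal into 32 Legendre coefficients and
# reconstruct it, illustrating the "orthogonal channels" intuition.
# (Offline projection only; HiPPO maintains these coefficients online.)
import numpy as np
from numpy.polynomial import legendre

t = np.linspace(-1, 1, 1000)
signal = np.sin(5 * t) + 0.5 * np.cos(17 * t)

coeffs = legendre.legfit(t, signal, deg=31)   # 32 coefficients, ~30x compression
recon = legendre.legval(t, coeffs)            # reconstruct from the coefficients

print("relative L2 error:", np.linalg.norm(recon - signal) / np.linalg.norm(signal))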

xyh

That's really high-quality content. I also really like the way you highlight the text when you read over it; this makes it easier to follow along!

Rojfos

Thanks for this, I feel caught up again! I've seen several papers popping up with alternatives to the Transformer architecture, but I lacked a framework to grok them. The way you put this paper in a broader context, both in terms of the Long Range Arena benchmark and the emphasis on "no free lunch" with respect to LTI vs. selective SSMs, was really helpful.
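To make that LTI-vs-selective trade-off concrete, here is a deliberately simplified toy sketch (my own illustration, with made-up parameter names Wb, Wc and Wdelta; not the hardware-aware scan from the Mamba paper). An LTI SSM keeps A, B, C and the step size fixed, so its output can be computed as a convolution; a selective SSM makes B, C and the step size depend on the current input, so it has to be run as a recurrence:

# Toy diagonal selective SSM scan (simplified Euler-style discretization).
import numpy as np

def selective_ssm_scan(u, A, Wb, Wc, Wdelta):
    """u: (T,) scalar input sequence; A: (N,) fixed diagonal (negative) state matrix.
    B_t, C_t and the step size delta_t depend on u_t (the 'selection' mechanism)."""
    T, N = len(u), len(A)
    h = np.zeros(N)                                # hidden SSM state
    y = np.zeros(T)
    for t in range(T):
        delta = np.log1p(np.exp(Wdelta * u[t]))    # softplus -> positive step size
        B_t, C_t = Wb * u[t], Wc * u[t]            # input-dependent B and C
        A_bar = np.exp(delta * A)                  # discretized diagonal A
        h = A_bar * h + (delta * B_t) * u[t]       # state update
        y[t] = C_t @ h                             # readout
    return y

rng = np.random.default_rng(0)
N = 8
A = -np.arange(1, N + 1, dtype=float)              # stable diagonal state matrix
print(np.round(selective_ssm_scan(rng.normal(size=16), A,
                                  rng.normal(size=N), rng.normal(size=N), 0.5), 3))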

Fritzid

The crux of this network's performance lies in the fact that it uses Legendre polynomial coefficients as a basis, which allows the history to be highly compressed with minimal information loss. Thinking about sequence memory this way means moving away from iterative or recursive processing toward a more holistic, algebraic form of memory management.
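On that "algebraic memory" point, here is a minimal sketch of the HiPPO-LegS recurrence as I understand it from the HiPPO and S4 papers (Euler-discretized, illustrative only): a fixed matrix updates a small vector of Legendre coefficients online, so the whole history is summarized without storing past inputs.

# Maintain N Legendre coefficients c that summarize the input history,
# updated online via c_{k+1} = (I - A/k) c_k + (B/k) u_k.
import numpy as np

N = 16
n = np.arange(N)
A = np.zeros((N, N))                              # HiPPO-LegS transition matrix
for i in range(N):
    for j in range(N):
        if i > j:
            A[i, j] = np.sqrt((2 * i + 1) * (2 * j + 1))
        elif i == j:
            A[i, j] = i + 1
B = np.sqrt(2 * n + 1)

c = np.zeros(N)                                   # compressed memory of the history
u = np.sin(np.linspace(0, 8, 500))                # example input stream
for k, u_k in enumerate(u, start=1):
    c = c - (A @ c) / k + (B * u_k) / k
print("memory state (first 5 coefficients):", np.round(c[:5], 3))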

SethuIyer

Man, those papers include hardcore numerical linear algebra :D

kobilica

I noticed that @havenhq has tuned a chat version of the pretrained Mamba-2.8B on Hugging Face. I played with it on Colab and it feels like a decent chatbot already. I'm very excited about the future of this architecture.
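If you want to poke at a pretrained Mamba checkpoint yourself, here is a minimal sketch, assuming a recent transformers release with built-in Mamba support and the smaller "state-spaces/mamba-130m-hf" checkpoint (swap in the 2.8B or a chat-tuned variant if you have the memory):

# Load a small pretrained Mamba checkpoint and generate a few tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "state-spaces/mamba-130m-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("State space models are", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))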

draygn

Honestly, how do you make sense of these papers? I've listened to the whole video and still haven't got a clue what it's about. There are quite a lot of brilliant people out there doing work like this.

alileevil

I need an ‘explain it like I’m five’ version of this. 😄

But I hope it means something strong is coming down the pipe.

Ben_D.

Thanks for the video, would love to have a more detailed explanation building on the related works covered before!

johnny

I liked this video so much that I reached for the like button 3 times while watching it.
Awesome context on S4. This is extremely helpful for stripping away the hype and getting to the meaning.

That's definitely a sub, and I'm off to watch all the other videos.

Dart_ilder

This does it for my 'aspiration video' of the week.

Kobe

Very encouraging that they included the situation in which S6 did poorly! If there are no other catches, this looks incredible!

synapsomorphy

As a person new to the field, I greatly appreciate the way you presented things here!

fiery_transition

Remember, the RWKV mentioned is the one from its paper, RWKV v4; there isn't yet a paper for v5 and v6, but v6 is similar to Mamba.

Edit: it was updated today

JorgetePanete