Mamba - a replacement for Transformers?

Mamba is a new neural network architecture proposed by Albert Gu and Tri Dao.

Timestamps:
00:00 - Mamba - a replacement for Transformers?
00:19 - The Long Range Arena benchmark
01:20 - Legendre Memory Units
02:07 - HiPPO: Recurrent Memory with Optimal Polynomial Projections
02:38 - Combining Recurrent, Convolutional and Continuous-time Models with Linear State-Space Layers
03:28 - Efficiently Modeling Long Sequences with Structured State Spaces (S4)
05:46 - The Annotated S4
06:13 - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
07:42 - Motivation: Why selection is needed
09:59 - S5
12:00 - Empirical evaluation

Topics: #mamba #foundation

References for papers mentioned in the video can be found at

For related content:
Comments

Stanford labs are thriving right now. To think all this work is made OPEN-SOURCE in a period of hostile and fierce competition among the big tech companies.

shiholololo

Insane, I loved the way you went through multiple important prior papers before talking about Mamba!

qwerasdliop

I really, really like the build-up of ideas through papers; it's a great way to introduce the idea while giving references that we can look up and trace ourselves. For someone coming onto the scene with no context of the last few years of research, it provides a neat overview.

adamshaw

Thank you for such a good survey of the prior work! Your effort is noted and appreciated!

rabbit-hole-research

Hope the open source community builds on this

MeanGeneHacks

Always appreciate your excellent video explanations of cutting edge papers, thanks!

BradNeuberg

The technique of solving long-term memory problems using polynomial projection is somewhat similar to using the FFT for multiplication. Essentially, both methods use highly efficient representations with nearly orthogonal channels to encode the original information.
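A rough numerical illustration of that intuition (my own sketch, not from the video): project a long signal onto a small number of Legendre polynomial coefficients, the orthogonal basis HiPPO builds on, and reconstruct it from them, much as Fourier coefficients compactly encode a signal.

# Compress a 1000-sample signal into 32 Legendre coefficients and
# reconstruct it, illustrating the "orthogonal channels" intuition.
# (Offline projection only; HiPPO maintains these coefficients online.)
import numpy as np
from numpy.polynomial import legendre

t = np.linspace(-1, 1, 1000)
signal = np.sin(5 * t) + 0.5 * np.cos(17 * t)

coeffs = legendre.legfit(t, signal, deg=31)   # 32 coefficients, ~30x compression
recon = legendre.legval(t, coeffs)            # reconstruct from the coefficients

print("relative L2 error:", np.linalg.norm(recon - signal) / np.linalg.norm(signal))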

xyh

That's really high-quality content. I also really like the way you highlight the text when you read over it; this makes it easier to follow along!

Rojfos

Thanks for this, I feel caught up again! I've seen several papers popping up with alternatives to the Transformer architecture, but I lacked a framework to grok them. The way you put this paper in a broader context, both in terms of the Long Range Arena benchmark and the emphasis on "no free lunch" with respect to LTI vs. selective SSMs, was really helpful.
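To make that LTI-vs-selective trade-off concrete, here is a deliberately simplified toy sketch (my own illustration, with made-up parameter names Wb, Wc and Wdelta; not the hardware-aware scan from the Mamba paper). An LTI SSM keeps A, B, C and the step size fixed, so its output can be computed as a convolution; a selective SSM makes B, C and the step size depend on the current input, so it has to be run as a recurrence:

# Toy diagonal selective SSM scan (simplified Euler-style discretization).
import numpy as np

def selective_ssm_scan(u, A, Wb, Wc, Wdelta):
    """u: (T,) scalar input sequence; A: (N,) fixed diagonal (negative) state matrix.
    B_t, C_t and the step size delta_t depend on u_t (the 'selection' mechanism)."""
    T, N = len(u), len(A)
    h = np.zeros(N)                                # hidden SSM state
    y = np.zeros(T)
    for t in range(T):
        delta = np.log1p(np.exp(Wdelta * u[t]))    # softplus -> positive step size
        B_t, C_t = Wb * u[t], Wc * u[t]            # input-dependent B and C
        A_bar = np.exp(delta * A)                  # discretized diagonal A
        h = A_bar * h + (delta * B_t) * u[t]       # state update
        y[t] = C_t @ h                             # readout
    return y

rng = np.random.default_rng(0)
N = 8
A = -np.arange(1, N + 1, dtype=float)              # stable diagonal state matrix
print(np.round(selective_ssm_scan(rng.normal(size=16), A,
                                  rng.normal(size=N), rng.normal(size=N), 0.5), 3))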

Fritzid

The crux of this network's performance lies in the fact that it uses Legendre polynomial coefficients as a basis, which allows the history to be highly compressed with minimal information loss. Thinking about sequence memory this way means moving away from iterative or recursive processing toward a more holistic, algebraic form of memory management.
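On that "algebraic memory" point, here is a minimal sketch of the HiPPO-LegS recurrence as I understand it from the HiPPO and S4 papers (Euler-discretized, illustrative only): a fixed matrix updates a small vector of Legendre coefficients online, so the whole history is summarized without storing past inputs.

# Maintain N Legendre coefficients c that summarize the input history,
# updated online via c_{k+1} = (I - A/k) c_k + (B/k) u_k.
import numpy as np

N = 16
n = np.arange(N)
A = np.zeros((N, N))                              # HiPPO-LegS transition matrix
for i in range(N):
    for j in range(N):
        if i > j:
            A[i, j] = np.sqrt((2 * i + 1) * (2 * j + 1))
        elif i == j:
            A[i, j] = i + 1
B = np.sqrt(2 * n + 1)

c = np.zeros(N)                                   # compressed memory of the history
u = np.sin(np.linspace(0, 8, 500))                # example input stream
for k, u_k in enumerate(u, start=1):
    c = c - (A @ c) / k + (B * u_k) / k
print("memory state (first 5 coefficients):", np.round(c[:5], 3))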

SethuIyer

Man, those papers include hardcore numerical linear algebra :D

kobilica

I noticed that @havenhq has tuned a chat version of the pretrained Mamba-2.8B on Hugging Face. I played with it on Colab and it feels like a decent chatbot already. I'm very excited about the future of this architecture.
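If you want to poke at a pretrained Mamba checkpoint yourself, here is a minimal sketch, assuming a recent transformers release with built-in Mamba support and the smaller "state-spaces/mamba-130m-hf" checkpoint (swap in the 2.8B or a chat-tuned variant if you have the memory):

# Load a small pretrained Mamba checkpoint and generate a few tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "state-spaces/mamba-130m-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("State space models are", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))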

draygn

Honestly, how do you make sense of these papers? I've listened to the whole video and still haven't got a clue what it's about. There are quite a lot of brilliant people out there doing work like this.

alileevil

I need an ‘explain it like I’m five’ version of this. 😄

But I hope it means something strong is coming down the pipe.

Ben_D.

Thanks for the video, would love to have a more detailed explanation building on the related works covered before!

johnny

I liked this video so much that I reached for the like button 3 times while watching it.
Awesome context on S4. This is extremely helpful for stripping away the hype and getting to the meaning.

That's definitely a sub, and I'm off to watch all the other videos.

Dart_ilder

This does it for my 'aspiration video' of the week.

Kobe

Very encouraging that they included the situation in which S6 did poorly! If there are no other catches, this looks incredible!

synapsomorphy

As a person new to the field, I greatly appreciate the way you presented things here!

fiery_transition

Remember, the RWKV mentioned is the one from its paper, RWKV v4; there isn't yet a paper for v5 and v6, but v6 is similar to Mamba.

Edit: it was updated today

JorgetePanete