Mamba Might Just Make LLMs 1000x Cheaper...

Will Mamba bring a revolution to LLMs and challenge the status quo? Or is it just a cope that won't last in the long term? The core appeal is that Mamba replaces quadratic self-attention with a linear-time selective state-space scan, which is where the potential cost savings come from. Looking at the trajectories right now, we might not need transformers if Mamba can actually scale, but attention is probably still here to stay.
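As a rough illustration of the cost argument, here is a minimal sketch of the linear-time recurrence at the heart of SSMs. It assumes a toy fixed-parameter diagonal state-space model; the names and shapes are illustrative, not the paper's implementation, and Mamba's actual "selective" version makes the A, B, C parameters input-dependent and computes the scan with a hardware-aware kernel.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy diagonal state-space recurrence: h_t = A*h_{t-1} + B*x_t, y_t = C.h_t.

    One pass over the sequence: O(length) time with O(1) state per step,
    unlike self-attention's O(length^2) pairwise score matrix.
    """
    h = np.zeros(A.shape[0])       # hidden state carried across time steps
    ys = []
    for x_t in x:                  # single linear-time sweep
        h = A * h + B * x_t        # update state from previous state + input
        ys.append(C @ h)           # read out a scalar from the state
    return np.array(ys)

# Usage: a length-8 scalar input run through a 4-dimensional state.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
A = np.full(4, 0.9)               # stable diagonal transition
B = rng.normal(size=4)
C = rng.normal(size=4)
print(ssm_scan(x, A, B, C))
```

Because the state h is a fixed-size summary of everything seen so far, generation cost per token stays constant no matter how long the context grows, which is the scaling behavior the video's title is pointing at.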
Special thanks to Gifted Gummy Bee for helping with this video!
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Transformer: Attention Is All You Need
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Efficiently Modeling Long Sequences with Structured State Spaces
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
VMamba: Visual State Space Model
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
MambaByte: Token-free Selective State Space Model
Repeat After Me: Transformers are Better than State Space Models at Copying
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO
[Music] massobeats - midnight
[Video Editor] @askejm, Lunie