MAMBA AI (S6): Better than Transformers?

MAMBA (S6) is a simplified neural network architecture that integrates selective state space models (SSMs) for sequence modelling. It is designed to be a more efficient and powerful alternative to Transformer models (like current LLMs, VLMs, etc.), particularly for long sequences, and it is an evolution of the classical S4 models.

By making the SSM parameters input-dependent, MAMBA can selectively focus on relevant information in a sequence, enhancing its modelling capability.
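As a rough illustration, here is a minimal single-channel sketch in NumPy of such a selective recurrence: the step size and the B/C projections are computed from the current input, so the state update decides, token by token, what to keep and what to overwrite. The parameter names are made up for this sketch; it is not the actual Mamba (S6) kernel, which runs one SSM per channel with learned projections and a hardware-aware parallel scan.

```python
import numpy as np

def selective_ssm_scan(x, A, w_B, w_C, w_dt):
    """Minimal single-channel selective SSM recurrence (illustrative sketch only).

    x    : (seq_len,) scalar input sequence
    A    : (d_state,) diagonal state matrix (negative entries for stability)
    w_B  : (d_state,) weights making the input projection B input-dependent
    w_C  : (d_state,) weights making the output projection C input-dependent
    w_dt : scalar weight making the step size input-dependent
    """
    h = np.zeros(A.shape[0])
    y = np.zeros_like(x)
    for t, xt in enumerate(x):
        dt = np.log1p(np.exp(w_dt * xt))   # softplus keeps the step size positive
        B_t = w_B * xt                     # B depends on the current input ...
        C_t = w_C * xt                     # ... and so does C: this is the "selectivity"
        A_bar = np.exp(dt * A)             # zero-order-hold discretisation of A
        h = A_bar * h + dt * B_t * xt      # state update: keep vs. write is input-driven
        y[t] = C_t @ h                     # read the output from the hidden state
    return y

# Toy usage: a 16-step sequence with a 4-dimensional hidden state.
rng = np.random.default_rng(0)
print(selective_ssm_scan(
    x=rng.standard_normal(16),
    A=-np.abs(rng.standard_normal(4)),
    w_B=rng.standard_normal(4),
    w_C=rng.standard_normal(4),
    w_dt=0.5,
).shape)   # -> (16,)
```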

Does it have the potential to disrupt the Transformer architecture that almost all current AI systems are based upon?

#aieducation
#insights
#newtechnology
Comments

Another interesting architecture is the Tolman-Eichenbaum Machine, which is inspired by the hippocampus and gives it some interesting abilities to infer latent relationships in the data.

mjp

The way you say "hello community" is a ray of sunshine 🌞 😊

sadface

It's clear transformers can be improved. Excited to see this proposal play out. Thanks for the update!

mike-qff

First video I've watched from you, and I'm very impressed! Looking forward to watching more.

_tnk_

Just as they start etching the transformer architecture onto silicon, ha!

StephenRayner

One of the problems I face when trying to implement simple models which utilize a latent space is the volatility of their input and output sizes. A model should never require truncation, nor should it allow inaccuracies. How, for example, would you model a compression algorithm (encode-decode) for any and all data that can exist? You are required to build the latent space before the model, so it effectively becomes part of the preprocessing step.
This is, of course, expected and within reason.
I am inclined to think the solution to this problem is one which would upend most of the field.

lizardy

The GPT family of models is a decoder-only architecture, which is not covered by the patent.

laurinwagner

Great coverage, and thanks once again. One issue I am grappling with is attention, which for Transformers is managed at "run-time" (i.e. inference) over the prompt, whereas Mamba seems to capture this concept entirely during training. No need for an attention matrix, as with Transformers. Very long context windows, improved access to early information from the stream, and faster performance. Love all this.

My concern / reasoning: removing the "run-time" attention at inference means we are relying on statistical understandings of language from training. For prompts that differ substantially from the training data, can Mamba LLMs excel at activities that aim for creativity and brainstorming?

It also seems to me that training Mamba LLMs on multiple languages may degrade predictability in any one language, since the "attention" (conceptually) is calculated at training time. But I am still pondering this; I certainly may be wrong as I wrap my head around it!

planorama.design
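To illustrate the contrast described in the comment above, here is a conceptual sketch with made-up shapes and names (not any real library's API): a Transformer decoding step attends over every cached prompt token at inference time, while an SSM/Mamba-style step carries the whole history in a fixed-size state.

```python
import numpy as np

def attention_step(q_t, K_cache, V_cache):
    """One Transformer decoding step: attends over *all* cached prompt tokens,
    so per-token compute and memory grow with the context length."""
    scores = K_cache @ q_t / np.sqrt(q_t.shape[0])   # (n_tokens,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over the prompt
    return weights @ V_cache                         # (d_model,)

def ssm_step(h, x_t, A_bar, B_bar, C):
    """One SSM/Mamba-style decoding step: the history is compressed into the
    fixed-size state h, so per-token compute is constant in context length."""
    h = A_bar * h + B_bar * x_t                      # recurrent state update
    return h, C @ h                                  # new state, scalar output

# Toy shapes only, to show the asymmetry in what each step has to touch.
rng = np.random.default_rng(0)
d_model, n_prompt, d_state = 8, 5, 4
y_attn = attention_step(rng.standard_normal(d_model),
                        rng.standard_normal((n_prompt, d_model)),
                        rng.standard_normal((n_prompt, d_model)))
h, y_ssm = ssm_step(np.zeros(d_state), 1.0,
                    np.full(d_state, 0.9),
                    np.ones(d_state),
                    rng.standard_normal(d_state))
```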

My intuition: Transformers are for capturing closely linked concepts and words within each chapter and its summarization, and Mamba is for the union and interconnection of all the summarized ideas (not linked words, but linked groups of ideas that are very dispersed and distributed among chapters).

javiergimenezmoya

What's stored in the real space if not the position? Isn't the example phase space storing an even bigger vector, because it now stores not only the position of the center of mass but also the velocity?

shekhinah
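On the phase-space question above: yes, the state vector is larger than the observed quantity, and that is the point. A toy example of my own (not necessarily the one used in the video): for a unit point mass, "real space" is just the position, while the state-space (phase-space) vector carries position and velocity together, which is what lets a first-order update x' = Ax + Bu describe second-order dynamics.

```python
import numpy as np

# Toy state-space model of a unit point mass pushed by a force u
# (my own example, not taken from the video).
A = np.array([[0.0, 1.0],      # dp/dt = v
              [0.0, 0.0]])     # dv/dt = u / m, with m = 1
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])     # we observe only the position

dt = 0.01
x = np.array([[0.0],           # state = [position,
              [0.0]])          #          velocity]
for _ in range(100):           # simulate 1 second
    u = np.array([[1.0]])      # constant unit force
    x = x + dt * (A @ x + B @ u)   # explicit Euler step
    y = C @ x                      # observed position only

print(x.ravel())   # position and velocity after 1 s of constant force
```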

Conceptually, this is brilliant: Savoir-Faire for Accuracy and Precision.

However, a deeper understanding of the non-matrix mathematics and the challenges of serial hardware engineering would be greatly appreciated.

davidreagan

Can you make more content on state space models?

EkShunya

I do think that an artificial brain should plug into many engines step by step, such as an arithmetic calculator, a logical reasoner, a theorem prover, etc., turning it into something cyborg-like.

juancarlospizarromendez

I appreciate your attempt at simplifying and introducing how state spaces are used in a very particular application to dynamical systems. However, I am afraid you are missing quite a lot and are, perhaps, confused about the mathematics.

renanmonteirobarbosa

Hi, I am developing an offline chatbot with RAG. Should I use Llama 7B as the LLM, or should I choose the Zephyr 7B model? It needs to work locally without internet.

oguzhanylmaz

This is not good for that startup that is building transformer chips.

remsee

I came to see a new, better means of AC voltage conversion. I was disappointed.

davidjohnston

Interesting
(and also all the replies here; there doesn't seem to be a place anymore where thinkers can exchange ideas).
Do you know of a model using this concept (to try out in LM Studio or in a Jupyter notebook)?
Personally, I think the way LLMs work / are trained is not the way to go.
There are too many useless facts inside them; for facts they should just use a callout to Wikipedia or other sites.
The LLMs' "world domain" should be language: no politics, no famous people, but theoretical skills, translations, medicine, law, math, physics, coding, etc. Not who Trump or JF Kennedy or Madonna was. Those gigabytes should be removed.

qwertasd

Just the use of the term "AI" implies zero real understanding. It doesn't really mean anything; it's "marketing speak".

csmrfx