MAMBA AI (S6): Better than Transformers?

MAMBA (S6) is a simplified neural network architecture that integrates selective state space models (SSMs) for sequence modelling. It is designed to be a more efficient and powerful alternative to Transformer models (like current LLMs, VLMs, etc.), particularly for long sequences, and it is an evolution of the classical S4 models.

By making the SSM parameters input-dependent, MAMBA can selectively focus on relevant information in a sequence, enhancing its modelling capability.
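As a rough illustration, here is a minimal single-channel sketch in NumPy of such a selective recurrence: the step size and the B/C projections are computed from the current input, so the state update decides, token by token, what to keep and what to overwrite. The parameter names are made up for this sketch; it is not the actual Mamba (S6) kernel, which runs one SSM per channel with learned projections and a hardware-aware parallel scan.

```python
import numpy as np

def selective_ssm_scan(x, A, w_B, w_C, w_dt):
    """Minimal single-channel selective SSM recurrence (illustrative sketch only).

    x    : (seq_len,) scalar input sequence
    A    : (d_state,) diagonal state matrix (negative entries for stability)
    w_B  : (d_state,) weights making the input projection B input-dependent
    w_C  : (d_state,) weights making the output projection C input-dependent
    w_dt : scalar weight making the step size input-dependent
    """
    h = np.zeros(A.shape[0])
    y = np.zeros_like(x)
    for t, xt in enumerate(x):
        dt = np.log1p(np.exp(w_dt * xt))   # softplus keeps the step size positive
        B_t = w_B * xt                     # B depends on the current input ...
        C_t = w_C * xt                     # ... and so does C: this is the "selectivity"
        A_bar = np.exp(dt * A)             # zero-order-hold discretisation of A
        h = A_bar * h + dt * B_t * xt      # state update: keep vs. write is input-driven
        y[t] = C_t @ h                     # read the output from the hidden state
    return y

# Toy usage: a 16-step sequence with a 4-dimensional hidden state.
rng = np.random.default_rng(0)
print(selective_ssm_scan(
    x=rng.standard_normal(16),
    A=-np.abs(rng.standard_normal(4)),
    w_B=rng.standard_normal(4),
    w_C=rng.standard_normal(4),
    w_dt=0.5,
).shape)   # -> (16,)
```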

Does it have the potential to disrupt the Transformer architecture that almost all current AI systems are based upon?

#aieducation
#insights
#newtechnology
Comments

Another interesting architecture is the Tolman-Eichenbaum Machine, which is inspired by the hippocampus and gives it some interesting abilities to infer latent relationships in the data.

mjp

The way you say "hello community" is a ray of sunshine 🌞 😊

sadface

It's clear transformers can be improved. Excited to see this proposal play out. Thanks for the update!

mike-qff

First video I've watched from you, and I'm very impressed! Looking forward to watching more.

_tnk_

Just as they start etching the transformer architecture onto silicon, ha!

StephenRayner

One of the problems I face when trying to implement simple models which utilize a latent space is the volatility of their input and output sizes. A model should never require truncation, nor should it allow inaccuracies. How, for example, would you model a compression algorithm (encode-decode) for any and all data that can exist? You are required to build the latent space before the model, so it effectively becomes part of the preprocessing step.
This is, of course, expected and within reason.
I am inclined to think the solution to this problem is one which would upend most of the field.

lizardy

The GPT family of models is a decoder-only architecture, which is not covered by the patent.

laurinwagner

Great coverage, and thanks once again. One issue I am grappling with is attention, which for Transformers is managed at "run-time" (i.e. inference) over the prompt, whereas Mamba seems to capture this concept entirely during training. No need for an attention matrix, as with Transformers. Very long context windows, improved access to early information from the stream, and faster performance. Love all this.

My concern / reasoning: removing the "run-time" attention at inference means we are relying on statistical understandings of language from training. For prompts that differ substantially from the training data, can Mamba LLMs excel at activities that aim for creativity and brainstorming?

It also seems to me that training Mamba LLMs on multiple languages may degrade predictability in any one language, since the "attention" (conceptually) is calculated at training time. But I am still pondering this; I certainly may be wrong as I wrap my head around it!

planorama.design
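To illustrate the contrast described in the comment above, here is a conceptual sketch with made-up shapes and names (not any real library's API): a Transformer decoding step attends over every cached prompt token at inference time, while an SSM/Mamba-style step carries the whole history in a fixed-size state.

```python
import numpy as np

def attention_step(q_t, K_cache, V_cache):
    """One Transformer decoding step: attends over *all* cached prompt tokens,
    so per-token compute and memory grow with the context length."""
    scores = K_cache @ q_t / np.sqrt(q_t.shape[0])   # (n_tokens,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over the prompt
    return weights @ V_cache                         # (d_model,)

def ssm_step(h, x_t, A_bar, B_bar, C):
    """One SSM/Mamba-style decoding step: the history is compressed into the
    fixed-size state h, so per-token compute is constant in context length."""
    h = A_bar * h + B_bar * x_t                      # recurrent state update
    return h, C @ h                                  # new state, scalar output

# Toy shapes only, to show the asymmetry in what each step has to touch.
rng = np.random.default_rng(0)
d_model, n_prompt, d_state = 8, 5, 4
y_attn = attention_step(rng.standard_normal(d_model),
                        rng.standard_normal((n_prompt, d_model)),
                        rng.standard_normal((n_prompt, d_model)))
h, y_ssm = ssm_step(np.zeros(d_state), 1.0,
                    np.full(d_state, 0.9),
                    np.ones(d_state),
                    rng.standard_normal(d_state))
```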

My intuition: Transformers are for capturing closely linked concepts and words within each chapter and its summarization, and Mamba is for the union and interconnection of all the summarized ideas (not linked words, but linked groups of ideas that are very dispersed and distributed among chapters).

javiergimenezmoya

What's stored in the real space if not the position? Isn't the example phase space storing an even bigger vector, because it now stores not only the position of the center of mass but also the velocity?

shekhinah
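On the phase-space question above: yes, the state vector is larger than the observed quantity, and that is the point. A toy example of my own (not necessarily the one used in the video): for a unit point mass, "real space" is just the position, while the state-space (phase-space) vector carries position and velocity together, which is what lets a first-order update x' = Ax + Bu describe second-order dynamics.

```python
import numpy as np

# Toy state-space model of a unit point mass pushed by a force u
# (my own example, not taken from the video).
A = np.array([[0.0, 1.0],      # dp/dt = v
              [0.0, 0.0]])     # dv/dt = u / m, with m = 1
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])     # we observe only the position

dt = 0.01
x = np.array([[0.0],           # state = [position,
              [0.0]])          #          velocity]
for _ in range(100):           # simulate 1 second
    u = np.array([[1.0]])      # constant unit force
    x = x + dt * (A @ x + B @ u)   # explicit Euler step
    y = C @ x                      # observed position only

print(x.ravel())   # position and velocity after 1 s of constant force
```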

Conceptually, this is brilliant: Savoir-Faire for Accuracy and Precision.

However, a deeper understanding of the non-matrix mathematics and the challenges of serial hardware engineering would be greatly appreciated.

davidreagan

Can you make more content on state space models?

EkShunya

I do think that an artificial brain should plug into many engines step by step, such as an arithmetic calculator, a logical reasoner, a theorem prover, etc., turning it into something cyborg-like.

juancarlospizarromendez

I appreciate your attempt at simplifying and introducing how state spaces are used in a very particular application to dynamical systems. However, I am afraid you are missing quite a lot and are, perhaps, confused about the mathematics.

renanmonteirobarbosa

Hi, I am developing an offline chatbot with RAG. Should I use Llama 7B as the LLM, or should I choose the Zephyr 7B model? It needs to work locally without internet.

oguzhanylmaz

This is not good for that startup that is building transformer chips.

remsee

I came to see a new, better means of AC voltage conversion. I was disappointed.

davidjohnston

Interesting
(and also all the replies here; there doesn't seem to be a place anymore where thinkers can exchange ideas).
Do you know of a model using this concept (to try out in LM Studio or in a Jupyter notebook)?
Personally, I think the way LLMs work / are trained is not the way to go.
There are too many useless facts inside them; for facts they should just use a callout to Wikipedia or other sites.
The LLMs' "world domain" should be language: no politics, no famous people, but theoretical skills, translations, medicine, law, math, physics, coding, etc. Not who Trump or JF Kennedy or Madonna was. Those gigabytes should be removed.

qwertasd

Just the use of the term "AI" implies zero real understanding. It doesn't really mean anything; it's "marketing speak".

csmrfx