AI Experts MERGED! 🐬 Mistral-1x-22b is BENDING THE RULES (SLERP Explained)

We take a deep dive into the new "expert extraction" method for Mistral's Mixtral 8x22b. The technique merges the model's 8 experts into a single dense model using spherical linear interpolation (SLERP). The results are surprising: a single, much smaller model with unexpected capabilities. Is this the future of large language models? Watch to find out!
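For readers who want to see what SLERP weight merging might look like in practice, below is a minimal sketch in PyTorch. The function names and the pairwise folding strategy are illustrative assumptions, not the actual pipeline behind Mistral-1x-22b: the idea is that SLERP interpolates along the arc between two weight vectors rather than the straight line a plain average would follow.

```python
# Minimal SLERP weight-merging sketch (illustrative; not Mistral's actual pipeline).
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    # Angle between the two weight vectors on the unit hypersphere.
    cos_theta = torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0)
    theta = torch.acos(cos_theta)
    if theta.abs() < eps:
        # Nearly parallel weights: fall back to plain linear interpolation.
        return (1 - t) * a + t * b
    sin_theta = torch.sin(theta)
    w_a = torch.sin((1 - t) * theta) / sin_theta
    w_b = torch.sin(t * theta) / sin_theta
    return (w_a * a_flat + w_b * b_flat).reshape(a.shape).to(a.dtype)

def merge_experts(expert_weights: list[torch.Tensor]) -> torch.Tensor:
    """Fold a list of expert weight tensors into one by repeated pairwise SLERP."""
    merged = expert_weights[0]
    for i, w in enumerate(expert_weights[1:], start=2):
        # Interpolation factor 1/i so every expert contributes roughly equally.
        merged = slerp(merged, w, t=1.0 / i)
    return merged
```

Applied layer by layer, something like this collapses the eight FFN experts into a single FFN of the same shape, which is where the roughly 1/8 memory footprint mentioned in the comments comes from.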

Tell us what you think in the comments below!

Comments

Well, the fact that it takes approximately 1/8 of the VRAM is definitely a useful feature.

ernestuz

It's interesting that there's now a verified, viable method for recombining the expert weights.
The small performance drop, however, is entirely expected, especially when you look at expert-count ablations. Practically, though, recombination could allow more fine-grained inference tailoring after training a single wider model (e.g. choosing 1, 2, 4, or 8 experts depending on the hardware and performance trade-offs desired). It would also be interesting to see whether the performance degradation is layer-dependent (something that would be impossible to ablate with bespoke models due to combinatorics).

hjups
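As an editorial aside on the comment above: reusing the merge_experts sketch from earlier, one hypothetical way to produce 1-, 2-, or 4-expert variants from the same trained 8-expert layer is to merge experts within equal-sized groups. The grouping scheme below is an assumption for illustration, and a real implementation would also need to shrink the router to match the reduced expert count.

```python
# Hypothetical follow-on to the SLERP sketch above: collapse 8 experts into
# fewer merged experts (1, 2, 4, or 8) by merging within equal-sized groups.
# The router's output dimension would also need reducing, which this omits.
import torch

def merge_into_groups(expert_weights: list[torch.Tensor], n_groups: int) -> list[torch.Tensor]:
    """Merge len(expert_weights) experts into n_groups experts via pairwise SLERP."""
    assert len(expert_weights) % n_groups == 0, "experts must split evenly into groups"
    group_size = len(expert_weights) // n_groups
    # Reuses merge_experts() from the sketch above.
    return [
        merge_experts(expert_weights[i:i + group_size])
        for i in range(0, len(expert_weights), group_size)
    ]
```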

Whoa, this is cool. I wonder if we'll learn more from this in the future? 😮

Hexagoner

Really interesting results. Like others have mentioned, the performance relative to hardware requirements makes this very enticing as a subject for further research, especially as the race to run inference closer and closer to local hardware picks up.

Nif

Even if the performance of the resulting model is not fantastic right now, it might be with more research.

TomM-po

It is 1/8 the size, but when it exists as 8 experts, the same "bits of knowledge" can mean different things (the same vector combinations can produce different results depending on which expert is chosen). So merging them all together would mean some nuance is lost. I think that's an important feature of a group of experts.

erikjohnson

So is this the new best 22b model, at least?

pigeon_official