Mixtral - Mixture of Experts (MoE) from Mistral

Mixtral is a new model using a mixture-of-experts (MoE) approach. It consists of 8x7B Mistral models. It was pre-released on Friday; look for more details to come.
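For readers who want to try the released weights, here is a minimal sketch of loading a Mixtral checkpoint with Hugging Face Transformers. The repo id "mistralai/Mixtral-8x7B-v0.1" is an assumption, not something given in the video (the pre-release itself was distributed as a raw torrent):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"  # assumed repo id, not from the video
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" spreads the (very large) weights across available GPUs/CPU
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Mixture of experts models work by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))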

#largelanguagemodels #mixtral #mistral #rajistics

A version of the Mixtral model is here:

Comments

What companies have jr devs downloading 80 GB torrents? 👀😂 Glad to see your shorts come through my feed to explain the latest goings-on.

d_b_

I am just reading a paper about mixture of tokens 😊

sadface

I thought ZipIt would get around mixture-of-experts problems 😮

sadface

Great video. A lot of stuff to digest.

I'm doing a breakdown for myself. Please correct anything if I am wrong...I wanna learn.

RECAP4ME

I summarise:

You have a bunch of experts (E1, E2, E3).
Experts are models (like GPT-4) that you want to use to evaluate data D.

Mistral's objective is to find the model (expert) which, by comparison, is the best model in the list to pass D to.

Mistral's internal workflow is a two-step workflow: Routing > Presentation.

In routing, Mistral assigns D to each model.

Models which cannot load D into their tokens are dropped by the Mistral workflow.

Models which can load D into their tokens are used in the Presentation phase.

The outputs (model inferences) are compared quantitatively (highest numbers for probability and expert indices).

Mistral's output is a list in best-model order, from E-high to E-low.

Then use the best model (E-high) for your AI application with D.

borntodoit
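A note on the breakdown above: in a Mixtral-style MoE layer the "experts" are feed-forward sub-networks inside a single model, and a small router picks the top 2 of them for every token at every layer; it is not a one-off ranking of separate models such as GPT-4. Below is a minimal PyTorch sketch of that per-token top-2 routing, with toy sizes and simplified logic, not the actual Mixtral code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=32, d_ff=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                        # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k)  # best 2 experts per token
        weights = F.softmax(weights, dim=-1)           # normalise over the chosen 2
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                     # plain loop, kept simple for clarity
            for k in range(self.top_k):
                expert = self.experts[int(idx[t, k])]
                out[t] += weights[t, k] * expert(x[t])
        return out

layer = MoELayer()
tokens = torch.randn(4, 32)   # 4 tokens with a toy hidden size of 32
print(layer(tokens).shape)    # torch.Size([4, 32])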