Mistral 8x7B Part 2 - Mixtral Updates


Try the model

For more tutorials on using LLMs and building Agents, check out my Patreon:

My Links:

GitHub:
Comments

This is amazing! I can't wait to give the new version a try. Thanks for sharing this update! You're truly one of the most valuable YouTube channels when it comes to discussing these models because your videos have a lot of technical depth.

DoktorUde

Thank you, great explanation, very helpful!

TailorJohnson-ly

It's a marvelous architecture. I was on the fence since it didn't seem to benefit people who run locally, but after that explanation I am officially hyped. I wonder why the earlier MoE models weren't well received at all if it's the same architecture; Mistral definitely has the secret sauce. And releasing the code for vLLM is just the chad move. Thank you for the video. Can't wait to try this on my machine and even finetune it soon.

impactframes

A 👍 for Mistral. It will definitely be finetuned; we have already seen many high-performing LLMs finetuned from Mistral 7B, for example Zephyr 7B, Notus, etc. I see that the context size is 32K? Look at Linux, a very successful open-source OS: in the LLM world we will also see open-source LLMs become as good as or surpass the closed LLMs from big companies like ClosedAI, Anthropic, Google, Microsoft...

henkhbit

Looks like TheBloke just uploaded GGUF quantizations of the 8x model. It's 18GB for 2-bit and needs a patched llama.cpp.

Edit: With the latest Mixtral branch of llama.cpp the 2-bit model is loading, slowly...

terbospeed
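
If you want to try those files, here is a minimal sketch of loading one of TheBloke's Mixtral GGUF quantizations with llama-cpp-python. It assumes a llama.cpp / llama-cpp-python build recent enough to include the Mixtral support mentioned above, and the model path below is just a placeholder for whichever quantization you download.

# Minimal sketch: load a quantized Mixtral GGUF with llama-cpp-python.
# The file name is a placeholder; any of the GGUF quantizations should work.
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-8x7b-instruct-v0.1.Q2_K.gguf",  # placeholder path
    n_ctx=4096,        # context window to allocate
    n_threads=8,       # CPU threads to use
    n_gpu_layers=0,    # 0 = run entirely on CPU; raise to offload layers to a GPU
)

out = llm("[INST] Explain mixture-of-experts in one sentence. [/INST]",
          max_tokens=128)
print(out["choices"][0]["text"])

Raising n_gpu_layers offloads part of the model to a GPU if you have one; left at 0 it runs on the CPU only.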

Performance that rivals GPT-3.5 with around 80GB of VRAM. Now private companies can run this in their own offices without worrying about their data safety.

nufh
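
As a rough back-of-the-envelope on that memory figure, treating the commonly quoted ~46.7B total parameter count for Mixtral 8x7B as an assumption:

# Rough weight-memory estimate for Mixtral 8x7B; the ~46.7B total
# parameter count is an assumption taken from Mistral's announcement.
TOTAL_PARAMS = 46.7e9

for name, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    gb = TOTAL_PARAMS * bits / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB of weights")

# fp16: ~93 GB  -> roughly two 80 GB cards (plus KV cache and activations)
# int8: ~47 GB  -> fits on a single 80 GB card with headroom
# 4-bit: ~23 GB -> squeezes onto a 24 GB consumer GPU

So full fp16 weights spill slightly past a single 80 GB card, while an 8-bit copy fits comfortably on one.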

What dataset did they train the model on? Is there a separate dataset for each expert?

venkateshmunagala

Running TheBloke's 5-bit quantized instruct model on CPU. A very tolerable 8 tokens per second on a 16-core AMD with DDR5 RAM.

theosalmon
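
That throughput is roughly what a bandwidth-bound estimate predicts. A rough sketch, where the ~12.9B active parameters per token, the ~5.5 effective bits per weight of a 5-bit K-quant, and the DDR5 bandwidth figure are all assumptions:

# CPU decoding of a sparse MoE model is largely memory-bandwidth bound:
# each token only reads the "active" parameters (2 of 8 experts plus the
# shared layers). All three constants below are assumptions.
ACTIVE_PARAMS = 12.9e9
BITS_PER_WEIGHT = 5.5        # roughly the effective size of a 5-bit K-quant
DDR5_BANDWIDTH_GBS = 80.0    # dual-channel DDR5, order-of-magnitude figure

gb_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"~{gb_per_token:.1f} GB of weights read per token")
print(f"~{DDR5_BANDWIDTH_GBS / gb_per_token:.0f} tokens/s upper bound")
# -> ~8.9 GB per token and ~9 tokens/s, so 8 tok/s is in the right ballpark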

Is there no way to use Mixtral without a GPU? Even the quantized ones?

gangs

Sir, any advice if I use it with the Japanese language?

vinsmokearifka

I can't understand how they can train this given that "we train experts and routers simultaneously." Can you explain it more clearly?

boybro
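
One way to see why the router and the experts can be trained together: the router is just another linear layer, and its softmaxed scores multiply the selected experts' outputs, so the ordinary language-modelling loss back-propagates into both in one pass. Below is a minimal PyTorch sketch of a top-2 gated feed-forward block; the sizes, SiLU expert MLPs, and toy loss are illustrative choices, not Mixtral's actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Toy sparse MoE feed-forward block: a linear router picks the top-2
    of 8 experts per token, and the routing weights scale each expert's
    output, so router and experts share one computation graph."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        logits = self.router(x)                         # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # choose 2 experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens sent to expert e in slot k
                if mask.any():
                    contrib = torch.zeros_like(x)
                    contrib[mask] = weights[mask, k].unsqueeze(-1) * expert(x[mask])
                    out = out + contrib
        return out

# One backward pass from a single loss updates router AND experts together.
moe = Top2MoE()
x = torch.randn(16, 512)
loss = moe(x).pow(2).mean()                  # stand-in for the real language-model loss
loss.backward()
print(moe.router.weight.grad.abs().sum() > 0)  # the router received gradients too

Because the router's scores appear as multiplicative factors on the expert outputs, lowering the loss also teaches the router which experts to pick; there is no separate training phase for the routing.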