Mistral 7B - The New 7B LLaMA Killer?


My Links:

Github:

00:00 Intro
00:49 Mistral 7B model
01:31 Mistral AI Blog Post
04:25 Code Time

#llama2 #llms
Comments
Author

Seems like a great model. Very coherent and truly uncensored in its base form, also with a really open license, much unlike the infamous Llama2.

clray

The SYSTEM prompt should go between <s> and [INST]; otherwise the prompt format seems to be the same as Llama's.
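The layout the comment describes could be sketched like this. This is a minimal, hypothetical assembler for that format — verify the exact template against the official model card before relying on it:

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble an instruct prompt in the layout the comment describes:
    the system text sits between <s> and [INST].
    (Illustrative only; confirm against the model card.)"""
    return f"<s>{system}[INST] {user} [/INST]"

prompt = build_prompt("You are a helpful assistant.\n", "Why is the sky blue?")
print(prompt)
```

A fine-tuned chat model is typically sensitive to this template, so getting the token placement wrong tends to degrade answers noticeably.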

ViktorFerenczi

Super that smaller models get more attention. Better small models tuned for specific tasks and managed by an orchestrator of agents than a monolithic dinosaur model like OpenAI's, which is not open! Thanks for showing 👍

henkhbit

I'm so grateful you returned to LLM reviews

alx

Thanks for this video, everything was well explained! I think this model works so well 😄

luisxd

Thanks 🎉 Can it run locally on CPU only?

AliAlias

It would be great if you started testing these models on RAG performance, since that's one of the main uses many of us have in mind for them 🙂

alealejandroooooo

Can we see a fine-tune video on it? Also, can you make a video where we combine RAG and a fine-tuned model together?

ANURAG-wzg

Llama 7B isn't even as good as the Wizard Vicuña models, but Mistral is the best 7B model I've used, and it actually handles larger contexts than ChatGPT (GPT-4).

remsee

Please do post up if you figure out how to fine tune it.

pi

Don't run at 4 bits; that cripples quality. Run at least a 5-bit GGUF (the bigger 5-bit variant) or higher. Running at 8 bits is usually a good compromise.
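A quick back-of-envelope way to see what each quantization level costs on disk and in RAM: file size is roughly parameters × bits-per-weight ÷ 8. The bits-per-weight figures below are approximate effective values I'm assuming for common GGUF quant types (real files run slightly larger because the schemes store block scales):

```python
# Rough GGUF file-size estimate: parameters * bits-per-weight / 8.
# Bits-per-weight values are assumed approximations, not exact specs.
PARAMS = 7.24e9  # Mistral 7B parameter count (approximate)

def est_size_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Estimated quantized model size in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for name, bpw in [("Q4 variant", 4.8), ("Q5 variant", 5.7), ("Q8_0", 8.5)]:
    print(f"{name}: ~{est_size_gb(bpw):.1f} GB")
```

So the jump from a 4-bit to a 5-bit quant only costs on the order of a gigabyte, which is why the comment's advice is cheap to follow on most machines.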

ViktorFerenczi

Can you explain how, given architectural differences such as the sliding window attention, this model can be run by Llama implementations and works in llama.cpp, exllama, etc.?
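The sliding-window part of that question is easy to picture as a mask: each position attends causally, but only back over the last `window` positions. A minimal pure-Python sketch (Mistral 7B's actual window is 4096; tiny numbers here for illustration):

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Boolean attention mask: position i may attend to position j
    only if j <= i (causal) and i - j < window (sliding window)."""
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if m else "." for m in row))
```

With `window` set to at least the sequence length, this collapses to the ordinary causal mask, which is one intuition for why Llama-oriented runtimes can still execute the model: the change is in the mask, not in the core tensor shapes.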

DaeOh

Do you need Colab Pro to run Mistral (or any 7B model)?

SundarRaman-tp

Do you know which modules I would target if I were LoRA training this model? I tried to find it for like 3 minutes lol and didn't find it. I'm definitely gonna want to LoRA train this model and see how it performs.

Nick_With_A_Stick

If they can make a 13B with capabilities near a 30B in conversation/narration, a 30-40k token limit (they state a theoretical attention span of 128K tokens), fast inference on a 10GB 3080, and of course completely uncensored, I will be happy.

I am using GPT-4 (not the API in SillyTavern or similar) to run text-based adventures following some simple (and some more advanced) TTRPG-like rules with a few .json files, and while it's doing great (especially in combat situations), the 8k limit and the censoring are ruining it a lot of the time. I was having a moment with a gorgeous elf I saved from a Wyvern, and it wouldn't even complete a simple kissing scene half the times I tried. Now, let's not even talk about some more "hot" things or brutal actions (you can't even take the head of a goblin in a memorable fashion without a warning about inappropriate content that stops it...)

OnigoroshiZero

How much RAM does it need on my Mac? How does it compare against Vicuna?

wiktorm

Has anyone tried this with Apple Silicon? It looks like PyTorch does have Metal integrations?

toastrecon

Isn't it strange to compare a fine-tuned model to models that aren't fine-tuned?

yannickpezeu

Yes, and now Meta will release the 34B Llama 2 in response.

Hypersniper

Mistral cheated. It excluded a large amount of common knowledge from its data set, including main characters in popular TV shows and movies, in order to get 7B parameters to work as well as 13B Llamas on multiple-choice (hence easy to objectively grade) tests.

It failed every one of my popular culture questions, including the instruct version. And all of the questions covered POPULAR (non-esoteric) songs, bands, movies, TV shows... And it didn't just fail, it hallucinated like crazy. For example, it randomly returned A-list celebrities with no association with the show in question. Other models generally at least returned another character from the same show. It performed worse than Falcon 7B on pop culture questions, and far worse than the Llama 1 and 2 7B LLMs.

Even when it comes to science, coding, math, logic... it may get slightly more answers right, hence score slightly higher on multiple choice tests, but overall it performs MUCH worse than the 7b LLamas, let alone the 13b models. I say this because it often hallucinates completely random things, or simply starts talking about random stuff that has nothing to do with the prompt, while all the llamas, despite also hallucinating a lot, at least stay on topic.

In conclusion, as an overall general-purpose chatbot and source of knowledge, Mistral 7B is better than Falcon 7B but clearly inferior to Llama 1 and 2 7B, and it's not even close. Even the LLM test results where Mistral did slightly better than Llama are misleading, because when you actually look at what it got wrong, it was often absurdly wrong. Its wrong answers are about 10x more likely to have nothing to do with the prompt than Llama's, but all the tests do is register each as another wrong answer, masking how bad Mistral really is compared to Llama.

brandon