Mistral Medium - The Best Alternative To GPT4

Re-upload; the first one was cut off before the ending. I got access to Mistral Medium, Mistral's prototype model, which is available only through the API (for now). It performs incredibly well at a fraction of the price of GPT-4, making it a great replacement for developers building most use cases.
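Since the model is API-only, here is a minimal sketch of calling it from Python's standard library. The endpoint and model name follow Mistral's published chat-completions API, but treat the details (URL, payload shape, response fields) as assumptions and check the official docs; `MISTRAL_API_KEY` is a placeholder environment variable.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name -- verify against Mistral's API docs.
API_URL = "https://api.mistral.ai/v1/chat/completions"
MODEL = "mistral-medium"

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build a chat-completions payload (no network access needed)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """Send the prompt; requires MISTRAL_API_KEY to be set."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The payload builder is separated from the network call so you can inspect or log requests without spending tokens.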

Enjoy :)

Join My Newsletter for Regular AI Updates 👇🏼

Need AI Consulting? ✅

Rent a GPU (MassedCompute) 🚀
USE CODE "MatthewBerman" for 50% discount

My Links 🔗

Media/Sponsorship Inquiries 📈

Links:

Chapters:
0:00 - About Mistral Medium
2:28 - Pricing Comparisons
4:59 - Test Results
Comments

A new model, SOLAR 10.7B, reached #1 on the LLM leaderboard. Should I review it?

matthew_berman

What's remarkable is how quickly these models are developing. I remember what the quality was like early this year, and we've seen such massive improvements since then that it's shocking many of us, and likely scaring OpenAI, Microsoft, and Google. I suspect they didn't expect open-source models to close the gap on them so quickly, and you really do have to wonder how much better they'll get over the next few years.

pauluk

I created a simple Flask front-end and have this working as well. I use LLMs all day long for my work, and this is the first time I can honestly say I'm more impressed with something than with GPT-4.

This is great news for LLMs in general, because now OpenAI has an actual threat.

mattbarber
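The setup this commenter describes (a small web front-end that forwards a prompt to the model) can be sketched even without Flask, using only Python's standard library. This is a hypothetical stand-in for their app, with the actual model call stubbed out.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def call_model(prompt: str) -> str:
    """Stub: in a real app this would call the Mistral (or other) API."""
    return f"(model reply to: {prompt})"

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"prompt": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length) or b"{}")
        reply = call_model(data.get("prompt", ""))
        body = json.dumps({"reply": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve locally:
#   HTTPServer(("127.0.0.1", 8000), ChatHandler).serve_forever()
```

A Flask version would be shorter (`@app.route("/chat", methods=["POST"])` around the same `call_model` stub), but the shape is the same: receive prompt, call model, return JSON.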

Dude, first: awesome channel! I was thinking that your tests have a serious chance of contaminating the training data of newer models. Therefore, I think adding different questions is always beneficial, even if they aren't harder than the previous ones.

georgesms

One thing that I've been thinking about recently: I get that it's good to have a set of standard "tests" for these models, but at what point do they become "overtrained" on those tests? If you wanted to "game the system", you could just have a pre-trained game of Snake in Python ready to "infer". I'd almost rather see them given something novel, like: create the game of Snake, but multiplayer and 3D. It's interesting to consider how these models go from plain inference to reasoning engines.

toastrecon

I liked seeing you add a couple unique questions. Consider adding a couple fresh questions to the standard set in each video.

brandon

⚠️ Matthew, what about creating a leaderboard of the LLMs that can answer all your questions? That way we could track which one is the best to date. Please consider it; it could be done with a single spreadsheet. Thanks. 🎉🎉❤

DihelsonMendonca
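The suggested leaderboard takes only a few lines to prototype before it ever becomes a spreadsheet. The results below are made-up placeholders, not scores from the video.

```python
from collections import defaultdict

# Hypothetical pass/fail results: (model, test_name, passed)
results = [
    ("mistral-medium", "snake game", True),
    ("mistral-medium", "marble problem", False),
    ("gpt-4", "snake game", True),
    ("gpt-4", "marble problem", True),
    ("mixtral-8x7b", "snake game", True),
    ("mixtral-8x7b", "marble problem", False),
]

def leaderboard(results):
    """Rank models by number of tests passed (descending)."""
    scores = defaultdict(int)
    for model, _test, passed in results:
        scores[model] += int(passed)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Appending one tuple per test per video would keep a running ranking across episodes.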

"BEST OPEN SOURCE MODEL"... every video I hear that, lol.

Dreamslol

Wow, your camera and lighting look amazing!

rheale

Hi @matthew_berman,
Thanks for the excellent content like this. I'd say it's better to have new variations of complex tasks in each evaluation; I'm pretty sure the new models are trained or fine-tuned on your current evaluation tasks.

stephanembatchou

Please redo the Mixtral and Mistral Medium tests with variations of your current questions. I think there's a pretty good chance they have been trained on your questions, especially if they approached you with an API key to test their model.

craiganderson

Impressive. And you have to consider that even GPT-4 struggles with the marble problem; I think if you prompted it again a few times, it would get it.
Mixtral also sometimes misses it, so it could be a coincidence that this model performed worse than Mixtral on that question.
Anyway, a really good model, and it's great to see this progress.

fabiankliebhan

It's clear that in, say, a year these models will be able to handle most puzzles as well as most humans.
At that point, AGI of a sort could be possible using clusters of models, local memory, and wrapper code.
These quasi-AGIs could be directed to complete tasks autonomously, although they won't be sentient in any way.

coldlyanalytical

What is the best model to train on your own data (documents that include proprietary information) so that users can query it by asking questions? I'm looking at Llama 2 at the moment.

trevoC
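For questions over your own documents, a common alternative to training or fine-tuning is retrieval-augmented generation: find the most relevant document for each question and paste it into the prompt. Here is a deliberately naive keyword-overlap retriever as a sketch; a real system would use embeddings, and the document names below are hypothetical.

```python
def score(question: str, doc: str) -> int:
    """Count lowercase words shared between question and document."""
    return len(set(question.lower().split()) & set(doc.lower().split()))

def retrieve(question: str, docs: dict) -> str:
    """Return the name of the best-matching document."""
    return max(docs, key=lambda name: score(question, docs[name]))

def build_prompt(question: str, docs: dict) -> str:
    """Paste the best-matching document into the prompt as context."""
    best = retrieve(question, docs)
    return f"Context:\n{docs[best]}\n\nQuestion: {question}"
```

The resulting prompt can then be sent to any chat model, so the proprietary data never has to enter the model's weights.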

@matthew_berman How do you know your test cases (or any of their variants) haven't made it into the training data?

gidmanone

It would be nice if you made a compilation video of the current top models, lol. There's so much to keep up with.

AINEET

BTW, I recommend your channel to everybody who asks me how to learn to use these models and compare them to each other. I'm the principal engineer at my company, and a lot of people ask me.

aldousd

@MatthewBerman Please share your list of the best LLMs you've tested!

marcosbenigno

I think you should drop the easy questions that everyone gets right, and for the tricky ones, make the models regenerate answers to see if they can get it right twice in a row. Or even scale the questions up incrementally to see where the breaking points are: for example, include a fourth runner, then a fifth, and so on.

ldsviking
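The idea of scaling a question up (a fourth runner, then a fifth) can be automated with a small template, which also helps against training-data contamination, since each run can use fresh names and counts. A sketch of a transitive-ordering puzzle generator, with hypothetical runner names:

```python
import random

def make_speed_question(names, seed=None):
    """Generate a puzzle of the form "A is faster than B, B is faster
    than C, ...", then ask whether the last runner beats the first.
    Returns (question_text, correct_answer)."""
    rng = random.Random(seed)
    order = list(names)
    rng.shuffle(order)  # fresh ordering each run
    clauses = [f"{a} is faster than {b}" for a, b in zip(order, order[1:])]
    question = (", ".join(clauses)
                + f". Is {order[-1]} faster than {order[0]}?")
    return question, "no"  # the last runner in the chain is the slowest

# Scale difficulty by adding a fourth, fifth, ... runner:
runners = ["Jane", "Joe", "Sam", "Ana", "Leo"]
for n in (3, 4, 5):
    q, answer = make_speed_question(runners[:n], seed=n)
    print(q, "->", answer)
```

Randomizing names and chain length gives each video a fresh variant of the same underlying reasoning test, so memorized answers stop helping.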

Hey Matthew, after watching a lot of your videos, I have the feeling that these questions should be changed.

Another thing. It might sound weird, but given how fast we got a 7B model run by a mixture of "experts", when will we be able to run one on a smartphone or inside a Linux kernel?

Mcmeider