MPT-7B - The New KING of Open-Source LLMs

In this video, I review the four new models from MosaicML: MPT-7B Base, Instruct, Chat, and StoryWriter-65k+. Many people are calling this the best open-source model available right now. I test the Chat version locally using GPT4All (super simple setup) and StoryWriter-65k+ using Hugging Face. You can set this up locally as well, which is great. Let's see how it performs!
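
For anyone who wants to try the Hugging Face route themselves, here is a minimal sketch (not the exact code from the video) of loading MPT-7B-StoryWriter with transformers. It assumes the public mosaicml/mpt-7b-storywriter checkpoint and enough memory to hold a 7B model in bfloat16:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b-storywriter"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # halves memory vs. float32
    trust_remote_code=True,      # MPT ships its own modeling code
)

prompt = "Once upon a time, in a kingdom of endless context,"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```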

Enjoy :)

Join My Newsletter for Regular AI Updates 👇🏼

My Links 🔗

Media/Sponsorship Inquiries 📈

Links:

Chapters:
0:00 - Intro
0:17 - Blog Announcement
3:21 - How To Install MPT-7B Chat & Instruct
4:01 - MPT-7B Chat Test
9:44 - MPT-7B StoryWriter 65k+ Test
10:50 - Outro
Comments

Hey Matthew, really appreciate you implementing feedback. This quick benchmark is perfect. Thanks.

avi

Really appreciate the hard work that you've been putting in to keep us up to date, Matthew. Thank you so much.

djannias

You should be using techniques like reflection to fully vet the model. It's one thing to fail on the first go, but if it gets there with reflection, that's pretty important...

maficstudios

The results from this model are really amazing. Great video. I like how most models fail the basic "tests" that you have given them. I would probably change the "snake" question slightly, i.e., grade it sort of like a milestone and subtract (5?) points for each time you had to go through the feedback of "This doesn't work, fix it". And maybe limit the number of attempts to 10 or 20. That way we can at least get some idea of whether it is decent at coding vs. just fail fail fail... because all of them have failed so far.
It amazes me how you are able to keep up with this tech!

marcfruchtman
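
A rough sketch of the scoring scheme proposed above, purely illustrative; ask_model and code_runs are hypothetical stand-ins for however you query the model and check its code, not real APIs:

```python
# Hypothetical scoring loop: start from a full score, dock points for each
# "this doesn't work, fix it" round, and cap the number of attempts.
def grade_coding_task(prompt, ask_model, code_runs,
                      max_attempts=10, penalty=5, full_score=100):
    score = full_score
    reply = ask_model(prompt)
    for _ in range(max_attempts):
        if code_runs(reply):
            return score              # milestone reached; keep remaining points
        score -= penalty              # each retry costs points
        reply = ask_model("This doesn't work, fix it:\n" + reply)
    return 0                          # never produced working code
```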

Awesome and informative video, thanks for the update! These smaller models are really coming along at lightning speed. Imagine the next 6 months 😮

hendrikbonthuys

2:42 But the story input is shown to be 67873 tokens, which is larger than 64K or 65K.

JohnDlugosz
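
A likely explanation, per the MosaicML announcement: StoryWriter uses ALiBi instead of learned position embeddings, so it can extrapolate past the 65k tokens it was fine-tuned on (that's the "+"). For illustration, the standard ALiBi slope formula (power-of-two head count assumed):

```python
# ALiBi adds a per-head linear distance penalty to attention scores instead
# of position embeddings, so context lengths beyond training remain defined.
def alibi_slopes(n_heads):
    ratio = 2 ** (-8 / n_heads)
    return [ratio ** (i + 1) for i in range(n_heads)]

print(alibi_slopes(8))  # [0.5, 0.25, ..., 0.00390625]
```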

In your coding example, do you expect it to compress the code? It probably truncated the output; is there a Continue option? (I know Oobabooga has this.)
Also, part of the value is to converse with the model, in this case asking it about the missing methods. My guess is that more code was intended to be output.

erikjohnson

I have a question: why are these open-source models not quantized yet so they can run on consumer PCs? I'd like to request a video on quantization if possible. Thanks in advance. Your videos are great 👍.

sirrr.
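
On the quantization question: community GGML builds for CPU runners tend to appear within days of a release, and you can also quantize on the fly. A minimal sketch, assuming transformers with bitsandbytes installed and a CUDA GPU; 8-bit loading cuts memory to roughly a quarter of float32:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    load_in_8bit=True,       # bitsandbytes int8 quantization
    device_map="auto",       # place layers on available devices
    trust_remote_code=True,  # MPT ships custom modeling code
)
```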

Sure hope we get a 13-billion-parameter version of this. The context window is so good.

priestesslucy

For the math one, when the models get it right, you might consider removing the parentheses to test whether it grasps order of operations.

deathybrs

Awesome, glad you are using my test :) My test, which I stole from someone else, lol.

joe_limon

Thanks for the video. I am glad that there are more and more new open-source models out. Do you know if there is an API for GPT4All?

spenzakwsx
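
For anyone else wondering: GPT4All does ship Python bindings (pip install gpt4all). A minimal sketch; the model filename below is an assumption and should match whatever the GPT4All app lists:

```python
from gpt4all import GPT4All

# Model filename is an assumption; use the name shown in GPT4All's model list.
model = GPT4All("ggml-mpt-7b-chat.bin")
print(model.generate("Write a haiku about open-source LLMs.", max_tokens=100))
```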

Any hint on how to get MPT-7B-StoryWriter-65k+ into GPT4All?

johnwiering

Regarding the "3 killers" in a room, and a new killer enters... I would argue there are 4 killers in the room (3 alive, 1 dead).

marcfruchtman

I don't know if it's an issue with GPT4All or the model, but no matter what I do, the answers are incredibly short and often cut off. In my tests, the very first answer is a good length, but each following answer gets more and more limited, as if it's looking at the full chat history and the longer the chat becomes, the smaller the output, despite being nowhere near the max of 2048 tokens. So I'm not sure if it's a bug with the program or the model at this time. I have an AMD GPU, so it's likely running off CPU only, but it's still about as fast as your output in the video, so I doubt that's the issue, and a 7B model shouldn't be running out of RAM on the Instruct/Chat versions when I have 32GB. Not sure what's going on. I may try getting it running in koboldcpp, which is what I usually use for CPU models; I probably just need to find the GGML version of it.

zengrath

Hi Matthew, what's the best LLM and web front end for deploying to a virtual machine on a home server with the following specs: a 50-core Xeon CPU, 256GB RAM, 100TB storage, and a dedicated NVIDIA GPU (12GB) available to the VM? Looking specifically for a nice WebUI package for accessing the LLM. Thanks so much! Keep up the good work!

SushantGargya

I tried to use what you did there with the snake request, but mine ends rather early. What did you change to get more output from the AI?

adrt

Could you add an agent / tool-use prompt to your tests?

mattshelley

How do you not have a link to MosaicML in the description, but you link to GPT4All?

jasonpenick

Why would the cutoff date be 2021 for this open-source model?

WilsonSilva