Pixtral is REALLY Good - Open-Source Vision Model

Let's test Pixtral, the newest vision model from MistralAI.

Join My Newsletter for Regular AI Updates 👇🏼

My Links 🔗

Media/Sponsorship Inquiries ✅
Comments

We need AI doctors for everyone on earth

LoveLifePD

Show it a tech sheet for a simple device, like a dryer, and ask it what it is. Ask it to outline the circuit for the heater. Give it a symptom, like "the dryer will not start," then ask it to reason out the step-by-step troubleshooting procedure using the wiring diagram and a multimeter with live voltage.

YOGiiZA

I'd be really interested to see more tests of how well it handles positionality, since vision models have tended to struggle with that. As I understand it, that's one of the biggest barriers to having models operate UIs for us.

justtiredthings

"Mistral" is (English/American-ised) pronounced with an "el" sound. Pixtral would be similar. So "Pic-strel" would be appropriate. However the French pronunciation is with an "all" sound. Since mistral is a French word for a cold wind that blows across France, I would go with that for correctness. It's actually more like "me-strall", so in this case "pic-strall" should be correct.

At any rate, I look forward to a mixture of agents/experts scenario where pixtral gets mixed in with other low/mid weight models for fast responses.

Feynt

Matt, you made a point about decent smaller models being used for specialized tasks. That reminds me of agents, each with its own specialized model for its task and a facilitator that delegates work to them. I think most of us want to see smaller and smaller open-source models getting better and better on benchmarks.

Transforming-AI

I'd love to see some examples of problems where two or more models are used together. Maybe Pixtral describes a chess board, then an economical model like Llama translates that into standard chess notation, and then o1 does the deep thinking to come up with the next move. (I know o1 probably doesn't need the help from Llama in this scenario, but maybe doing it this way would be less expensive than having o1 do all the work.)
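
A rough sketch of that three-stage hand-off, assuming the Mistral and OpenAI Python clients; the model names, file path, and prompts are placeholders, and gpt-4o-mini stands in here for the comment's "economical model":

import base64
from mistralai import Mistral
from openai import OpenAI

mistral = Mistral(api_key="MISTRAL_API_KEY")
oai = OpenAI(api_key="OPENAI_API_KEY")

def b64(path):
    # Read a local image and base64-encode it for the data URL.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Stage 1: Pixtral describes the position from a photo of the board.
description = mistral.chat.complete(
    model="pixtral-12b-2409",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "List every piece and its square on this chess board."},
        {"type": "image_url", "image_url": f"data:image/jpeg;base64,{b64('board.jpg')}"},
    ]}],
).choices[0].message.content

# Stage 2: a cheap text model normalises the description into FEN.
fen = oai.chat.completions.create(
    model="gpt-4o-mini",  # stand-in for the "economical model"
    messages=[{"role": "user", "content": f"Convert to FEN. Output the FEN only.\n{description}"}],
).choices[0].message.content

# Stage 3: the reasoning model proposes the next move.
move = oai.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": f"Best move for the side to play in {fen}? Answer in SAN."}],
).choices[0].message.content
print(move)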

ChrisAdaline

I think you should try giving it a photo of the word "Strawberry" and then asking it how many letter r's are in the word.
Maybe vision is all we needed to solve the disconnect caused by tokenization?
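
A minimal sketch of that test, assuming Pillow to render the word and the Mistral Python client for the Pixtral call; the model name and prompt are placeholders:

import base64, io
from PIL import Image, ImageDraw
from mistralai import Mistral

# Render the word as an image so the model has to see the letters,
# not receive them as tokens.
img = Image.new("RGB", (400, 100), "white")
ImageDraw.Draw(img).text((20, 40), "Strawberry", fill="black")

buf = io.BytesIO()
img.save(buf, format="PNG")
data_url = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

client = Mistral(api_key="MISTRAL_API_KEY")
reply = client.chat.complete(
    model="pixtral-12b-2409",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "How many letter r's are in the word shown in this image?"},
        {"type": "image_url", "image_url": data_url},
    ]}],
)
print(reply.choices[0].message.content)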

sleepingbag

Where is the GPT-4o live screen-share option?

DK.CodeVenom

Great video, Matthew! Just a suggestion for testing vision models based on what we do internally. We feed images from the James Webb telescope into the model and ask it to identify what we can already see. One thing to keep in mind is that if something's tough for you to spot, the AI will likely struggle too. Vision models are great at "seeing outside the box," but sometimes miss what's right in front of them. Hope that makes sense!
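
One way to make that kind of check repeatable: loop images with known contents past the model and flag whether each expected feature is mentioned. A sketch assuming the Mistral Python client; the filenames, expected labels, and pass criterion are made up:

import base64, pathlib
from mistralai import Mistral

client = Mistral(api_key="MISTRAL_API_KEY")

# Images paired with a feature a human can already see in each one.
expected = {"carina_nebula.png": "nebula", "smacs_0723.png": "galaxy"}

for name, feature in expected.items():
    data = base64.b64encode(pathlib.Path(name).read_bytes()).decode()
    answer = client.chat.complete(
        model="pixtral-12b-2409",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "What astronomical objects do you see in this image?"},
            {"type": "image_url", "image_url": f"data:image/png;base64,{data}"},
        ]}],
    ).choices[0].message.content
    # Crude pass criterion: the expected feature is mentioned by name.
    print(name, "PASS" if feature in answer.lower() else "FAIL")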

idontexist-satoshi

When you next test vision models, you should try giving them architectural floor plans to describe, and also asking them to correlate different drawings of the same building, like a perspective rendering or photo vs. a floor plan, which requires a lot of visual understanding. I did that with Claude 3.5 and it was extremely impressive.

hypertectonics

I foresee a time when the AI makes captchas for humans to keep us from meddling in important things:

"Oh, you want to look at the code used to calculate your state governance protocols? Sure, solve this quantum equation in under 3 seconds!"

jeffsmith

Nonchalantly says "Captcha is done." That was good.

opita

For the Bill Gates one, you put in an image with "bill gates" in the filename! Doesn't that give the model a huge hint about the content of the photo?
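
A quick way to rule that out on a re-run: copy the image to a neutral filename before uploading it (the paths below are hypothetical):

import shutil

# Copy to a neutral name so the filename can't leak the answer.
shutil.copy("bill gates.jpg", "image_001.jpg")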

timtim

A picture of a spreadsheet with questions about it would be gold for real use cases.

josephflowers

You should add an OCR test on handwritten text when testing the image models.

nginnnginn

The big question for me is when Pixtral will be available on Ollama, which is my interface of choice... If it works on Ollama, it opens up a world of possibilities.
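
For reference, the ollama Python package already drives vision models such as llava this way, so a Pixtral call would presumably look the same; the "pixtral" model tag below is hypothetical until the model actually lands in the Ollama library:

import ollama

# Ask a local vision model about an image file on disk.
response = ollama.chat(
    model="pixtral",  # hypothetical tag; "llava" works today
    messages=[{
        "role": "user",
        "content": "What is in this image?",
        "images": ["./photo.jpg"],  # local file path; the client handles encoding
    }],
)
print(response["message"]["content"])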

whitneydesignlabs

Do you think an AGI would basically be these specialised use-case LLMs working as agents for a master LLM?

JustaSprigofMint

I just signed up with Vultr and was wondering if you were going to do any videos on this? Does anyone know of training for this? I want to run Llama on it.

lordjamescbeeson

I am trying LM Studio, but the model that is available is text-only. Is there a way to get the vision model loaded into LM Studio?
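
One possible route, assuming a vision-capable model can be loaded at all: LM Studio exposes an OpenAI-compatible local server (port 1234 by default), so the openai Python client can talk to it; the model name below is a placeholder for whatever model LM Studio has loaded:

import base64
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the api_key value
# is ignored but required by the client.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

with open("photo.jpg", "rb") as f:
    data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

reply = client.chat.completions.create(
    model="local-vision-model",  # placeholder for the loaded model
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": data_url}},
    ]}],
)
print(reply.choices[0].message.content)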

Pauluz_The_Web_Gnome

7:50 My iPhone could not read that QR code.

JoelSapp