Pixtral (Fully Tested): Mistral's NEW VISION LLM is Finally Here & Beats Qwen-2 VL?

In this video, I'll be fully testing Pixtral, the new vision model by Mistral, built on the open-source Mistral Nemo 12B model. We'll check whether it's really good. I'll also try to find out if it can beat Llama-3.1, Claude 3.5 Sonnet, GPT-4o, DeepSeek & Qwen-2 in vision and language tests. Pixtral is fully open-source and can be used for FREE. It's even stronger at coding tasks and is also really good at Text-To-Application, Text-To-Frontend, and more. I'll test whether it can really beat the other LLMs, and I'll also show you how you can use it.

-----
Key Takeaways:

🔥 Mistral's Pixtral: The new multimodal model can now process both text and images, bringing advanced AI capabilities from Mistral to the forefront!

👀 Built on Nemo 12B: Pixtral is based on the powerful Mistral Nemo 12B model, but now with added image recognition features—ideal for advanced AI tasks!

📊 Controversial benchmarks: Mistral’s benchmarks have raised eyebrows again, with comparisons to Qwen2 VL showing signs of data manipulation—learn more in the video!

🚀 128k context & Apache 2.0: Pixtral boasts a massive 128k-token context window for smoother long-form work, and it ships under the permissive Apache 2.0 license.

🔧 Local hosting made easy: Learn how to set up and run Pixtral locally with vLLM for fast deployment and an OpenAI-compatible API—perfect for AI developers! (A minimal serving sketch follows this list.)

✅ Image-to-code tests: Watch Pixtral tackle real-world AI image-to-code tasks, from generating Python programs to creating HTML/CSS interfaces—find out how it compares to Qwen2 VL!

💡 AI humor struggles: While great at vision tasks, Pixtral still stumbles on understanding humor and memes—will Qwen2 VL outperform it?
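
For reference, here is a minimal sketch of the local-hosting flow mentioned above. It assumes the mistralai/Pixtral-12B-2409 weights on Hugging Face, a vLLM build recent enough to include Pixtral support, and the openai Python client; the prompt and image URL are placeholders, not taken from the video.

# Step 1 (shell): start an OpenAI-compatible server.
#   vllm serve mistralai/Pixtral-12B-2409 --tokenizer-mode mistral
# Step 2 (Python): query it like any OpenAI endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # local server; key is a dummy

response = client.chat.completions.create(
    model="mistralai/Pixtral-12B-2409",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)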

----
Timestamps:

00:00 - Introduction
00:14 - About Pixtral
01:17 - Benchmarks
02:52 - Testing
06:06 - Conclusion
07:41 - Ending
Comments:

darkreader:
Can you please add handwritten text to the OCR tests for vision-language models? Most vision models fail to recognize handwritten text correctly. Previously only the Claude models were very good at this; GPT and Gemini are getting good at it too, though they weren't before.
But I have yet to see any open-source vision model get a full page of handwritten text right.

techblock:
Great short videos, showing the essentials 😊

IR:
Thanks AiCodeKing for another great video. Could you kindly tell me what program you are using to run the Pixtral model please? I was thinking Stable Diffusion? Cheers!!

StephenSmith:
You should add a circuit-diagram interpretation question to the benchmark! I haven't gotten any model to pass. I think I left a very long comment on another one of your videos detailing a hard problem I've been asking vision models, but in short: asking a model to explain the components connected to a specific pin in a largish circuit diagram seems to really trip them up, possibly because it has to trace connections across a large area and determine what is connected to what. And even when it gets close, it still misidentifies components, like confusing capacitors with resistors.

NevelWong:
Could you rephrase your ground allspice question in the future? Most models have a "yes" bias, so if you ask "Does X exist in this image?" and X is at all plausible in the context, they will often answer yes regardless of whether it's actually there. A question like "Which of the following spices exist in this image: Maggi, Allspice, or White Paprika?" would tell us much more about a model's capabilities.
I personally have an entire part of my training set set aside for fixing these fallacies, it's that much of an issue -_-
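
A minimal sketch of the forced-choice phrasing this comment suggests; the helper name is hypothetical, and the candidate list comes straight from the comment:

def forced_choice_question(candidates):
    # Listing options avoids the yes-bias of a bare "Does X exist in this image?"
    return f"Which of the following spices exist in this image: {', '.join(candidates)}?"

print(forced_choice_question(["Maggi", "Allspice", "White Paprika"]))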

titansvasto:
Can you explain how the frontend was integrated?

Keksent:
Is there some controversy around how the models are tested that would explain the discrepancy at 2:00?

medusasound:
What system do you run this on?
A 12B model probably won't run well on an M1 MacBook Pro with 16GB, will it?
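
A rough back-of-envelope check, assuming 2 bytes per parameter for fp16 weights and ignoring activation and KV-cache overhead; the quantized figures are ballpark estimates:

# Weights-only memory estimate for a 12B-parameter model.
params = 12e9
print(f"fp16: {params * 2 / 1e9:.0f} GB")    # ~24 GB -> more than 16 GB of unified memory
print(f"int8: {params * 1 / 1e9:.0f} GB")    # ~12 GB -> tight once the OS takes its share
print(f"int4: {params * 0.5 / 1e9:.0f} GB")  # ~6 GB  -> a 4-bit quant could fit

So at full precision the weights alone exceed 16GB; a quantized build is the realistic option on that machine.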

ywlow:
The replica UI output is missing the entire navigation/history pane on the left. Not a pass for me.

bhavanishankar:
Hi, could you please explain step by step how to use it? I'm new to this. How can I get it from Hugging Face and use it?

DhruvJoshiDJ:
Compare Qwen2 VL with MiniCPM-V 2.6. It is also available in Ollama.

chasisaac:
On the task with the food label, why not ask what the percentage of fat is? Answering that requires external knowledge of how to figure it out (e.g., fat % = grams of fat ÷ serving size in grams × 100).

marcusdelictus:
I love the soft, slow talking. I'm getting sick of those screaming YouTubers.
