Pixtral (Fully Tested): Mistral's NEW VISION LLM is Finally Here & Beats Qwen-2 VL?

In this video, I'll be fully testing Pixtral, the new vision model by Mistral, built on the open-source Mistral Nemo 12B model. We'll check whether it's really good. I'll also try to find out if it can beat Llama-3.1, Claude 3.5 Sonnet, GPT-4o, DeepSeek & Qwen-2 in vision and language tests. Pixtral is fully open-source and can be used for FREE. It's even stronger at coding tasks and is also really good at Text-To-Application, Text-To-Frontend, and more. I'll test whether it can really beat the other LLMs, and I'll also show you how you can use it.

-----
Key Takeaways:

🔥 Mistral's Pixtral: The new multimodal model can now process both text and images, bringing advanced AI capabilities from Mistral to the forefront!

👀 Built on Nemo 12B: Pixtral is based on the powerful Mistral Nemo 12B model, but now with added image recognition features—ideal for advanced AI tasks!

📊 Controversial benchmarks: Mistral’s benchmarks have raised eyebrows again, with comparisons to Qwen2 VL showing signs of data manipulation—learn more in the video!

🚀 128k context & Apache 2.0: Pixtral boasts a massive 128k-token context window for smoother long-form work, and it ships under the permissive Apache 2.0 license.

🔧 Local hosting made easy: Learn how to set up and run Pixtral locally with vLLM for fast deployment and an OpenAI-compatible API—perfect for AI developers! (A minimal serving sketch follows this list.)

✅ Image-to-code tests: Watch Pixtral tackle real-world AI image-to-code tasks, from generating Python programs to creating HTML/CSS interfaces—find out how it compares to Qwen2 VL!

💡 AI humor struggles: While great at vision tasks, Pixtral still stumbles on understanding humor and memes—will Qwen2 VL outperform it?
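
For reference, here is a minimal sketch of the local-hosting flow mentioned above. It assumes the mistralai/Pixtral-12B-2409 weights on Hugging Face, a vLLM build recent enough to include Pixtral support, and the openai Python client; the prompt and image URL are placeholders, not taken from the video.

# Step 1 (shell): start an OpenAI-compatible server.
#   vllm serve mistralai/Pixtral-12B-2409 --tokenizer-mode mistral
# Step 2 (Python): query it like any OpenAI endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # local server; key is a dummy

response = client.chat.completions.create(
    model="mistralai/Pixtral-12B-2409",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)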

----
Timestamps:

00:00 - Introduction
00:14 - About Pixtral
01:17 - Benchmarks
02:52 - Testing
06:06 - Conclusion
07:41 - Ending
Comments:

darkreader:
Can you please add handwritten text to the OCR tests for vision-language models? Most vision models fail to recognize handwritten text correctly. Previously only the Claude models were very good at this; GPT and Gemini are getting good at it too, though they weren't before.
But I have yet to see any open-source vision model get a full page of handwritten text right.

techblock:
Great short videos, showing the essentials 😊

IR:
Thanks AiCodeKing for another great video. Could you kindly tell me what program you are using to run the Pixtral model please? I was thinking Stable Diffusion? Cheers!!

StephenSmith:
You should add a circuit-diagram interpretation question to the benchmark! I haven't gotten any model to pass. I think I left a very long comment on another one of your videos detailing a hard problem I've been asking vision models, but in short: asking a model to explain the components connected to a specific pin in a largish circuit diagram seems to really trip them up, possibly because it has to trace connections across a large area and determine what is connected to what. And even when it gets close, it still misidentifies components, like confusing capacitors with resistors.

NevelWong:
Could you rephrase your ground allspice question in the future? Most models have a "yes" bias, so if you ask "Does X exist in this image?" and X is at all plausible in the context, they will often answer yes regardless of whether it's actually there. A question like "Which of the following spices exist in this image: Maggi, Allspice, or White Paprika?" would tell us much more about a model's capabilities.
I personally have an entire part of my training set set aside for fixing these fallacies, it's that much of an issue -_-
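
A minimal sketch of the forced-choice phrasing this comment suggests; the helper name is hypothetical, and the candidate list comes straight from the comment:

def forced_choice_question(candidates):
    # Listing options avoids the yes-bias of a bare "Does X exist in this image?"
    return f"Which of the following spices exist in this image: {', '.join(candidates)}?"

print(forced_choice_question(["Maggi", "Allspice", "White Paprika"]))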

titansvasto:
Can you explain how the frontend was integrated?

Keksent:
Is there some controversy around how the models are tested that would explain the discrepancy at 2:00?

medusasound:
What system do you run this on?
A 12B model probably won't run well on an M1 MacBook Pro with 16GB, will it?
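
A rough back-of-envelope check, assuming 2 bytes per parameter for fp16 weights and ignoring activation and KV-cache overhead; the quantized figures are ballpark estimates:

# Weights-only memory estimate for a 12B-parameter model.
params = 12e9
print(f"fp16: {params * 2 / 1e9:.0f} GB")    # ~24 GB -> more than 16 GB of unified memory
print(f"int8: {params * 1 / 1e9:.0f} GB")    # ~12 GB -> tight once the OS takes its share
print(f"int4: {params * 0.5 / 1e9:.0f} GB")  # ~6 GB  -> a 4-bit quant could fit

So at full precision the weights alone exceed 16GB; a quantized build is the realistic option on that machine.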

ywlow:
The replica UI output is missing the entire navigation/history pane on the left. Not a pass for me.

bhavanishankar:
Hi, could you please explain step by step how to use it? I'm new to this. How can I get it from Hugging Face and use it?

DhruvJoshiDJ:
Compare Qwen2 VL with MiniCPM-V 2.6. It is also available in Ollama.

chasisaac:
On the task with the food label, why not ask what the percentage of fat is? Answering that requires external knowledge of how to figure it out (e.g., fat % = grams of fat ÷ serving size in grams × 100).

marcusdelictus:
I love the soft, slow talking. I'm getting sick of those screaming YouTubers.
