Mistral OCR - Multimodal & Multilingual OCR

In this video, I look at the latest release from Mistral AI, which is their Mistral OCR model. I look at how it works and how it compares to other models, as well as how you can get started using it with code.

For more tutorials on using LLMs and building agents, check out my Patreon

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻Github:

⏱️Time Stamps:
00:00 Intro
00:17 Other models
00:35 Mistral OCR Blog
05:45 Mistral OCR Demo
13:47 Mistral OCR Batch inference
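
For readers who want a starting point in code, here is a minimal sketch. It is not the notebook from the video. It assumes the official `mistralai` Python client, its `client.ocr.process` call, the `mistral-ocr-latest` model name, and a response whose pages each carry a `markdown` field. The helper names `image_to_document` and `ocr_to_markdown`, and the file name `scan.png`, are made up for illustration.

```python
import base64


def image_to_document(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64 data-URI document payload (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": f"data:{mime_type};base64,{encoded}"}


def ocr_to_markdown(client, document: dict, model: str = "mistral-ocr-latest") -> str:
    """Send one document to the OCR endpoint and join the per-page Markdown.

    `client` is expected to look like a `mistralai.Mistral` instance,
    i.e. to expose `client.ocr.process(model=..., document=...)`.
    """
    response = client.ocr.process(model=model, document=document)
    return "\n\n".join(page.markdown for page in response.pages)


# Usage (needs `pip install mistralai` and an API key; not executed here):
#   import os
#   from mistralai import Mistral
#   client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
#   with open("scan.png", "rb") as f:
#       print(ocr_to_markdown(client, image_to_document(f.read())))
```

Passing the client in as an argument keeps the sketch easy to try with a stub before spending any API credit.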

Comments

I follow you just because of your honest review without any false hypes. Love your content man!

chhabiacharya

Nice, it's something we can use at our company, probably also for some personal use cases... Thanks for covering this. Our company processes a ton of legal Arabic docs and has been blocked on this for some time due to quality issues.

akayx

Extra useful. I think OCR is an area that still doesn't have a clear winner, especially in more obscure languages. And I agree that it seems like a feasible strategy for smaller companies to develop nifty tools like this.

However, if I were to use it for my hypothetical company, I'd struggle with my security concerns. If I understand this correctly, you'll be sharing your vital data with one more AI company through their API. I'd probably use a local LLM for data analysis, which makes it really hard to concede to a Mistral API just for the sake of OCR ... unless that's my only option.

Dr.UldenWascht

clear, straight to the point, very cool video

brunosavoca

I have tried uploading manual bills to Mistral Le Chat and ChatGPT, and the difference in performance was obvious. Le Chat was not able to extract complicated types of writing. Hope this new model will go further.

Bellevillezogataga

It’s very good for well-structured PDFs and images, which we have at work. For another (large) batch of more unstructured, handwritten, and messier content, it is still better to do computer vision with Google OCR (with precise bboxes).

alchemication

Any testing with handwriting? A lot of OCR use cases end up processing documents that are a mix of print and someone "annotating" with pen afterward.

Aberger

Great vid and good to see this tech improve. Now, how to get the OCR data into a multimodal vector embedding? That's the next missing piece for me. VoyageAI maybe. The base64 output could be used in a multimodal embedding, maybe?

BrandonFoltz

To try it out you can upload an image through Le Chat, without going through the API (or having to set up payments).

So I have a scan of an old Polish book in which every line is missing half of the last word because the scan was cut off. I told it to try to guess what the last word was (something which a native speaker is able to do with high accuracy). Unfortunately, it failed in most of those guess-the-word cases, but it did a pretty good job on the words that were fully visible, without any preprocessing of the low-contrast scan on my part. It made a couple of errors on the visible words, too, but no more than I've seen other OCR packages make in the past on random prose. Asking it for a second pass, e.g. to fix the incoherent words it produced just by proofreading the text, did not bring any improvement.

clray

This would be good for building out a GraphRAG-type system. Get this model to extract all the documents, then send the output to another cheap model that can do more processing before everything goes into a graph database. Especially at only a dollar per thousand pages, it would be dirt cheap to have this working alongside something like Gemini.

pin
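
The cost argument in the comment above can be made concrete with a little arithmetic. This is a rough sketch only: the rate of one dollar per thousand pages is taken from the comment, not from a verified price list, and the function name `estimate_ocr_cost` is made up for illustration.

```python
def estimate_ocr_cost(pages: int, usd_per_1000_pages: float = 1.0) -> float:
    """Estimate OCR spend in USD for a page count at a flat per-1000-pages rate."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / 1000 * usd_per_1000_pages


# Print the estimate for a few illustrative corpus sizes.
for pages in (1_000, 50_000, 2_000_000):
    print(f"{pages:>9,} pages -> ${estimate_ocr_cost(pages):,.2f}")
```

The estimate covers the OCR step only. The downstream model and the graph database in such a pipeline would add their own costs.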

I wonder if Mistral OCR will return coordinates like Azure Document Intelligence does. This is important in some applications for highlighting the original text during human review.

wangbei

I just got Mistral OCR working, and... it has fallen over on my very first attempt. I used the cover of the Local Hero soundtrack CD as a simple test image (large Times New Roman on a plain background), and I got back "LOCAL HIERO" in response. Oh dear, first impressions are not good. I've been doing lots of OCR in the last fortnight and found Claude 3.5 and Claude 3.7 to be very good. Claude 3.7 in particular was 100% accurate on documents I gave it and could generate up to 12-page long Markdown documents in one go. Both Claude models have even been able to incorporate handwritten annotations on a typed document. Just mentioning if it helps others.

KohanIkin

Can you get accurate bounding boxes showing where each piece of text is found in the image? All the models I've tried so far struggle with that, but it is a required feature for e.g. screen recognition and agents that are supposed to operate UIs for you.

clray

Do you think this model will do well at extracting handwritten data?

okwudex

A comparison with AllenAI's open OlmOCR model and Gemini 2.0 Flash would be interesting; there are competing claims, and OlmOCR requires a bit more tooling locally.

jonchun

The Colab link in the description points to the "YT - Phi-4 Multimodal" notebook, not the Mistral OCR notebook.

MichealAngeloArts

Out of 13 typed lone Thai words, it seems to miss small character differences like ร vs ว and บ vs ม. I wonder about longer text to give it more context.

pawinpawin

Does it give x, y coordinate-style details in the output along with the text, like Azure OCR?

shriradhe

Arabic has been the biggest challenge for OCR software over the past years. I am not tech-savvy; could you tell us how to use it?

zidane

Unless it's open source, I'm not really interested. Gemini 2.0 has never failed me for this

equious
