LlamaOCR - Building your Own Private OCR System

preview_player
Показать описание
The video demonstrates LlamaOCR, an OCR tool leveraging the Llama 3.2 visual model. It focuses on the tool's ability to convert images and scanned documents into structured Markdown, preserving the original formatting of elements like tables, lists, and spreadsheets. The video covers practical usage examples, offering tutorials and code snippets in both JavaScript and Python within a Colab environment.

For more tutorials on using LLMs and building agents, check out my Patreon

🕵️ Interested in building LLM Agents? Fill out the form below

⏱️Time Stamps:
00:00 LlamaOCR Project
00:56 Demo Using their Site
02:43 Colab Demo
04:40 Together.AI Docs
06:06 Pricing
09:16 Python OCR Version
11:20 Thai OCR Project
16:30 Patreon
Рекомендации по теме
Комментарии
Автор

Vision models be mysterious wizardry. They make me the most excited out of all bc I firmly believe a future conscious 'model' could be iterated from vision models (not new, but not mentioned enough i think). If there were a way to keep the vision model exclusively in virtual space... a whole wealth of experimentation could open up with visualizing things, it might even turn hallucinations into useful features.

Charles-Darwin
Автор

I'd be super interested in knowing the process of training on object detection / region of interest. Anyone have pointers where I can read up on this?

victorkarlsson
Автор

Doing simple OCR via LLM is shut fly using bazooka.

Piotr_Sikora
Автор

Nice! Any difference with docling or llamaparse solutions?

WhyitHappens-
Автор

when you are integrating it with Agents ?

itsbhardwaj
Автор

did someone try to integrate it in n8n?

staticalmo
Автор

how to get rid of hallucination especially in this kind of project? i json a good ouptu format?

Автор

What are the benefits of using a giant LLM for something as simple as OCR?

el_arte
Автор

This seems objectively bad at the job.

The Walmart receipt just flat out ignored the whole central column of numbers.

Reordering sections of text...

Not seeing its usefulness at this level of error and garbling things.

What about a mixed tesseract + LLM to correct it?

alogghe
Автор

Qwen vl is better than llama 3.2 on ocr

viky
Автор

What a weird wrapper project. Just use llama vision and say :

`Convert the provided image into Markdown format. Ensure that all content from the page is included, such as headers, footers, subtexts, images (with alt text if possible), tables, and any other elements.

Requirements:

- Output Only Markdown: Return solely the Markdown content without any additional explanations or comments.
- No Delimiters: Do not use code fences or delimiters like \`\`\`markdown.
- Complete Content: Do not omit any part of the page, including headers, footers, and subtext.
`;

cause literally that's all this project is doing.

orangehatmusic
Автор

There is Tika for that. Stop showing AI as the address to solved problems

greendsnow