LlamaOCR - Building your Own Private OCR System

Показать описание

The video demonstrates LlamaOCR, an OCR tool leveraging the Llama 3.2 visual model. It focuses on the tool's ability to convert images and scanned documents into structured Markdown, preserving the original formatting of elements like tables, lists, and spreadsheets. The video covers practical usage examples, offering tutorials and code snippets in both JavaScript and Python within a Colab environment.

For more tutorials on using LLMs and building agents, check out my Patreon

🕵️ Interested in building LLM Agents? Fill out the form below

⏱️Time Stamps:
00:00 LlamaOCR Project
00:56 Demo Using their Site
02:43 Colab Demo
04:40 Together.AI Docs
06:06 Pricing
09:16 Python OCR Version
11:20 Thai OCR Project
16:30 Patreon

Рекомендации по теме

Комментарии

Vision models be mysterious wizardry. They make me the most excited out of all bc I firmly believe a future conscious 'model' could be iterated from vision models (not new, but not mentioned enough i think). If there were a way to keep the vision model exclusively in virtual space... a whole wealth of experimentation could open up with visualizing things, it might even turn hallucinations into useful features.

Charles-Darwin

I'd be super interested in knowing the process of training on object detection / region of interest. Anyone have pointers where I can read up on this?

victorkarlsson

Doing simple OCR via LLM is shut fly using bazooka.

Piotr_Sikora

Nice! Any difference with docling or llamaparse solutions?

WhyitHappens-

when you are integrating it with Agents ?

itsbhardwaj

did someone try to integrate it in n8n?

staticalmo

how to get rid of hallucination especially in this kind of project? i json a good ouptu format?

What are the benefits of using a giant LLM for something as simple as OCR?

el_arte

This seems objectively bad at the job.

The Walmart receipt just flat out ignored the whole central column of numbers.

Reordering sections of text...

Not seeing its usefulness at this level of error and garbling things.

What about a mixed tesseract + LLM to correct it?

alogghe

Qwen vl is better than llama 3.2 on ocr

viky

What a weird wrapper project. Just use llama vision and say :

`Convert the provided image into Markdown format. Ensure that all content from the page is included, such as headers, footers, subtexts, images (with alt text if possible), tables, and any other elements.

Requirements:

- Output Only Markdown: Return solely the Markdown content without any additional explanations or comments.
- No Delimiters: Do not use code fences or delimiters like \`\`\`markdown.
- Complete Content: Do not omit any part of the page, including headers, footers, and subtext.
`;

cause literally that's all this project is doing.

orangehatmusic

There is Tika for that. Stop showing AI as the address to solved problems

greendsnow

LlamaOCR - Building your Own Private OCR System

LlamaOCR - Building your Own Private OCR System

Build Anything with Llama 3 Agents, Here’s How

Install Llama-OCR Locally - Document to Markdown OCR Library with Llama 3.2 Vision

Llama | ChatGPT as OCR Vision document AI

Meta's New Llama 3.2 is here - Run it Privately on your Computer

Correctly Install and Use Llama 3.1 LLM in Python on a Local Computer - Complete Tutorial

Invoice Extraction Bot - Langchain || LLAMA 2 || OpenAI

Fine-tune LiLT model for Information extraction from Image and PDF documents | UBIAI | Train LiLT |

Train & Serve Custom Multi-modal Models - IDEFICS 2 + LLaVA Llama 3

Grab text from dead PDF or picture with Google Lens on PC

(AI Tinkerers Ottawa) Gemini models, agentic frameworks, and AI scene in Singapore w/ Sam Witteveen

Décryptage d'une évaluation de thèse avec Scispace : Les secrets de l'IA dévoilé

PM Training Online 2023 - Entrena tu GPT para Automatizar Informes de Seguimiento de Proyectos

Acceso a Datos Offchain con Chainlink Functions | Chainlink Bootcamp - Día 9

Power Automate Desktop: Usar OCR para leer números de folio en PDF

Orientaciones para incorporar la accesibilidad a la información en mis clases en línea

Multimedia File Formats | Chapter 1 - Multimedia |1.4 | XII STD CA #TNSCERT|

Clase #2: Informática IV

Sistema de educación especial en Estados Unidos PODCAST

¿De QUÉ HABLAMOS cuando HABLAMOS de INTELIGENCIA ARTIFICIAL? 🤖