Extract Tables from PDF and convert to Excel sheet with Paddle OCR text detection and recognition.

The PaddleOCR project contains many deep learning OCR models, covering text detection, text recognition, text angle detection, and table layout. In this course, we use pretrained text detection and text recognition models to extract the text found in tables in a PDF. From this extracted text, we then reconstruct the table based on the text positions.
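
The reconstruction step can be sketched roughly as follows. This is a minimal illustration and not the course's actual code: it assumes each detection has already been reduced to a hypothetical `(x_min, y_min, x_max, y_max, text)` tuple (the OCR models' real output needs to be converted into this shape first), the function names and pixel tolerances are made up for the example, and it writes CSV with the standard library instead of an Excel file.

```python
import csv
import io


def cluster_centres(values, tol):
    """Group sorted 1-D values; a gap larger than tol starts a new group."""
    groups = []
    for v in sorted(values):
        if not groups or v - groups[-1][-1] > tol:
            groups.append([v])
        else:
            groups[-1].append(v)
    # Represent each group by its mean position.
    return [sum(g) / len(g) for g in groups]


def boxes_to_grid(detections, row_tol=10.0, col_tol=40.0):
    """Place each recognised text into a row/column cell based on its box centre.

    detections: list of (x_min, y_min, x_max, y_max, text) tuples (assumed format).
    row_tol / col_tol: hypothetical pixel tolerances; tune them per document.
    """
    y_centres = [(d[1] + d[3]) / 2 for d in detections]
    x_centres = [(d[0] + d[2]) / 2 for d in detections]
    rows = cluster_centres(y_centres, row_tol)
    cols = cluster_centres(x_centres, col_tol)

    grid = [["" for _ in cols] for _ in rows]
    for det, yc, xc in zip(detections, y_centres, x_centres):
        # Assign the box to the nearest row and column centre.
        r = min(range(len(rows)), key=lambda i: abs(rows[i] - yc))
        c = min(range(len(cols)), key=lambda j: abs(cols[j] - xc))
        grid[r][c] = (grid[r][c] + " " + det[4]).strip()
    return grid


def grid_to_csv(grid):
    """Serialise the grid as CSV text (Excel can open this directly)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(grid)
    return buffer.getvalue()


# Made-up sample boxes for a 3x2 table.
detections = [
    (10, 10, 60, 30, "Name"), (110, 12, 160, 32, "Qty"),
    (10, 50, 70, 70, "Apple"), (112, 51, 140, 71, "3"),
    (10, 90, 65, 110, "Pear"), (111, 92, 140, 112, "7"),
]
grid = boxes_to_grid(detections)
csv_text = grid_to_csv(grid)
```

Clustering on box centres keeps the sketch short, but it is sensitive to the tolerances: columns whose text is left-aligned with very different widths, or rows with wrapped text, may need a different grouping rule.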
Enjoy!!!

Hi,
You can use this link to access our premium courses. You'll be able to build and deploy more than 20 different AI projects.

[Please check your mail 5 minutes after requesting access]
Colab Notebook: [Please check your mail inbox and spam folder 5 minutes after requesting access]

Neuralearn

Comments

Impressive content for Deep Learning OCR! Many thanks!

JujutsuMan

Thank you very much for this! Very insightful!

niroshiniedayaratne

impressive, struggling right now for my little side project using ocr, u helped a lot man, appreciate it

tkcdlyo

Brilliant work!! I would like to thank you for giving me access to the notebook.
Keep going, bro 💙💙

mohamedmagdy

Thank you, man. You're the best at explaining what is actually happening. Thank you so much.

toto

Broo, this is awesome, thank you very much!!!

Jean-nfyh

Hi, I've followed your procedure as is but I'm getting "ValueError: Can't convert Python sequence with mixed types to Tensor." on the Non-Max Suppression portion. Can you tell me what might be causing that please?

moez.mazhar

Hi, you have done a phenomenal job explaining PaddleOCR in detail. Can you please let me know if we can train PaddleOCR on custom datasets to extract data from tables of different lengths in PDFs or images?

vishaldas

Well, that is a very simple and readable table; it's easy enough to do it with basic if logic. But try a table with no borders, or with content very close to the borders, on a scanned image.

christianrazvan

Hi, Neuralearn. Thanks for creating a great tutorial. It's very useful. Can you please provide notebook access?

puneetbansal

How can I fix this error: "ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory"?
It is caused by the line of code "import layoutparser as lp".

malakkhiari

Thank you for the tutorial, I have requested the notebook access

leonc

Hello, do you have any idea about packaging PaddleOCR? I'm trying to make an exe of my code but I keep facing errors. Any help would be appreciated.

SiddheshBalashetwar

Great.
Is it possible to use this model for matrix recognition, i.e. to find how many rows and columns a matrix has and what its elements are?

cissemy

Hi, thank you for this. Can you please help me with the notebook access? Also, can you please help me understand whether I will be able to cover most table formats with this?

emailvarun

Hi, I'm getting this error - (External) CUDA error(100), no CUDA-capable device is detected.
[Hint: 'cudaErrorNoDevice'. This indicates that no CUDA-capable devices were detected by the installed CUDA driver. ] (at
Can you help me out with this, please?

aishwaryadinesh

Hello Neuralearn, thanks for your great tutorial.
Could you please provide notebook access?

ajithn

Hi, Neuralearn, Thanks for creating a very useful tutorial. Can you please provide notebook access for my study?

kenjeroldarellano

Hello, I'm facing trouble when there are multiple lines within the same row: it is treating them as new rows. How do I fix this? Thank you!

PurushothamReddy-ffvp
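
For the multi-line row issue raised above, one possible heuristic (a sketch, not something taken from the tutorial) is to post-process the reconstructed grid: if a row has an empty cell in a "key" column that is always filled on the first line of a logical row, treat it as a continuation and fold it into the previous row. The function name and the choice of column 0 as the key column are assumptions made for this illustration.

```python
def merge_continuation_rows(grid, key_col=0):
    """Fold rows whose key column is empty into the row above them.

    grid: list of rows, each a list of cell strings.
    key_col: index of a column assumed to be filled on the first line
             of every logical row (hypothetical default: 0).
    """
    merged = []
    for row in grid:
        if merged and not row[key_col].strip():
            previous = merged[-1]
            # Append non-empty continuation cells to the cells above them.
            merged[-1] = [
                (above + " " + below).strip() if below.strip() else above
                for above, below in zip(previous, row)
            ]
        else:
            merged.append(list(row))
    return merged


# Made-up example: the description of "Pump" wraps onto a second line.
raw_grid = [
    ["Item", "Description"],
    ["Pump", "Centrifugal pump,"],
    ["", "stainless steel"],
    ["Valve", "Ball valve"],
]
clean_grid = merge_continuation_rows(raw_grid)
```

This only works when the key column is never legitimately empty; tables with optional first-column values would need a different rule, such as comparing the vertical gap between lines.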

What if it doesn't detect the table as a table, but as a figure or text?

brmaaouia
