Extract Tables from PDF and convert to Excel sheet with Paddle OCR text detection and recognition.

The PaddleOCR project contains many deep learning OCR models, covering text detection, text recognition, text angle detection and table layout. In this course, we shall use pretrained text detection and text recognition models to extract the text found in tables in a PDF. From this extracted text, we shall reconstruct the table based on the text positions.
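As a rough illustration of rebuilding a table from text positions, here is a minimal sketch. It is not necessarily the method used in the course. It assumes each OCR result has already been reduced to an axis-aligned box `(x_min, y_min, x_max, y_max)` plus its recognized text. The helper names (`cluster_1d`, `nearest`, `boxes_to_grid`), the tolerance values and the sample detections are all hypothetical.

```python
from statistics import mean


def cluster_1d(values, tolerance):
    """Group 1-D coordinates into clusters and return the cluster centres.

    A new cluster starts when a value lies more than `tolerance`
    away from the mean of the cluster currently being built.
    """
    centres = []
    current = []
    for v in sorted(values):
        if current and v - mean(current) > tolerance:
            centres.append(mean(current))
            current = []
        current.append(v)
    if current:
        centres.append(mean(current))
    return centres


def nearest(centres, value):
    """Return the index of the centre closest to `value`."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))


def boxes_to_grid(detections, row_tol=10, col_tol=40):
    """Place each detected text into a row/column grid using box centres.

    `detections` is a list of ((x_min, y_min, x_max, y_max), text) pairs.
    `row_tol` and `col_tol` are pixel tolerances (assumed values).
    """
    y_centres = [(box[1] + box[3]) / 2 for box, _ in detections]
    x_centres = [(box[0] + box[2]) / 2 for box, _ in detections]
    rows = cluster_1d(y_centres, row_tol)
    cols = cluster_1d(x_centres, col_tol)

    grid = [["" for _ in cols] for _ in rows]
    for (_, text), xc, yc in zip(detections, x_centres, y_centres):
        r = nearest(rows, yc)
        c = nearest(cols, xc)
        # Several boxes landing in one cell are joined with a space.
        grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid


# Made-up detections standing in for real OCR output.
detections = [
    ((10, 10, 60, 30), "Name"),
    ((120, 12, 170, 32), "Qty"),
    ((10, 50, 70, 70), "Apple"),
    ((125, 51, 150, 71), "3"),
    ((12, 90, 65, 110), "Pear"),
    ((124, 92, 150, 112), "12"),
]
grid = boxes_to_grid(detections)
for row in grid:
    print(row)
```

The resulting list of rows could then be written to a spreadsheet, for example with `pandas.DataFrame(grid).to_excel("table.xlsx", header=False, index=False)`, assuming pandas and an Excel writer such as openpyxl are installed. Centre-based clustering like this tends to struggle with cells that wrap over several lines, so the tolerances would need tuning per document.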
Enjoy!!!

Hi,
You can use this Link to access our premium courses. You'll be able to build and deploy more than 20 different AI projects.

[Please check your mail 5 minutes after requesting access]
Colab Notebook: [Please check your mail inbox and spam folder 5 minutes after requesting access]

Comments

Impressive content for Deep Learning OCR! Many thanks!

JujutsuMan

Thank you very much for this! Very insightful!

niroshiniedayaratne

impressive, struggling right now for my little side project using ocr, u helped a lot man, appreciate it

tkcdlyo

brilliant work!!, I would like to thank you for giving me access to notebook.
keep going broo 💙💙

mohamedmagdy

Thank you, man. You are the best at explaining what is actually happening. Thank you so much.

toto

Broo, this is awesome, thank you very much!!!

Jean-nfyh

Hi, I've followed your procedure as is but I'm getting "ValueError: Can't convert Python sequence with mixed types to Tensor." on the Non-Max Suppression portion. Can you tell me what might be causing that please?

moez.mazhar

Hi, you have done a phenomenal job explaining PaddleOCR in detail. Can you please let me know if we can train PaddleOCR on custom datasets to extract data from tables of different lengths in PDFs or images?

vishaldas

Well, that is a very simple and readable table; it's easy enough to do with basic if logic. But try a table with no borders, or with content very close to the borders, on a scanned image.

christianrazvan

Hi, Neuralearn. Thanks for creating a great tutorial. It's very useful. Can you please provide notebook access?

puneetbansal

How can I fix this error: "ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory"?
It is caused by the line of code "import layoutparser as lp".

malakkhiari

Thank you for the tutorial, I have requested the notebook access

leonc

Hello, do you have any idea about packaging PaddleOCR? I'm trying to make an exe of my code but I keep facing errors. Any help would be appreciated.

SiddheshBalashetwar

Great.
Is it possible to use this model for matrix recognition, i.e. to find how many rows and columns a matrix has, and its elements?

cissemy

Hi, thank you for this. Can you please help me with the notebook access? Also, can you please help me understand whether I will be able to cover most table formats with this?

emailvarun

Hi, I'm getting this error - (External) CUDA error(100), no CUDA-capable device is detected.
[Hint: 'cudaErrorNoDevice'. This indicates that no CUDA-capable devices were detected by the installed CUDA driver. ] (at
Can you help me out w this please?

aishwaryadinesh

Hello neuralearn, thanks for your great tutorial.
Could you please provide notebook access?

ajithn

Hi, Neuralearn, Thanks for creating a very useful tutorial. Can you please provide notebook access for my study?

kenjeroldarellano

Hello, I'm facing trouble when there are multiple lines of text within the same row: it treats them as new rows. How do I fix this? Thank you!

PurushothamReddy-ffvp

What if it does not detect the table as a table, but as a figure or text?

brmaaouia