[23] Use Python to OCR a scanned PDF for accounting

Comments

Thanks so much for sharing the Colabs. Trying to get various libraries to work on Windows with Python has been a pain in the neck, so this is great.

CaspersCuts

Thanks for your guide. I always learn a lot from you!

ZNguyenThienZ

It's as if you read my mind! I was working on a very similar module in a project, so it's very useful and informative! Thanks a lot ✌

saraibrahim

Excellent video as usual! Thanks!

welbsantos

Keep up the good work! Such useful material.

lahirulowe

Great video. I've been looking through your videos for a certain problem I'm having and I'm not seeing a solution. I have a PDF where the description is a few sentences long, and therefore goes into multiple lines, so when I try and read it using Tabula (I know you don't use the Tabula module) I get a dataframe with a whole lot of NaN values, for every row where there's text from the description but no corresponding price or item number. Any advice? To clarify, the text from the description will be present in the description column (for just that line). There's just a whole lot of extra NaN values for the other columns in the extra rows. I'd like for the whole description to take up one single cell as opposed to several.

liammckenna
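
One possible way to approach the multi-line description problem above, as a minimal sketch rather than a Tabula feature: treat any row where only the description is filled in as a continuation of the row above it, and fold its text upward. The column names (`item`, `description`, `price`) and the sample rows are made up for illustration, and the helper names are hypothetical.

```python
import math


def is_missing(value):
    """Treat None, blank strings and float NaN as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def merge_continuation_rows(rows, text_col="description"):
    """Fold rows that only carry description text into the row above them."""
    merged = []
    for row in rows:
        others_missing = all(
            is_missing(v) for k, v in row.items() if k != text_col
        )
        if merged and others_missing and not is_missing(row.get(text_col)):
            # Continuation line: append its text to the previous item.
            prev = merged[-1].get(text_col)
            prev_text = "" if is_missing(prev) else str(prev)
            merged[-1][text_col] = f"{prev_text} {row[text_col]}".strip()
        else:
            merged.append(dict(row))
    return merged


# Hypothetical rows shaped like the output described in the comment.
rows = [
    {"item": "A100", "description": "Blue widget with", "price": 9.5},
    {"item": float("nan"), "description": "extra long handle", "price": float("nan")},
    {"item": "B200", "description": "Red widget", "price": 4.0},
]
merged_rows = merge_continuation_rows(rows)
```

With pandas, the rows could come from `df.to_dict("records")` and go back into a dataframe with `pd.DataFrame(merged_rows)`. The sketch assumes every real item row has at least one non-empty value outside the description column.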

Hey, after I run
os.system(f'ocrmypdf {invoice_pdf} output.pdf'), I get an output of 256 instead of 0. Can you please help me out?

likuduu
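
A note on the 256 above, as a sketch and not a diagnosis of this particular invoice: on Linux (which Colab runs on), `os.system` returns a wait status rather than the exit code itself, and the exit code sits in the high byte, so 256 corresponds to exit code 1. That only says the command failed. To see why, it may help to run it through `subprocess.run` and read what it wrote to stderr. The helper names and the stand-in command below are hypothetical.

```python
import subprocess
import sys


def decode_wait_status(status):
    """Extract the exit code from an os.system wait status on Linux/macOS.

    Assumes the process exited normally (was not killed by a signal).
    """
    return status >> 8


def run_and_capture(cmd):
    """Run a command given as a list of arguments; return (exit_code, stderr_text)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


exit_code = decode_wait_status(256)

# Stand-in command that fails with exit code 1 and writes to stderr.
# In the notebook this would be something like ["ocrmypdf", invoice_pdf, "output.pdf"].
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('demo failure'); sys.exit(1)",
]
code, err = run_and_capture(demo_cmd)
print(code, err)
```

Passing the arguments as a list also avoids shell quoting problems with file names.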

Cool tutorial! How do we tackle rotated PDFs though?

SPRytte
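
For rotated pages, one thing to try (a sketch, not something shown in the video) is ocrmypdf's `--rotate-pages` option, which attempts to detect and fix page orientation, optionally together with `--deskew` for slightly tilted scans. The function name and file names below are hypothetical, and the command is only built here, not run, unless `dry_run` is turned off.

```python
import subprocess


def ocr_with_rotation(input_pdf, output_pdf, dry_run=True):
    """Build (and optionally run) an ocrmypdf command that corrects page rotation."""
    cmd = ["ocrmypdf", "--rotate-pages", "--deskew", input_pdf, output_pdf]
    if not dry_run:
        # Raises CalledProcessError if ocrmypdf exits with a non-zero code.
        subprocess.run(cmd, check=True)
    return cmd


# Hypothetical file names; dry_run=True only returns the argument list.
cmd = ocr_with_rotation("invoice.pdf", "output.pdf")
print(" ".join(cmd))
```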

Thank you for creating this tutorial, it's a great help. How can we convert the image in the PDF to a dataframe? I work in finance and I get scanned PDF files. If I could convert them to a pandas dataframe, it would save so much time.

jask

To add up the marks on the front sheet of an answer sheet by recognizing each digit from its position, can we use OCR, and how can it be implemented? Please help.

farseenbasheer

Thanks for sharing. May I know what the steps are to download this package on Windows 10? Thanks

julianzhai

Would you be able to extract handwritten text from forms?

douradavidboge

I get an error code 256 when I run cell block [8] os.system(....)

I looked on Stack Overflow but didn't find a solution that worked, and there is no error code 256 in the official documentation (they end at 130).
I am running things in Google Colab.
I tried to import the same invoice.pdf over from Drive, but I still got the same error code. I also was not able to get "!ocrmypdf invoice_pdf output.pdf" to work (alternate syntax).

Any help is appreciated

jaquielajoie

Is there any way I could make this work in Jupyter Notebook on a Windows PC?

rossgellar

But I want to try this on multiple PDFs, and they are all in different formats. I want something that extracts only the tables from them, because each one is like a Crystal report.

shreymishra

Just what I needed. I wonder if there's a simpler way to first check whether a PDF is scanned and, based on that, apply OCR or not.

SenzoDlomo
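
On checking whether a PDF is scanned first: ocrmypdf can make that decision per page itself, which may be simpler than a separate check. As a sketch, `--skip-text` leaves pages that already contain text alone, `--redo-ocr` replaces an existing OCR layer, and `--force-ocr` rasterizes and OCRs everything. The function name, the `mode` labels and the file names below are hypothetical; only the three flags come from ocrmypdf.

```python
# Map a hypothetical mode label to the ocrmypdf flag that handles existing text.
EXISTING_TEXT_FLAGS = {
    "skip": "--skip-text",   # OCR only the pages that have no text yet
    "redo": "--redo-ocr",    # replace a previous OCR layer
    "force": "--force-ocr",  # rasterize every page and OCR it again
}


def build_ocr_command(input_pdf, output_pdf, mode="skip"):
    """Return the ocrmypdf argument list for the chosen existing-text policy."""
    if mode not in EXISTING_TEXT_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["ocrmypdf", EXISTING_TEXT_FLAGS[mode], input_pdf, output_pdf]


# Hypothetical file names; pass the list to subprocess.run(...) to execute it.
cmd = build_ocr_command("mixed_invoices.pdf", "output.pdf", mode="skip")
print(" ".join(cmd))
```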

Hi,
Thanks for this video! I'm watching more content about this lol
I need support with one line of this code. Can you help me?
I use a PDF file stored on my local machine (I used {file_name} instead of {invoice_pdf}), but it returns the following error:
*ocrmypdf: error: unrecognized arguments: output.pdf*
*FileNotFoundError: [Errno 2] No such file or directory: 'output.pdf'*

guilhermemendes
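
A guess at the "unrecognized arguments: output.pdf" error above, since the comment does not show the file name: if the local file name contains spaces, the shell splits it into several arguments, and ocrmypdf then sees more arguments than it expects. A minimal sketch of quoting the name with `shlex.quote` on a POSIX shell follows. The file name is made up, and `build_shell_command` is a hypothetical helper.

```python
import shlex


def build_shell_command(input_pdf, output_pdf="output.pdf"):
    """Build a shell command string with both file names safely quoted."""
    return f"ocrmypdf {shlex.quote(input_pdf)} {shlex.quote(output_pdf)}"


file_name = "my scanned invoice.pdf"  # hypothetical name containing spaces

unquoted = f"ocrmypdf {file_name} output.pdf"
quoted = build_shell_command(file_name)

# shlex.split shows how a POSIX shell would break each string into arguments.
print(shlex.split(unquoted))
print(shlex.split(quoted))
```

The quoted string can be passed to `os.system` as before. Calling `subprocess.run(["ocrmypdf", file_name, "output.pdf"])` with a list avoids the shell, and the quoting question, entirely.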

Hello sir, absolutely amazing video. I have a question for you, if you think this is possible. Let's say I downloaded Amazon's Form 10-K as a PDF. Would it be possible to extract every single number from the document, input them into Excel rows, and add a unique ID to every extracted number directly in the PDF document?

carlscholl

I don't know what I am doing incorrectly, but I do not get 0 at line 8. I have tried single and double quotes. No bueno. Any ideas?

livinginlouisvillewithtama

Thanks for the video. But when I run the part '!ocrmypdf 2UJgUoO output.pdf' I get the message "FileNotFoundError: [Errno 2] No such file or directory: 'tesseract': 'tesseract'". I checked that everything in the video is installed, and I also tried again after installing tesseract-ocr and pytesseract, but I keep getting the same error.

Selim
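
The "No such file or directory: 'tesseract'" message above usually means the `tesseract` executable is not on the PATH of the process running ocrmypdf, even if the Python package `pytesseract` is installed. A quick way to check, as a sketch: look up each external tool with `shutil.which`. The tool list is an assumption (`gs` is Ghostscript, which ocrmypdf also depends on, and executable names differ on Windows), and `find_missing_tools` is a hypothetical helper.

```python
import shutil


def find_missing_tools(tools=("ocrmypdf", "tesseract", "gs")):
    """Return the names of the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
print("Missing from PATH:", missing or "nothing")
```

If `tesseract` shows up as missing in Colab, reinstalling the system package (`tesseract-ocr`) in the same runtime session and re-running the check is one thing to try.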
