[23] Use Python to OCR a scanned PDF for accounting

Comments

Thanks so much for sharing the Colabs. Trying to get various libraries to work on Windows with Python has been a pain in the neck, so this is great.

CaspersCuts

Thanks for your guide. I always learn a lot from you!

ZNguyenThienZ

It's as if you read my mind! I was working on a very similar module in a project, so it's very useful and informative! Thanks a lot ✌

saraibrahim

Excellent video as usual! Thanks!

welbsantos

Keep up the good work! Such useful material.

lahirulowe

Great video. I've been looking through your videos for a certain problem I'm having and I'm not seeing a solution. I have a PDF where the description is a few sentences long, and therefore goes into multiple lines, so when I try and read it using Tabula (I know you don't use the Tabula module) I get a dataframe with a whole lot of NaN values, for every row where there's text from the description but no corresponding price or item number. Any advice? To clarify, the text from the description will be present in the description column (for just that line). There's just a whole lot of extra NaN values for the other columns in the extra rows. I'd like for the whole description to take up one single cell as opposed to several.

liammckenna
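
One possible way to fold the extra rows from the question above back into a single description cell, sketched with pandas. It assumes that only the first line of each record has a value in some key column (here a hypothetical `item` column) and that continuation lines are NaN there. The column names and sample data are made up for illustration and are not taken from the video.

```python
import pandas as pd


def merge_continuation_rows(df, key_col, text_col):
    """Fold rows whose key column is NaN into the previous row's text."""
    # A new record starts wherever the key column has a value.
    record_id = df[key_col].notna().cumsum().rename("record")
    # Drop fragment rows that appear before the first real record.
    keep = record_id > 0
    df, record_id = df[keep], record_id[keep]
    # Keep the first non-null value per column, but join the text fragments.
    agg = {col: "first" for col in df.columns}
    agg[text_col] = lambda s: " ".join(s.dropna().astype(str))
    return df.groupby(record_id).agg(agg).reset_index(drop=True)


# Hypothetical frame shaped like the Tabula output described in the comment.
raw = pd.DataFrame({
    "item": ["A1", None, None, "B2", None],
    "description": ["Blue widget,", "large size,", "pack of ten",
                    "Red widget,", "small size"],
    "price": [9.99, None, None, 4.50, None],
})
clean = merge_continuation_rows(raw, key_col="item", text_col="description")
print(clean)
```

The grouping relies on the key column being empty on continuation lines, so it would need a different rule if item numbers can also be missing on genuine rows.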

Hey, after I run
os.system(f'ocrmypdf {invoice_pdf} output.pdf'), I'm getting an output of 256 instead of 0. Can you please help me out?

likuduu
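
A note on the 256 asked about above: on Linux, which Colab runs on, `os.system` returns a raw wait status rather than the program's exit code, and the exit code sits in the high byte. A return value of 256 therefore corresponds to exit code 1, so the documented exit codes apply to 1, not to 256. The sketch below shows one way to see the actual error message by using `subprocess.run` instead. The helper name `run_and_report` is made up, and a small Python one-liner stands in for the failing `ocrmypdf` call so the example runs anywhere.

```python
import subprocess
import sys


def run_and_report(args):
    """Run a command given as a list and return (exit_code, stderr_text)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stderr


# The high byte of the wait status holds the exit code: 256 -> 1.
print(256 >> 8)

# Stand-in command that fails with exit code 1 and writes to stderr.
# For the real case the list would be ["ocrmypdf", invoice_pdf, "output.pdf"].
code, err = run_and_report(
    [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
)
print(code, err)
```

Printing the captured stderr usually shows why the command failed, which `os.system` hides inside a notebook cell.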

Cool tutorial! How do we tackle rotated pdfs though?

SPRytte
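
For the rotated pages asked about above, `ocrmypdf` has a `--rotate-pages` option that tries to detect page orientation and correct it, and a `--deskew` option for pages that are slightly crooked. A minimal sketch of building such a command as an argument list follows. The helper `build_ocrmypdf_command` and the file names are hypothetical, and this is not necessarily how the video does it.

```python
def build_ocrmypdf_command(input_pdf, output_pdf, rotate=True, deskew=False):
    """Return an ocrmypdf argument list suitable for subprocess.run."""
    args = ["ocrmypdf"]
    if rotate:
        # Ask ocrmypdf to detect and fix pages that are turned 90/180/270 degrees.
        args.append("--rotate-pages")
    if deskew:
        # Straighten pages that were scanned at a slight angle.
        args.append("--deskew")
    args += [input_pdf, output_pdf]
    return args


command = build_ocrmypdf_command("invoice.pdf", "output.pdf")
print(command)
```

The list can then be passed to `subprocess.run(command)` once `ocrmypdf` is installed.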

Thank you for creating this tutorial, it's a great help. How can we convert the image in the PDF to a dataframe? I work in finance and I get scanned PDF files. If I could convert them to a pandas dataframe, it would save so much time.

jask

I want to add up the marks on the front sheet of an answer sheet by recognizing each digit from its position. Can we use OCR for this, and how can it be implemented? Please help.

farseenbasheer

Thanks for sharing. May I know what the steps are to download this package on Windows 10? Thanks

julianzhai

Would you be able to extract handwritten text from forms?

douradavidboge

I get an error code 256 when I run cell block [8] os.system(....)

I looked on Stack Overflow but didn't find a solution that worked, and there is no error code 256 in the official documentation (the codes end at 130).
I am running things in Google Colab.
I tried to import the same invoice.pdf from Drive, but I still got the same error code. I also was not able to get "!ocrmypdf invoice_pdf output.pdf" to work (alternate syntax).

Any help is appreciated.

jaquielajoie

Is there any way I could make this work in Jupyter Notebook on a Windows PC?

rossgellar

But I want to try this on multiple PDFs, all in different formats. I want something that extracts only the tables from them, because each one is like a Crystal Report.

shreymishra

Just what I needed. I wonder if there's a simpler way to first check whether a PDF is scanned and, based on that, apply OCR or not.

SenzoDlomo
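
One rough way to do the check asked about above is to try extracting text from each page and treat the file as scanned when most pages come back nearly empty. The sketch below is a heuristic, not something from the video: the function names and the 20-character threshold are assumptions, and the text extraction assumes the `pypdf` package is installed. `ocrmypdf` also has a `--skip-text` option that leaves pages with existing text alone, which may be enough on its own.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Return True if most pages carry almost no extractable text."""
    if not page_texts:
        return True
    # Count pages whose extracted text is shorter than the threshold.
    nearly_empty = sum(
        1 for text in page_texts
        if len((text or "").strip()) < min_chars_per_page
    )
    return nearly_empty > len(page_texts) / 2


def extract_page_texts(pdf_path):
    """Text per page via pypdf (assumed installed: pip install pypdf)."""
    from pypdf import PdfReader  # imported here so the heuristic works without it
    return [page.extract_text() or "" for page in PdfReader(pdf_path).pages]


# Hypothetical page texts standing in for real extraction results.
print(looks_scanned(["", "   "]))
print(looks_scanned(["Invoice number 12345, total due 99.00"] * 3))
```

With a real file, the call would be `looks_scanned(extract_page_texts("invoice.pdf"))`, followed by OCR only when it returns True.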

Hi,
Thanks for this video! Watching more content about this lol
I need support with one line of this code... Can you help me?
I use a PDF file stored on my local machine (I replaced {invoice_pdf} with {file_name}), but it returns the following error:
*ocrmypdf: error: unrecognized arguments: output.pdf*
*FileNotFoundError: [Errno 2] No such file or directory: 'output.pdf'*

guilhermemendes
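
One possible cause of the "unrecognized arguments: output.pdf" error above is a file name that contains a space: the shell then splits it into two arguments, so `ocrmypdf` sees three paths instead of two. This is a guess, since the comment does not show the actual file name. The sketch below uses a hypothetical name to show the splitting and how `shlex.quote` avoids it.

```python
import shlex

file_name = "my invoice.pdf"  # hypothetical local file name containing a space

# Unquoted, the name is split into two separate arguments.
unquoted = f"ocrmypdf {file_name} output.pdf"
# Quoted, it stays a single argument.
quoted = f"ocrmypdf {shlex.quote(file_name)} output.pdf"

print(shlex.split(unquoted))
print(shlex.split(quoted))
```

`shlex.quote` targets POSIX shells, so on Windows it may be simpler to skip the shell entirely and pass a list such as `["ocrmypdf", file_name, "output.pdf"]` to `subprocess.run`.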

Hello sir, absolutely amazing video. I have a question for you, if you think this is possible. Let's say I downloaded Amazon's Form 10-K as a PDF. Would it be possible to extract every single number from the document, put them into Excel rows, and add a unique ID to every extracted number directly in the PDF document?

carlscholl

I don't know what I am doing incorrectly, but I do not get 0 at line 8. I have tried single and double quotes... no bueno. Any ideas?

livinginlouisvillewithtama

Thanks for the video. But when I run the part '!ocrmypdf 2UJgUoO output.pdf' I get the message "FileNotFoundError: [Errno 2] No such file or directory: 'tesseract': 'tesseract'". I checked that everything in the video is installed, and I also tried again after installing tesseract-ocr and pytesseract, but I keep getting the same error.

Selim
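
The error above means the `tesseract` executable could not be found on the PATH. Installing `pytesseract` with pip only adds a Python wrapper, not the program itself, so a quick check like the one below can confirm whether the binary is visible to Python. The helper name `missing_binaries` is made up for this sketch.

```python
import shutil


def missing_binaries(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# An empty list means both programs are visible to the notebook process.
print(missing_binaries(["tesseract", "ocrmypdf"]))
```

If `tesseract` shows up as missing right after installing it, the notebook process may need a restart or a PATH update before it can see the new program.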