Extract text from Any PDF File (even scanned ones) using OCR pytesseract in 3 SIMPLE STEPS!

This video addresses a common problem: extracting text from PDF files. Several techniques exist, but this video walks you through the one based on Pytesseract, in 3 main steps:
- Convert the PDF into images
- Get text from the images
- Combine the previous two techniques
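The three steps listed above can be sketched roughly as follows. This is a minimal sketch, not the code from the video: it assumes the `pdf2image` and `pytesseract` packages are installed, along with the Poppler and Tesseract programs they call. The function names are made up for illustration.

```python
from typing import Callable


def pdf_to_images(pdf_path: str, dpi: int = 200) -> list:
    """Step 1: render every page of the PDF to a PIL image."""
    # Imported lazily so the module loads even without pdf2image installed.
    from pdf2image import convert_from_path
    return convert_from_path(pdf_path, dpi=dpi)


def image_to_text(image) -> str:
    """Step 2: run Tesseract OCR on a single page image."""
    import pytesseract
    return pytesseract.image_to_string(image)


def pdf_to_text(
    pdf_path: str,
    to_images: Callable[[str], list] = pdf_to_images,
    ocr: Callable[[object], str] = image_to_text,
) -> str:
    """Step 3: combine both steps, joining pages with a blank line."""
    pages = to_images(pdf_path)
    return "\n\n".join(ocr(page).strip() for page in pages)
```

Calling `pdf_to_text("scan.pdf")` (a hypothetical file name) would return the recognised text of all pages as one string. The two steps are passed in as parameters only so that each can be swapped or tested separately.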

Feel free to Like, Share, Subscribe, Comment & Provide Video Ideas to

Link to the code:
Comments
Author

Hi! Thanks a lot for the extraction. I want to convert a scanned PDF to an editable Word doc. In the above video the accuracy is only 97%.

swetharamshetty
Author

Thanks a lot. The code works smoothly. Nice.
Can you find and extract a table from a scanned PDF and save it into a dataframe?
Thanks

dyzy
Author

How do I add another language in the code? Thank you for the great explanation 👏🏼

sarasa
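On the language question: `pytesseract.image_to_string` accepts a `lang` argument, and Tesseract lets you combine several languages with `+`. Here is a minimal sketch, assuming the matching language data files (for example `fra` or `ara`) are already installed for Tesseract. The helper names are made up for illustration.

```python
def build_lang(*codes: str) -> str:
    """Join Tesseract language codes into one string such as "eng+fra"."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_with_languages(image, *codes: str) -> str:
    """Run OCR on one image using the given Tesseract language codes."""
    # Imported lazily; requires pytesseract and the Tesseract program.
    import pytesseract
    return pytesseract.image_to_string(image, lang=build_lang(*codes))
```

Running `tesseract --list-langs` in a terminal shows which language codes are available on your machine.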
Author

Hi, can you modify the code so that the new text file keeps the page settings and structure of the original PDF? That is, the text should be in the same place where it was in the original PDF.

zsuzsannakristof
Author

Thanks a lot. The code works. I want to get paragraphs and titles without any tables or figures. How can I solve this?

kibtiachowdhury
Author

PDFPageCountError: Unable to get page count.I/O Error: Couldn't open file Cry Image.pdf': No error.

mohammednisar
Author

Hi, I came across your video after multiple failed attempts at converting my file. Can I somehow ignore the headers and footers? Also, I have bullet points in my documents and some of them continue on the next page; how do I take care of that?

Thanks in advance!!

jardanijonovich
Author

Which version is used in this? When I run it, I get a Poppler path error, as well as problems with the Tesseract installation on my PC and its path setting.

sivachaitanya
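For the Poppler and Tesseract path errors: `pdf2image` calls Poppler's `pdfinfo` and `pdftoppm` programs, and `pytesseract` calls `tesseract`, so all three must be findable. Here is a minimal sketch that checks which ones are missing from PATH and, as an alternative, passes the locations explicitly. The function names are made up, and the folder and executable paths are placeholders you would replace with your own install locations.

```python
import shutil
from typing import Callable, List, Optional


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the required external programs that are not found on PATH."""
    required = ["pdfinfo", "pdftoppm", "tesseract"]
    return [name for name in required if which(name) is None]


def ocr_with_explicit_paths(
    pdf_path: str,
    poppler_bin: Optional[str] = None,    # folder containing pdfinfo/pdftoppm
    tesseract_exe: Optional[str] = None,  # full path to the tesseract executable
) -> List[str]:
    """OCR each page, pointing the libraries at the tools explicitly."""
    from pdf2image import convert_from_path
    import pytesseract

    if tesseract_exe:
        pytesseract.pytesseract.tesseract_cmd = tesseract_exe
    pages = convert_from_path(pdf_path, poppler_path=poppler_bin)
    return [pytesseract.image_to_string(page) for page in pages]
```

If `missing_tools()` returns an empty list, the default PATH lookup should already work and the explicit arguments are not needed.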
Author

Sir, can you make a video on extracting the paragraph under a given title from a PDF?

ravimakwana
Author

I am getting this error: Unable to get page count. Is poppler installed and in PATH?

jeyapauldavid
Author

Does this work on a folder with multiple PDF files?

cherlynang
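Handling a folder mostly means looping over its PDF files and calling the single-file extraction on each. Here is a minimal sketch, where `extract` stands for any function that turns one PDF path into text (for instance the three-step pipeline from the video). The function name `extract_folder` is made up for illustration.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_folder(folder: str, extract: Callable[[str], str]) -> Dict[str, str]:
    """Apply `extract` to every *.pdf file in `folder`, keyed by file name."""
    results: Dict[str, str] = {}
    # sorted() gives a stable order; the pattern only matches a lowercase
    # ".pdf" extension on case-sensitive file systems.
    for pdf in sorted(Path(folder).glob("*.pdf")):
        results[pdf.name] = extract(str(pdf))
    return results
```

The returned dictionary can then be written out however you like, for example one `.txt` file per PDF.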
Author

Can this code work with a PDF given as a URL? If so, kindly share the lines of code to handle that.

chepkoechfancy
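For a PDF behind a URL, one option is to download the bytes first and hand them to `pdf2image.convert_from_bytes` instead of `convert_from_path`. Here is a minimal sketch, assuming the URL is publicly reachable without a login. The function names are made up, and the `%PDF-` check is only a rough sanity test.

```python
import urllib.request
from typing import Callable, Optional


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw content behind `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def looks_like_pdf(data: bytes) -> bool:
    """PDF files carry a "%PDF-" marker at (or very near) the start."""
    return b"%PDF-" in data[:1024]


def url_pdf_to_text(
    url: str,
    fetch: Callable[[str], bytes] = fetch_bytes,
    to_images: Optional[Callable[[bytes], list]] = None,
    ocr: Optional[Callable[[object], str]] = None,
) -> str:
    """Download a PDF, render its pages, and OCR them."""
    data = fetch(url)
    if not looks_like_pdf(data):
        raise ValueError("downloaded content does not look like a PDF")
    if to_images is None:
        from pdf2image import convert_from_bytes  # lazy import
        to_images = convert_from_bytes
    if ocr is None:
        import pytesseract  # lazy import
        ocr = pytesseract.image_to_string
    return "\n\n".join(ocr(image).strip() for image in to_images(data))
```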
Author

I'm getting an error: Output exceeds the size limit. Open the full output data in a text editor

avinashkrishna
Author

Sir, super, but one question: how do I extract text from a group of many PDFs?

kiranvanukuri
Author

PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH? Why am I getting this error?

shainialakumbura
Author

: Failed to activate VS environment: Could not find C:\Program Files (x86)\Microsoft Visual Studio\Installer\vswhere.exe

Is there any solution to the above error? Please tell me.

avbendre
Author

PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

vishalgarg
Author

Could you give an example of how to use the code,
where to put it,
and how to run it?

QorQar
Author

It's useful, but my PC crashes, either from running out of memory or from the CPU temperature getting too high. ^^

TiriAlain
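Memory use tends to grow when every page of a large PDF is rendered at once. One way to limit it is to convert and OCR a few pages at a time with the `first_page` and `last_page` arguments of `convert_from_path`. Here is a minimal sketch, assuming `pdf2image` and `pytesseract` are installed. The batch size and DPI values are arbitrary starting points, and the function names are made up.

```python
from typing import Iterator, Tuple


def page_batches(page_count: int, batch_size: int = 5) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (first_page, last_page) ranges."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, page_count + 1, batch_size):
        yield first, min(first + batch_size - 1, page_count)


def ocr_in_batches(pdf_path: str, batch_size: int = 5, dpi: int = 150) -> Iterator[str]:
    """Yield the text of each page while holding only one batch in memory."""
    from pdf2image import convert_from_path, pdfinfo_from_path
    import pytesseract

    page_count = pdfinfo_from_path(pdf_path)["Pages"]
    for first, last in page_batches(page_count, batch_size):
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
        for image in images:
            yield pytesseract.image_to_string(image)
```

A lower `dpi` also reduces memory and CPU load, at some cost in OCR accuracy.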