How to extract text from PDF using Python | FinTechChef | OCR using Python

In this video you will see how to extract text from a PDF using Python. There are many powerful modules for extracting text from PDFs; a few of them are tesseract, textract, Camelot, pyPDF2 and tabula.
We are going to use the "textract" Python module because it has "OCR" functionality and is very easy to use.
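
As a rough idea of what using "textract" looks like once it is installed, here is a minimal sketch. The helper names "pick_method" and "extract_pdf_text" and the file name "sample.pdf" are made up for illustration; "textract.process" and its "method" option come from the textract package. As far as I know, the "tesseract" method also needs the Tesseract program installed separately, so treat that part as an assumption.

```python
def pick_method(use_ocr):
    """Choose the textract PDF backend (hypothetical helper for this sketch)."""
    # "tesseract" runs OCR on the rendered pages (for scanned PDFs);
    # "pdftotext" reads the text layer already stored in the PDF.
    return "tesseract" if use_ocr else "pdftotext"


def extract_pdf_text(pdf_path, use_ocr=False):
    """Return the text of a PDF as a string, using textract."""
    # Imported here so this file can be loaded even before textract is installed.
    import textract

    raw = textract.process(pdf_path, method=pick_method(use_ocr))
    # textract returns bytes (UTF-8 by default), so decode to get a str.
    return raw.decode("utf-8")


# Example call (assumes a file named sample.pdf exists next to the script):
# print(extract_pdf_text("sample.pdf", use_ocr=True))
```

Both backends rely on the poppler tools being installed, which is what the installation steps are for.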

Steps for installing "textract": -
1. Press "Win + R", type "cmd" and hit "enter"
2. Run this command (without quotes): - "pip install textract"
4. Extract the downloaded poppler archive and paste the complete folder here: - "C:\Program Files"
5. Add "C:\Program Files\poppler-0.68.0\bin" to the system "Path" variable
6. Your "textract" setup has been completed successfully
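
To double-check that the poppler folder really ended up on the system path, a small check like the one below may help. It is only a sketch: the function names are made up, and it assumes the poppler "bin" folder contains the "pdftotext" tool. Open a new "cmd" window before running it, because windows that were already open do not see the changed path variable.

```python
import shutil


def find_poppler_tool(tool="pdftotext"):
    """Return the full path of a poppler tool, or None if it is not on the path."""
    # shutil.which searches the same system path that "cmd" uses.
    return shutil.which(tool)


def poppler_status(tool="pdftotext"):
    """Build a short message saying whether the tool can be found."""
    location = find_poppler_tool(tool)
    if location is None:
        return f"{tool} not found on the system path; re-check step 5"
    return f"{tool} found at {location}"


print(poppler_status())
```

If the message says "not found", the path entry from step 5 is the first thing to re-check.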

Thanks! Use that and enjoy :)
Comments
Author

As far as I have been able to identify, error code 127 means that poppler was not found on the system. So, follow the steps for adding poppler to your system path variable carefully to avoid this kind of error. If you are still facing any challenges with installation, please let us know here. Happy Learning :)

AutomationTank
Author

Very helpful. Thank you for taking the time to record this video!

austinhomolka
Author

Hi, when I click on the poppler link it does not download a zip file. It just downloads a 7z file.

Could you please share the link here?

Thanks.

abdulsaleem
Author

Hi FinTechChef, how do I get that bin folder in the 2021 version? I don't see it. Please help me.

Lindvni
Author

I am getting an error: failed with exit code 127.
Any idea why?

anshuld