OCR Text from PDFs and Image Documents using docTR | Better than Tesseract OCR | Text Extraction

OCR text extraction using docTR. The OCR output also seems better on table data, where Tesseract OCR often fails to extract the structured content.
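Below is a minimal sketch of the docTR workflow described above. It is not the code from the video. It assumes the `python-doctr` package and its default pretrained models loaded through `ocr_predictor(pretrained=True)`. The helper names `run_doctr` and `export_to_lines` are hypothetical. The pages, blocks, lines, words nesting it walks is the layout of docTR's `Document.export()` output, so check it against your installed version.

```python
from typing import Any


def export_to_lines(export: dict[str, Any]) -> list[str]:
    """Flatten a docTR-style export dict into one string per detected text line."""
    lines: list[str] = []
    for page in export["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                # Words inside one detected line are joined with single spaces.
                lines.append(" ".join(word["value"] for word in line["words"]))
    return lines


def run_doctr(path: str) -> list[str]:
    """OCR a PDF or image file with a pretrained docTR predictor (requires python-doctr)."""
    # Imported here so export_to_lines can be used without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)
    if path.lower().endswith(".pdf"):
        doc = DocumentFile.from_pdf(path)
    else:
        doc = DocumentFile.from_images(path)
    result = model(doc)
    return export_to_lines(result.export())
```

Usage, with a hypothetical file name: `for text in run_doctr("invoice.pdf"): print(text)`. Splitting the model call from the export walk keeps the text-handling part testable on a saved JSON export, without downloading any models.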

#datascience #nlp #deeplearning #documentunderstanding
Comments

Thanks a lot for sharing this better OCR engine.

NickWindham

Thanks a lot for sharing this concept.
Can you explain how to train docTR's text detection and recognition models?
Please.

ramyas

Can docTR OCR be used for commercial purposes?

pranay

I have a problem: I want the extracted text in the same format as the image. Can you tell me how to get structured output that matches the image?

gokuliveyt
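For the question above about output that keeps the image's layout, here is one possible approach. It is not from the video. docTR attaches a relative `((xmin, ymin), (xmax, ymax))` box to every word in its export, so each word can be placed on a character grid. This is a rough sketch: `render_layout`, the grid-size defaults and the rule for resolving collisions are all assumptions you will need to tune.

```python
from typing import Any


def render_layout(page: dict[str, Any], width: int = 100, height: int = 60) -> str:
    """Approximate a docTR-style page as plain text, using each word's relative box."""
    placed: list[tuple[int, int, str]] = []
    for block in page["blocks"]:
        for line in block["lines"]:
            for word in line["words"]:
                (xmin, ymin), (_xmax, ymax) = word["geometry"]
                # Relative coordinates (0..1) become a grid row and column.
                row = min(height - 1, int((ymin + ymax) / 2 * height))
                col = min(width - 1, int(xmin * width))
                placed.append((row, col, word["value"]))

    rows = [""] * height
    for row, col, text in sorted(placed):
        current = rows[row]
        # Start at the word's own column, but leave one space after any earlier word.
        start = max(col, len(current) + 1) if current else col
        rows[row] = current.ljust(start) + text
    # Empty grid rows are dropped. The vertical order of the text is kept.
    return "\n".join(r for r in rows if r.strip())
```

Usage: `text = render_layout(result.export()["pages"][0])`, then write `text` to a `.txt` file. A larger `width` keeps columns apart more faithfully. A smaller `height` merges words that sit at slightly different heights onto the same row.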

Hi, please help me.
I got this error: partially initialized module 'doctr.models' has no attribute 'classification' (most likely due to a circular import)

pnywotm

Is there any way to turn the exported JS object/JSON back into a PDF?

copaceticobserver

Please, can you make a video on how to fine-tune docTR on a custom dataset?

josuedegbun

Hey, did you try swapping in different recognition models like MASTER or sar_resnet31? I tried and it's not working. Did they not release those models as open source?

giritejareddy

Hey, how do I convert a scanned PDF that contains many individuals' ID cards into Excel?

umamaheswararaom

After extracting the text, could you please show us how to store the values in an Excel file or a DataFrame?

jaikumardaiya
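For the Excel/DataFrame question above, a simple first step is to flatten docTR's nested export into one record per word. This is a minimal sketch. The function names and the column set in `FIELDS` are hypothetical. It also assumes each exported word carries `value`, `confidence` and `geometry` keys, as docTR's export does.

```python
import csv
import io
from typing import Any

FIELDS = ["page", "block", "line", "text", "confidence", "xmin", "ymin", "xmax", "ymax"]


def export_to_rows(export: dict[str, Any]) -> list[dict[str, Any]]:
    """One flat record per word, usable with csv.DictWriter or pandas.DataFrame."""
    rows: list[dict[str, Any]] = []
    for p_idx, page in enumerate(export["pages"]):
        for b_idx, block in enumerate(page["blocks"]):
            for l_idx, line in enumerate(block["lines"]):
                for word in line["words"]:
                    (xmin, ymin), (xmax, ymax) = word["geometry"]
                    rows.append({
                        "page": p_idx, "block": b_idx, "line": l_idx,
                        "text": word["value"], "confidence": word["confidence"],
                        "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax,
                    })
    return rows


def rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Serialise the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To save a file Excel can open, write the CSV text with `open("words.csv", "w", encoding="utf-8-sig", newline="")`. If pandas is installed, `pandas.DataFrame(export_to_rows(export))` gives a DataFrame directly. `DataFrame.to_excel("words.xlsx")` additionally needs an Excel writer such as openpyxl. The block and line indices let you regroup words into lines later, for example with a pandas `groupby`.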

Hi, I am facing an error related to doctr_io.

mgwikow

Thanks for the video. When I try to install docTR in Jupyter, I get the following error:
OSError: cannot load library 'gobject-2.0-0': error 0x7e. Additionally, ctypes.util.find_library() did not manage to locate a library called 'gobject-2.0-0'
However, I am able to install it on Google Colab. Any help with the Jupyter installation would be greatly appreciated!

venkateshvanka

Do you have a process for getting text from different banks' passbook scans, such as account holder name, account number, nominee name and IFSC code, and saving it in a DataFrame?
But remember that every passbook has a different layout, clarity and quality.

JaiKumar-dsrq
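For the passbook question above, one possible first pass works on OCR text lines, for example the words of each docTR line joined with spaces. It matches labels with regular expressions. This is a hedged sketch and not from the video. The label spellings in `FIELD_PATTERNS` are hypothetical and will need extending for each bank's layout. Only the IFSC pattern follows a fixed format: 4 bank letters, a `0`, then 6 letters or digits.

```python
import re

# Hypothetical label spellings; real passbooks differ, so extend these per bank.
FIELD_PATTERNS = {
    "account_holder": re.compile(
        r"(?:account\s+holder|customer)(?:\s+name)?\s*[:\-]\s*(.+)", re.I),
    "account_no": re.compile(
        r"(?:a/?c|account)\s*(?:no\.?|number)\s*[:\-]\s*(\d[\d ]{5,}\d)", re.I),
    "nominee": re.compile(r"nominee(?:\s+name)?\s*[:\-]\s*(.+)", re.I),
}
# IFSC: 4 bank letters, a literal 0, then 6 branch characters.
IFSC_RE = re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b")


def extract_fields(lines: list[str]) -> dict[str, str]:
    """Return the first match for each field found in the OCR lines."""
    fields: dict[str, str] = {}
    for line in lines:
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(line)
            if name not in fields and match:
                fields[name] = match.group(1).strip()
        ifsc = IFSC_RE.search(line)
        if "ifsc" not in fields and ifsc:
            fields["ifsc"] = ifsc.group(0)
    if "account_no" in fields:
        # OCR often keeps the printed spacing inside long account numbers.
        fields["account_no"] = fields["account_no"].replace(" ", "")
    return fields
```

With one dict per scanned passbook, `pandas.DataFrame(list_of_dicts)` builds the table. Missing fields become NaN, which makes it easy to see which layouts need extra patterns. For very different layouts, regexes like these are only a baseline. They will not handle labels and values that OCR splits onto separate lines.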

Not able to read PDF files.

Error: module 'pypdfium2' has no attribute 'render_pdf_topil'

ramnivasjat

Nice video! Could you please share the Colab notebook link?
I am facing this error: pypdfium2 --> AttributeError: module 'pypdfium2' has no attribute 'render_pdf_topil'. I even downgraded pypdfium2 to 1.0.0, without success. Could you shed some light on it?

Thanks

machinelearningzone.
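About the `render_pdf_topil` errors above: they most likely come from a mismatch between the installed docTR release and the installed pypdfium2. This is an assumption, so compare the pypdfium2 version your docTR release requires with the one you have. Upgrading `python-doctr` is the first thing to try. As a workaround, the sketch below renders the PDF pages itself with the pypdfium2 v4-style API (`PdfDocument`, `page.render(scale=...)`, `to_pil()`). It then hands the images to the docTR predictor, which also accepts a list of RGB numpy arrays. `render_pdf_pages` and `pdfium_scale` are hypothetical names.

```python
def pdfium_scale(dpi: int) -> float:
    """pypdfium2 renders one pixel per PDF unit (1/72 inch) at scale=1."""
    return dpi / 72


def render_pdf_pages(path: str, dpi: int = 200) -> list:
    """Render every page of a PDF to an RGB numpy array (needs pypdfium2, numpy, Pillow)."""
    import numpy as np
    import pypdfium2 as pdfium

    pdf = pdfium.PdfDocument(path)
    try:
        pages = []
        for index in range(len(pdf)):
            bitmap = pdf[index].render(scale=pdfium_scale(dpi))
            # Go through Pillow so the channel order is RGB, like DocumentFile's output.
            pages.append(np.asarray(bitmap.to_pil().convert("RGB")))
        return pages
    finally:
        pdf.close()
```

Usage, with a hypothetical file name: `result = ocr_predictor(pretrained=True)(render_pdf_pages("statement.pdf"))`. A higher `dpi` usually helps recognition on small print, at the cost of memory and speed.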

Hi buddy, I followed your video "OCR Text from PDFs and Image Documents using docTR | Better than Tesseract OCR | Text Extraction" and got a JSON file of the text in my images. Can you tell me how to get that text into a .txt or .docx file (or any other format you suggest) that keeps the same structure as the image, and how to do it? I've tried every approach I could think of, but all of them failed. Can you help me solve this? It's for my FYP. Thanks in advance.

mushafmughal
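For the question above about turning the saved JSON into a text file: here is a minimal sketch, not from the video. It assumes the JSON was saved from docTR's `result.export()`. It uses docTR's own blocks as paragraphs and its lines as line breaks. Those blocks come from docTR's segmentation, so they will not always match the visual paragraphs in the image. `json_to_text` is a hypothetical name.

```python
import json


def json_to_text(json_text: str) -> str:
    """Turn a saved docTR-style export (JSON text) into plain text, one paragraph per block."""
    export = json.loads(json_text)
    pages = []
    for page in export["pages"]:
        paragraphs = []
        for block in page["blocks"]:
            # Each detected line becomes one line of text inside the paragraph.
            lines = [" ".join(w["value"] for w in line["words"]) for line in block["lines"]]
            paragraphs.append("\n".join(lines))
        pages.append("\n\n".join(paragraphs))
    # A form feed is the conventional page break in plain-text files.
    return "\f".join(pages)
```

Usage, with hypothetical paths: `Path("output.txt").write_text(json_to_text(Path("result.json").read_text(encoding="utf-8")), encoding="utf-8")`, where `Path` comes from `pathlib`. For a .docx file, each paragraph could instead be passed to python-docx's `Document().add_paragraph(...)` before calling `save(...)`.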

Hi, please make a video on extracting Hindi tables (text in Devanagari, UTF-8) from images into CSV. I searched a lot on the internet but didn't find any video or method. A video on this would help a lot.

GuruTechHub
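For the Hindi table question above, here is one possible way to rebuild rows and cells from word boxes and write UTF-8 CSV. It is a rough sketch, not from the video. It assumes you already have a recognition model that outputs Devanagari. docTR's default pretrained recognizers may not cover that script, so check the `vocab` of the model you load. The thresholds `row_tol` and `col_gap` and all function names are hypothetical and need tuning per document.

```python
import csv
import io
from typing import Any, Iterator, NamedTuple


class Word(NamedTuple):
    text: str
    xmin: float
    xmax: float
    y_center: float


def iter_words(page: dict[str, Any]) -> Iterator[Word]:
    """Yield every word of a docTR-style page with its horizontal extent and vertical centre."""
    for block in page["blocks"]:
        for line in block["lines"]:
            for word in line["words"]:
                (xmin, ymin), (xmax, ymax) = word["geometry"]
                yield Word(word["value"], xmin, xmax, (ymin + ymax) / 2)


def page_to_table(page: dict[str, Any], row_tol: float = 0.01,
                  col_gap: float = 0.03) -> list[list[str]]:
    """Group words into rows by vertical centre, then split rows into cells at wide gaps."""
    rows: list[list[Word]] = []
    for word in sorted(iter_words(page), key=lambda w: w.y_center):
        if rows and abs(word.y_center - rows[-1][0].y_center) <= row_tol:
            rows[-1].append(word)
        else:
            rows.append([word])

    table: list[list[str]] = []
    for row in rows:
        row.sort(key=lambda w: w.xmin)
        cells = [row[0].text]
        for prev, word in zip(row, row[1:]):
            if word.xmin - prev.xmax > col_gap:
                cells.append(word.text)  # wide gap: start a new cell
            else:
                cells[-1] += " " + word.text  # small gap: same cell
        table.append(cells)
    return table


def table_to_csv(table: list[list[str]]) -> str:
    """Serialise the reconstructed rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(table)
    return buffer.getvalue()
```

Save the result with `open("table.csv", "w", encoding="utf-8-sig", newline="")`. The BOM written by `utf-8-sig` helps Excel detect UTF-8, so Devanagari displays correctly. Known limits of this approach: empty cells shift the following columns left, and skewed scans need deskewing first so that rows share a vertical centre.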