Using Tesseract-OCR to extract text from images

In this video we use Tesseract OCR to extract English and Korean text from images. Optical character recognition is useful in cases of data hiding, or when a PDF simply embeds page images rather than searchable text. To OCR a PDF with Tesseract, we must first convert its pages to high-resolution images.
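As a rough illustration of the two steps in the description (render the PDF page to a high-resolution image, then run OCR on it), here is a minimal Python sketch. It is not the exact workflow from the video. It assumes that `pdftoppm` (from poppler-utils) and `tesseract` are installed and on the PATH; the function names, the 300 DPI default, and the file names are hypothetical choices made for this example.

```python
import subprocess


def build_commands(pdf_path, out_base, lang="eng", dpi=300):
    """Return (render_cmd, ocr_cmd) for the first page of a PDF.

    Assumption: pdftoppm and tesseract are the tools in use; the
    original video may rely on a different converter.
    """
    image_base = out_base + "-page"
    # Render only page 1 as a PNG at the requested resolution.
    # With -singlefile, pdftoppm writes <image_base>.png without a page suffix.
    render_cmd = [
        "pdftoppm", "-r", str(dpi), "-png",
        "-f", "1", "-singlefile",
        pdf_path, image_base,
    ]
    # tesseract appends ".txt" to the output base by itself.
    ocr_cmd = ["tesseract", image_base + ".png", out_base, "-l", lang]
    return render_cmd, ocr_cmd


def run_ocr(pdf_path, out_base, lang="eng", dpi=300):
    """Run both commands in order and return the path of the text file."""
    for cmd in build_commands(pdf_path, out_base, lang, dpi):
        subprocess.run(cmd, check=True)  # raises if a tool fails
    return out_base + ".txt"
```

Calling `run_ocr("report.pdf", "report", lang="kor")` would, under these assumptions, leave the recognized text in `report.txt`. Keeping the command construction separate from execution makes it easy to inspect the commands before running them.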

Tutorial found here:

010001000100011001010011011000110110100101100101011011100110001101100101
Get more Digital Forensic Science

010100110111010101100010011100110110001101110010011010010110001001100101

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please link back to the original video. If you want to use this video for commercial purposes, please contact us first. We would love to see what you are doing.

Comments

Excellent videos! As a second-language speaker, I appreciate your accurate spoken English a lot. Thanks!

Shaalimar

Thank you so much, it helps me dive into ocr really quickly.

fengxie

Thanks!! Hard to come across a tutorial as well explained as this one

axelmarruenda

For more than one language, you can use the + sign to concatenate the 3-character ISO 639-2 language codes (see the man page), e.g.
tesseract out.tiff multi -l eng+kor
(tesseract adds the .txt extension itself, so this writes multi.txt)

Mike.Freeman
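For readers who call Tesseract from Python, the same "+" convention can be sketched with the `pytesseract` wrapper. This is only an illustration and not something shown in the video: it assumes `pytesseract`, Pillow, and the `eng` and `kor` language data are installed, and the helper names are hypothetical.

```python
def lang_spec(codes):
    """Join Tesseract language codes with '+', e.g. ['eng', 'kor'] -> 'eng+kor'."""
    codes = [c.strip() for c in codes if c.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_multilang(image_path, codes):
    """OCR one image with several languages at once.

    The imports are local so that the helper above can be used even
    when pytesseract or Pillow are not installed.
    """
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang_spec(codes))
```

For example, `ocr_multilang("out.tiff", ["eng", "kor"])` passes `lang="eng+kor"` to Tesseract.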

Thank you so much! This is the simplest tutorial I could think of, that explains tesseract in depth.

jonathanvillatorocordoba

Thanks so much, it's very clear for non-native English speakers too.

stefanodeboni

Clear and concise! Can't help but subscribe. Thanks buddy!

randomtoons

Finally a Native English speaker tutorial for this. Thank you very much.

Teck_

I liked all your videos, which are very informative. You should produce videos more often. Thanks.

ahsan-lish

How can we apply the image_to_string function to a live cv2 feed (frames)?

havoclyyours

Joshua, is there a way we can know if a PDF contains graphical data (tables, charts, graphs, etc.)?

hayatt

What if Tesseract is unable to recognize the English fonts "Ford's folly italic" and "ladylike BB"? How do we embed the font into Tesseract so it recognizes the characters in the PDF?

zenoshirani

Not sure this was possible when this video came out, but a quick Google search just showed me that it seems to be possible to hand over several languages as parameters (using "+") at the same time.

ilianos

Did you ever find a way to combine the text from 2 languages? I have a 270 page pdf in Simplified Chinese with around 1/3rd in English....such a nightmare to translate.

UpcycleElectronics

So would I be able to recognize numbers and do math problems with them?

accentor

Need your opinion. I'm researching how to take a JPEG photograph of a receipt and run a Java app to get the text from it. Would Tesseract be the best solution?

lilazeonboa

Thank you. Excellent video! How do I install textract on Windows 7 x64?

eloiulrichguebayi

Very detailed tutorial. Can you show how to use PaddleOCR next time? It includes more languages.

mengtaoan

Hi. Why were the "Key words :" NOT extracted from the document? See at 6:43.

arunaslipnickas

2x playback speed really improves the pacing.

KilgoreTroutAsf
