Extract text from images with Tesseract OCR on Windows

In this video we use Tesseract OCR (tesseract-ocr) to extract Korean text from images on Windows. Optical character recognition is useful in cases of data hiding, or when a PDF simply embeds images of its pages instead of text. For OCR using Tesseract, we must first convert PDF documents to high-resolution images.
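As a rough sketch of the OCR step described above, the following Python snippet builds one `tesseract` command per page image and can optionally run them. It assumes the PDF pages have already been converted to PNG files in a single folder, that `tesseract` is on the PATH, and that the Korean language data (`kor`) is installed. The function names and the folder layout are hypothetical and are not taken from the video.

```python
import subprocess
import sys
from pathlib import Path


def build_tesseract_commands(image_dir, lang="kor", pattern="*.png"):
    """Return one tesseract command (as an argument list) per image in image_dir.

    Each command has the form: tesseract <image> <output base> -l <lang>.
    Tesseract appends ".txt" to the output base on its own.
    """
    commands = []
    for image in sorted(Path(image_dir).glob(pattern)):
        output_base = image.with_suffix("")  # page-1.png -> page-1
        commands.append(["tesseract", str(image), str(output_base), "-l", lang])
    return commands


def run_ocr(image_dir, lang="kor"):
    """Run tesseract on every PNG in image_dir (assumes tesseract is on PATH)."""
    for command in build_tesseract_commands(image_dir, lang=lang):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Usage (hypothetical folder name): python ocr_pages.py pages
    run_ocr(sys.argv[1])
```

Because the commands are generated for every matching image, the same sketch also covers the case of processing a whole folder of pages rather than one image at a time.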

010001000100011001010011011000110110100101100101011011100110001101100101
Get more Digital Forensic Science

010100110111010101100010011100110110001101110010011010010110001001100101

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please link back to the original video. If you want to use this video for commercial purposes, please contact us first. We would love to see what you are doing.
Comments

This is a really good tutorial. I appreciate the care you took in going step by step, especially through altering the path.

josephc

This is the most helpful tutorial on Tesseract that I've found. Thank you.

GNS

OMG. I was watching your video to install Tesseract. Meanwhile, I was amazed that you can read Korean. I thought you had chosen a random non-English language to prove Tesseract works with different languages. Amazed, as a Korean.
I am trying to learn how OCR works because I want to make an app that requires OCR. But since I have no coding experience or anything even close to digital languages, I am having some difficulties. At least I was able to use Tesseract after watching this video. Thank you so much!

hkim

Thanks for this tutorial. I have had trouble converting text in a Mayan language here in Guatemala; I followed your steps and voilà!
The next step for me is to figure out how to train a recognition set for our local Mayan alphabets.
Thanks a lot.

Very, very good tutorial on Tesseract for Koreans, with clear pronunciation. Thank you.

TheJoinckim

FYI, if you have never added anything to PATH beyond the default entry, the edit selection box will not pop up.
So, going by your video, I need to make the entry manually, separating the new one with ";" (semicolon).
Afterwards, if I click the edit button, I get the same pop-up edit box.

seung-wanson
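To illustrate the semicolon-separated format mentioned in the comment above, here is a minimal Python sketch that appends a directory to a PATH-style string. It only computes the new value and does not change any system setting. The function name and the example directories are hypothetical, not the actual Tesseract install path.

```python
def append_to_path(path_value, new_dir, sep=";"):
    """Return path_value with new_dir appended, unless it is already listed.

    Windows separates PATH entries with ";" and compares them
    case-insensitively, so the duplicate check ignores case.
    """
    entries = [entry for entry in path_value.split(sep) if entry]
    if new_dir.lower() not in (entry.lower() for entry in entries):
        entries.append(new_dir)
    return sep.join(entries)


# Hypothetical directories, for illustration only.
print(append_to_path(r"C:\Windows\system32", r"C:\Tools\Tesseract"))
```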

Hi, a very good tutorial, but as mentioned by yourself and in another comment regarding batch folder/file processing, I cannot see or find any uploaded tutorial video.

philglanville

Thanks a lot for this, but can I use this for manuscripts as well? And if so, please tell me how :)

R.t.a.s

Interestingly enough, the default install path for the Windows x64 version is:

epochseven

How did you turn each page of the PDF into PNGs? Thank you for this high-quality video.

opheliafromlcf

Can you tell us how to train on our own dataset?

deepak

Your voice makes me happy to browse youtube, so clear fuark

TzKetm

What mic are you using? Great video, thanks!

itsdannyftw

A video with tips on how to train Tesseract would be great! Anyway, thanks a lot for this video. It was helpful for my first steps and really appreciated!
I'm wondering if someone has already built something closer to an end-user application than a programmer's tool (or, failing that, how to do it): 1) an overlay of the pictured document and the OCR result, so that the original document stays displayed as it is but becomes "highlightable", or 2) a parallel OCR document that keeps the letter positioning and page layout of the original picture and, where recognition is difficult or confidence is low, keeps the original cropped image instead, for example for graphs, pictures, and drawings.

kevinsanti

You can convert your PDF to a single TIFF file instead of converting it to several PNG files.

ahmedfarouk

I'm kind of skeptical of allowing changes to hardware. Is it completely safe?

allirashna

Hi sir
Much needed video..
Can you tell me how to train Tesseract to identify a specific font?

prateekgupta

So, should I do it one by one? I have complete books, is there no way to do this for several images?

pixelvader

Keep making these videos, man! Interesting content.

emmanuelvelasco

Your instructions are phenomenal. You are amazing at explaining computer commands and tricks. The only problem is that this program sucks and it is a nightmare to use.
It's not your fault. Thanks so much for teaching so many tricks.

jarongaus