Tesseract OCR: Extract Text From Any Image

Have you ever needed to extract text from an image? Maybe you took a screenshot of something, or you need a transcript of a meme. Luckily for you, Tesseract OCR exists to do exactly that.

==========Credits==========
🎨 Channel Art:
All my art was created by Supercozman

#Tesseract #OpticalCharacterRecognition #TesseractOCR #Linux

🎵 Ending music

DISCLOSURE: Wherever possible I use referral links, which means if you click one of the links in this video or description and make a purchase I may receive a small commission or other compensation.
Comments

I've been a subscriber to your channel for a couple of years and I like all of your videos, but this is my favorite kind of video -- about a *useful*, open source tool or utility, not specific to Arch Linux, not about gaming, not about drama in the industry. I'm not saying the other kinds are bad or that you shouldn't make them, but I like this kind the best.

code

I have used Tesseract a bit for digitizing recipes from recipe books. When it doesn't give good results on a first pass, I have found that altering the image can help a lot: converting it to black and white, adjusting the contrast, and even enlarging it can all improve results.

Mpickles
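Here is a minimal Python sketch of the preprocessing steps from the recipe comment above: grayscale, stronger contrast, then enlargement. It assumes the Pillow library is installed. The function name and the default 2x scale factor are illustrative choices and do not come from the comment.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(img: Image.Image, scale: int = 2) -> Image.Image:
    """Convert to grayscale, stretch contrast and enlarge an image before OCR."""
    # Drop colour information; only brightness matters for the following steps.
    gray = ImageOps.grayscale(img)
    # Stretch the histogram so the darkest pixels map to black and the lightest to white.
    contrasted = ImageOps.autocontrast(gray)
    # Enlarge so small glyphs cover more pixels; LANCZOS keeps edges fairly smooth.
    width, height = contrasted.size
    return contrasted.resize((width * scale, height * scale), Image.LANCZOS)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python preprocess.py recipe.jpg recipe-clean.png
    src, dst = sys.argv[1], sys.argv[2]
    with Image.open(src) as original:
        preprocess_for_ocr(original).save(dst)
```

You can then pass the cleaned file to tesseract as usual, for example tesseract recipe-clean.png recipe (hypothetical file names).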

So if Google is maintaining this project, is Google Lens just a front-end for Tesseract?

Fooftilly

I started using Tesseract for a project to gather the text from the memes hosted on my personal szurubooru (so the images can be searched by their text and I can actually find things among thousands of them).

It has been very hit or miss. Sometimes it gets the text right down to the punctuation, other times it gets nothing. On low-res, low-contrast images where I think it has no shot, it gets it; on clean images it gets nothing. Sometimes heavy image manipulation helps, sometimes the unmodified image is best.
What I can say is that handwritten Latin letters are impossible for it, so manga scanlation text just comes back blank, at least with the English language setting.

dergeneralfluff
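Here is one way to sketch the meme-search idea above in Python. It OCRs each image with the tesseract CLI and then builds a small word index over the results. The helper names are hypothetical, and the sketch does not use szurubooru's own API; it only covers the indexing step. That step is also where hit-or-miss OCR shows up: a misread word simply never matches a query.

```python
import re
import subprocess
from collections import defaultdict


def ocr_file(path: str, lang: str = "eng") -> str:
    """Run the tesseract CLI on one image and return the recognised text."""
    # "stdout" as the output base prints the text instead of writing <base>.txt.
    result = subprocess.run(
        ["tesseract", path, "stdout", "-l", lang],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


def build_index(text_by_file: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of files whose OCR text contains it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for filename, text in text_by_file.items():
        for word in tokenize(text):
            index[word].add(filename)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the files that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits
```

A typical call would be build_index({p: ocr_file(p) for p in paths}), where paths is your own list of image files.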

A little script can be combined with a screenshot utility to send OCR text from screenshots directly to the clipboard.

t

You have a video about every topic. That's awesome.

muctebanesiri

I use a keybinding to invoke a script that takes a screenshot of an area, pipes it to tesseract, and copies the resulting text to the clipboard.

atomixhawk
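Here is a minimal Python sketch of the keybinding script described in the two comments above. It assumes an X11 desktop with maim (region screenshots) and xclip (clipboard) installed. Both are stand-ins chosen for illustration, since the comments don't name their tools; on Wayland you would swap in equivalents. Bind the script to a key in your window manager.

```python
import subprocess
import tempfile
from pathlib import Path


def screenshot_cmd(path: str) -> list[str]:
    # Assumption: maim on X11; -s lets you drag-select the region to capture.
    return ["maim", "-s", path]


def tesseract_cmd(path: str, lang: str = "eng") -> list[str]:
    # "stdout" as the output base makes tesseract print the text instead of writing a file.
    return ["tesseract", path, "stdout", "-l", lang]


def clipboard_cmd() -> list[str]:
    # Assumption: xclip; it reads the text to copy from stdin.
    return ["xclip", "-selection", "clipboard"]


def ocr_region_to_clipboard(lang: str = "eng") -> str:
    """Screenshot a selected region, OCR it and put the text on the clipboard."""
    with tempfile.TemporaryDirectory() as tmp:
        shot = str(Path(tmp) / "region.png")
        subprocess.run(screenshot_cmd(shot), check=True)
        result = subprocess.run(
            tesseract_cmd(shot, lang), capture_output=True, text=True, check=True
        )
    text = result.stdout.strip()
    subprocess.run(clipboard_cmd(), input=text, text=True, check=True)
    return text


if __name__ == "__main__":
    ocr_region_to_clipboard()
```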

I use Tesseract in Python to read text and train the machine. Good job explaining this 👏

sazk

The sad thing about pytesseract is that it only works as long as your image has a fairly uniform background color; otherwise it messes everything up.

khaibaromari
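One possible workaround for the background problem described above is to binarize the image yourself before handing it to pytesseract or the CLI. This sketch assumes Pillow and uses a single global threshold; uneven lighting would need an adaptive method instead. It also inverts images that look mostly dark, because Tesseract's own documentation recommends dark text on a light background for its 4.x engine. The threshold of 128 and the function name are illustrative.

```python
from PIL import Image, ImageOps, ImageStat


def binarize_dark_on_light(img: Image.Image, threshold: int = 128) -> Image.Image:
    """Return a black-and-white copy with dark text on a light background."""
    gray = ImageOps.grayscale(img)
    # A low average brightness suggests light text on a dark background, so invert it.
    if ImageStat.Stat(gray).mean[0] < 128:
        gray = ImageOps.invert(gray)
    # Global threshold: each pixel becomes pure black (0) or pure white (255).
    return gray.point(lambda value: 255 if value >= threshold else 0)
```

The result can be passed straight to pytesseract.image_to_string or saved and run through the tesseract CLI.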

Thanks for the helpful video Brodie. Do you know if you can use Tesseract to convert a non-OCR'ed PDF into a PDF that contains OCR'ed text?

EastEndKeith
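Tesseract itself reads images, not PDFs. One way to answer the question above is therefore to rasterize the pages first and then ask Tesseract for its PDF output, which keeps each page image and adds an invisible text layer. This Python sketch assumes pdftoppm (from poppler-utils) is installed. It relies on Tesseract accepting, as input, a text file that lists one image per line. The function names and the 300 DPI default are illustrative. OCRmyPDF is an existing tool that automates this whole workflow, also on top of Tesseract.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def page_number(png_name: str) -> int:
    """Extract N from pdftoppm file names such as 'page-7.png' or 'page-07.png'."""
    return int(Path(png_name).stem.rsplit("-", 1)[1])


def make_searchable_pdf(pdf_in: str, out_base: str, lang: str = "eng", dpi: int = 300) -> str:
    """OCR a scanned PDF into <out_base>.pdf with an invisible text layer."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        # 1. Rasterize every page to PNG (page-1.png, page-2.png, ..., zero-padded for long PDFs).
        subprocess.run(["pdftoppm", "-r", str(dpi), "-png", pdf_in, prefix], check=True)
        pages = sorted(Path(tmp).glob("page-*.png"), key=lambda p: page_number(p.name))
        # 2. Tesseract accepts a text file that lists one input image per line.
        list_file = Path(tmp) / "pages.txt"
        list_file.write_text("".join(f"{p}\n" for p in pages))
        # 3. The "pdf" config writes <out_base>.pdf: page images plus an invisible text layer.
        subprocess.run(["tesseract", str(list_file), out_base, "-l", lang, "pdf"], check=True)
    return out_base + ".pdf"


if __name__ == "__main__":
    # Hypothetical file names: produces scan-ocr.pdf from scan.pdf.
    print(make_searchable_pdf("scan.pdf", "scan-ocr"))
```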

Oh lol, something just hit me: Google Lens uses Google's own Tesseract OCR to extract text and sends it to your PC where you are logged in with your Google account.

damarh

This seems fairly nice for searching tango. Maybe you should also check whether it does well reading words one by one, perhaps with some other ways of framing them. I'm curious how it works on languages like Arabic and Hindi, though.

someonestolemyname
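You can judge how well Tesseract does word by word, and try the Arabic (ara) or Hindi (hin) models mentioned above, by asking for its TSV output. That output lists every recognised word with a confidence score. Below is a minimal Python sketch, assuming the relevant language data is installed. The parsing function and its min_conf filter are hypothetical helpers, not part of Tesseract.

```python
import csv
import io
import subprocess


def tesseract_tsv(path: str, lang: str) -> str:
    """Run tesseract with its built-in 'tsv' config and return the output."""
    result = subprocess.run(
        ["tesseract", path, "stdout", "-l", lang, "tsv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def words_with_confidence(tsv_text: str, min_conf: float = 0.0) -> list[tuple[str, float]]:
    """Return (word, confidence) pairs, keeping words at or above min_conf."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        # Level 5 rows are single words; levels 1-4 are page, block, paragraph and line.
        if row["level"] != "5":
            continue
        text = (row["text"] or "").strip()
        conf = float(row["conf"])
        if text and conf >= min_conf:
            words.append((text, conf))
    return words
```

For example, words_with_confidence(tesseract_tsv("sign.png", "ara"), min_conf=60) would list only the words Tesseract is fairly sure of (hypothetical file name and cutoff).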

Does anyone know how I could add a second language in the same command? I tried the following command and it doesn't work: tesseract filename.jpeg - -l ara[+spa] filename.txt

ELHASSANEMOUMADARFAK
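In Tesseract's usage text, the square brackets in -l LANG[+LANG] only mark the second language as optional; you don't type them. The language codes are joined with a plus sign. The output base is a single argument, and Tesseract appends .txt itself. So the shell command would likely be tesseract filename.jpeg filename -l ara+spa, assuming both the ara and spa language files are installed (tesseract --list-langs shows which ones are). Here is the same thing as a small Python sketch; the helper name is hypothetical.

```python
import subprocess


def multilang_cmd(image: str, output_base: str, langs: list[str]) -> list[str]:
    """Build a tesseract command that loads several language models at once."""
    if not langs:
        raise ValueError("at least one language code is required")
    # Codes are joined with '+' into a single -l argument, e.g. "ara+spa".
    return ["tesseract", image, output_base, "-l", "+".join(langs)]


if __name__ == "__main__":
    # Writes filename.txt in the current directory (tesseract adds the .txt extension).
    subprocess.run(multilang_cmd("filename.jpeg", "filename", ["ara", "spa"]), check=True)
```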

Are you able to input a URL instead of a local file on your PC? This would be very useful.

solidhyrax
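A portable way to OCR an image from a URL is to download it to a temporary file first and then run Tesseract on that file. Below is a minimal Python sketch using only the standard library plus the tesseract CLI. The helper names are hypothetical, and there is no error handling for responses that are not images.

```python
import os
import shutil
import subprocess
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download(url: str, dest_dir: str) -> str:
    """Save the resource at url into dest_dir and return the local path."""
    # Reuse the last path component as the file name, falling back to a fixed one.
    name = os.path.basename(urlparse(url).path) or "download"
    path = str(Path(dest_dir) / name)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


def ocr_url(url: str, lang: str = "eng") -> str:
    """Download an image and return the text tesseract finds in it."""
    with tempfile.TemporaryDirectory() as tmp:
        image = download(url, tmp)
        result = subprocess.run(
            ["tesseract", image, "stdout", "-l", lang],
            capture_output=True, text=True, check=True,
        )
    return result.stdout
```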

Excellent. I made the mistake of writing a couple thousand small notes in the stock Samsung notepad on my phone, and it turns out the garbage developers only allow you to bulk export them as PDFs instead of plain text. This will come in handy.

davidr

Does anyone know how to set xsane to use tesseract?

patrickmclaughlin

Google can't just repeat that; Google Drive's conversion to Google Docs beats Tesseract.

mattaku

See your odyssey tips and tricks for my comment

uksuperrascal

I'm a huge Japan fan; I love the culture, the food, and the people... but anime/weebs? Cringe.

bologna

I found using --oem=1 helpful; it forces Tesseract to use the new ML model, which helps in a lot of cases.

leoliu
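The documented CLI form of this flag is --oem 1, with a space, where mode 1 selects the LSTM neural-network engine only (for example tesseract image.png stdout --oem 1). From Python, pytesseract forwards extra flags through its config argument. Below is a minimal sketch that assumes pytesseract and Pillow are installed; they are imported inside the OCR function so the small config helper works without them. The helper names are hypothetical.

```python
# Tesseract's OCR engine modes (--oem values).
VALID_OEMS = {0: "legacy only", 1: "LSTM only", 2: "legacy + LSTM", 3: "default"}


def build_config(oem: int = 1) -> str:
    """Return extra tesseract flags as the string pytesseract's config expects."""
    if oem not in VALID_OEMS:
        raise ValueError(f"unknown OCR engine mode: {oem}")
    return f"--oem {oem}"


def ocr_lstm_only(path: str, lang: str = "eng") -> str:
    """OCR one image with the LSTM engine via pytesseract."""
    # Imported lazily so build_config() works even where these packages are absent.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang, config=build_config(1))
```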