Training Tesseract 5 for a New Font


Build Tesseract from source video:

GitHub repository link:

Training command:
TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=Apex START_MODEL=eng TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000
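
If you prefer to launch the same training run from a Python script, here is a minimal sketch. It assumes you run it from the root of a tesstrain checkout. `TESSDATA_PREFIX` goes into the environment, just like the `TESSDATA_PREFIX=... make ...` prefix in the shell command, while the other settings are passed as make variables. The helper names `build_training_command` and `run_training` are hypothetical.

```python
import os
import subprocess


def build_training_command(model_name, start_model, tessdata, max_iterations):
    """Build argv and environment for the tesstrain `make training` call above."""
    argv = [
        "make",
        "training",
        f"MODEL_NAME={model_name}",
        f"START_MODEL={start_model}",
        f"TESSDATA={tessdata}",
        f"MAX_ITERATIONS={max_iterations}",
    ]
    # TESSDATA_PREFIX is an environment variable, not a make variable,
    # mirroring the `TESSDATA_PREFIX=... make ...` shell prefix.
    env = dict(os.environ, TESSDATA_PREFIX=tessdata)
    return argv, env


def run_training(argv, env, tesstrain_dir="."):
    """Run make inside the tesstrain checkout (assumed location); raise on failure."""
    subprocess.run(argv, env=env, cwd=tesstrain_dir, check=True)
```

Usage: `run_training(*build_training_command("Apex", "eng", "../tesseract/tessdata", 10000), tesstrain_dir="path/to/tesstrain")`, where the path is a placeholder for your own checkout.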

Correction: I believe the box file contains the bounding-box coordinates of each character within the image.
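
To show what those coordinates look like, here is a minimal sketch of a parser for one box-file line in the per-character form `<glyph> <left> <bottom> <right> <top> <page>`. Tesseract measures these pixel coordinates from the bottom-left corner of the image, so the sketch also converts them to the top-left origin that most image libraries use. It does not handle the `WordStr` line format. `Box`, `parse_box_line`, and `parse_box_file` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Box:
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    def to_top_left(self, image_height):
        """Return (x0, y0, x1, y1) with the origin moved to the top-left corner."""
        return (self.left, image_height - self.top, self.right, image_height - self.bottom)


def parse_box_line(line):
    # Split from the right so a glyph that is itself a space survives intact.
    glyph, left, bottom, right, top, page = line.rstrip("\r\n").rsplit(" ", 5)
    return Box(glyph, int(left), int(bottom), int(right), int(top), int(page))


def parse_box_file(text):
    """Parse every non-blank line of a per-character box file."""
    return [parse_box_line(line) for line in text.splitlines() if line.strip()]
```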

Comments

God, I love you. I just recently started messing with OCR, specifically Tesseract, and after a few hours of reading through the documentation on the steps I just wanted to end my life hahahaha. Thank you for this, it's extremely encouraging. I can't wait to try it!

taylorbarnes

I think the word error rate is high because the font doesn't distinguish uppercase from lowercase (it's all uppercase), but the ground truth labels do distinguish between the two.

yichenyao
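
A quick way to test this theory is to compute the word error rate on your own evaluation output twice, once as-is and once with case folded. Here is a minimal sketch using word-level Levenshtein distance. It is not the exact metric lstmtraining reports; it only shows how much of the error comes from case alone. `word_error_rate` is a hypothetical helper.

```python
def word_error_rate(reference, hypothesis, ignore_case=False):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if ignore_case:
        ref = [w.casefold() for w in ref]
        hyp = [w.casefold() for w in hyp]
    # Classic Levenshtein DP over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (0 if words match)
            )
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

With an all-caps font, `word_error_rate("Hello World", "HELLO WORLD")` counts every word as wrong, while the same call with `ignore_case=True` counts none.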

This is the only training resource I've found where simply following along actually gets you results! Many thanks for this video!

AchievementHuntGuru

Thank you so much, man. I've been looking everywhere for a Tesseract tutorial, and everything just points to the shitty, unreadable docs. Without you I don't know where I'd be.

donjuanpond

Tesseract's documentation is abysmal.

bunyn

I was racking my brain trying to understand the official tutorial, and you explain it in a simple way. I'm your subscriber number 666. Thank you very much.

fivalt

Haven't watched the video yet, but if this works, you'll have my eternal gratitude

videos

Hey Gabriel, I'm following your steps to train my model on handwritten text, but it always fails with this error:

unicharset_extractor --output_unicharset "data/Apex/my.unicharset" --norm_mode 2 "data/Apex/all-gt"
Failed to read data from: data/Apex/all-gt
Wrote unicharset file data/Apex/my.unicharset

Can you please help me here? I am stuck. Thanks!

madhavpandey
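
`unicharset_extractor` reads `all-gt`, which (as far as I understand tesstrain) is assembled from the `.gt.txt` transcription files. So a "Failed to read data" message is worth checking against your ground-truth folder. Below is a minimal sketch that lists line images with no matching `<name>.gt.txt` and transcription files that are empty. It assumes tesstrain's default layout, a `data/<MODEL_NAME>-ground-truth` folder holding pairs such as `line1.tif` + `line1.gt.txt`. The accepted image suffixes and the helper name `check_ground_truth` are assumptions.

```python
from pathlib import Path

# Assumption: the image types you use; adjust to match your data.
IMAGE_SUFFIXES = (".tif", ".png")


def check_ground_truth(gt_dir):
    """Return (images_without_gt, empty_gt_files) as lists of file names."""
    gt_dir = Path(gt_dir)
    missing, empty = [], []
    for image in sorted(gt_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skips .gt.txt files and anything else
        gt = image.parent / (image.stem + ".gt.txt")
        if not gt.exists():
            missing.append(image.name)
        elif not gt.read_text(encoding="utf-8").strip():
            empty.append(gt.name)
    return missing, empty
```

For example, `check_ground_truth("data/Apex-ground-truth")` should return two empty lists before you start training.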

I've been experimenting with this tutorial for three days, and the file structure in the video doesn't quite match the one on GitHub. Could you please update the repo if possible? I'm running into a lot of folder inconsistencies when trying to connect the dots, since that part was brushed over really quickly. Thank you :)

ConfusedProgrammer

I tried this with a Hindi font (Kruti Dev 010) and also with Kruti Dev 016, but it shows: Error: Call PrepareToWrite before WriteTesseractBoxFile!!

ganeshrajv
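
One possible cause worth ruling out, though it is an assumption and not a diagnosis from the video: to my knowledge, Kruti Dev is a legacy font that draws Devanagari shapes on Latin code points instead of the Unicode Devanagari block (U+0900 to U+097F). If so, text2image cannot render Unicode Hindi text with it. This minimal sketch measures how much of that block a font's character map covers. `load_cmap` relies on the third-party fontTools package (`TTFont(...).getBestCmap()`), and both helper names are hypothetical.

```python
# Unicode Devanagari block: U+0900 through U+097F.
DEVANAGARI = range(0x0900, 0x0980)


def devanagari_coverage(cmap):
    """Fraction of Devanagari code points that a {codepoint: glyph_name} map covers."""
    covered = sum(1 for cp in DEVANAGARI if cp in cmap)
    return covered / len(DEVANAGARI)


def load_cmap(font_path):
    """Read a font's Unicode character map (requires `pip install fonttools`)."""
    from fontTools.ttLib import TTFont

    return TTFont(font_path).getBestCmap()
```

A result near 0.0 from `devanagari_coverage(load_cmap("KrutiDev010.ttf"))` (file name hypothetical) would suggest the font cannot render Unicode Devanagari text.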

Hi. There's a font used in a game that I'd like to prepare for training. Is all I need to do to screen-capture words in that font as you describe, or do I need a different approach?

nobafan
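
As I understand tesstrain, it trains on single-line images, each paired with a `<name>.gt.txt` file holding the exact text of that line. Cropping one line per screenshot and transcribing it therefore seems like a workable approach. This minimal sketch writes the transcription files for a set of already-cropped line images. `write_ground_truth` and the example file names are hypothetical.

```python
from pathlib import Path


def write_ground_truth(transcriptions, gt_dir):
    """Write <stem>.gt.txt for each {image_file_name: text} entry.

    The image files themselves (e.g. line_001.png) are assumed to already
    sit in gt_dir; only the transcription files are created here.
    """
    gt_dir = Path(gt_dir)
    written = []
    for image_name, text in transcriptions.items():
        gt_path = gt_dir / f"{Path(image_name).stem}.gt.txt"
        # One line of text per image, with surrounding whitespace trimmed.
        gt_path.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(gt_path.name)
    return written
```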

Great tutorial. Using WSL I kept getting new errors; switching to an OS installed in VirtualBox solved it. I was able to train my dataset, and it's surprisingly easy.

wojd_

Thank you for making this video, but I can't figure out where all those data files are supposed to go. I'm trying to fine-tune for variations of letters with accents, and I'm lost.

ombieautopilot

Hi Gabriel.
Thank you for this tutorial.
I was trying to run the code but I'm receiving this error:
Fontconfig error: Cannot load default config file: No such file: (null)
This error appears to be font-related. I've experimented with several fonts but I'm unable to resolve this issue.
Could you help me please?

shadyas.

I want to custom-train Tesseract 5 to read car license plates that are detected with a YOLO model. How can I do this? I have a couple of thousand images.
What are the steps I need to follow?

Leo-hkkk
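
A first step could be turning each YOLO detection into a cropped plate image, which you can then transcribe into a `.gt.txt` file. This minimal sketch assumes the standard YOLO label format, `class cx cy w h`, with all four box values normalized to the 0 to 1 range. It converts one label to a pixel box and crops it with Pillow (third-party). The helper names are hypothetical.

```python
def parse_yolo_line(line):
    """Parse 'class cx cy w h' (extra fields such as confidence are ignored)."""
    cls, cx, cy, w, h = line.split()[:5]
    return int(cls), float(cx), float(cy), float(w), float(h)


def yolo_to_pixels(cx, cy, w, h, img_w, img_h):
    """Convert a normalized center/size box to (left, top, right, bottom) pixels."""
    left = round((cx - w / 2) * img_w)
    top = round((cy - h / 2) * img_h)
    right = round((cx + w / 2) * img_w)
    bottom = round((cy + h / 2) * img_h)
    # Clamp to the image in case the predicted box spills past an edge.
    return (max(0, left), max(0, top), min(img_w, right), min(img_h, bottom))


def crop_plate(image_path, label_line, out_path):
    """Save the plate region described by one YOLO label line."""
    from PIL import Image  # requires `pip install pillow`

    _, cx, cy, w, h = parse_yolo_line(label_line)
    with Image.open(image_path) as img:
        box = yolo_to_pixels(cx, cy, w, h, img.width, img.height)
        img.crop(box).save(out_path)
```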

While running the script 'split_training_text.py', I get the following error:

Fontconfig warning: "/tmp/fonts.conf", line 4: empty font directory name ignored

Could you help me resolve this?

aayushjain
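
My reading of this warning, which the video does not confirm: text2image writes a small fontconfig file (the `/tmp/fonts.conf` in the message) listing the folder passed as `--fonts_dir`, and an empty directory entry there suggests that flag was empty. Here is a minimal sketch of building a text2image call that always sets it. It uses only the `--text`, `--outputbase`, `--font`, `--fonts_dir`, and `--max_pages` flags. The helper name `text2image_args` and the font name in the test are hypothetical; the font name must be the family name fontconfig reports for your .ttf.

```python
def text2image_args(text_file, outputbase, font, fonts_dir):
    """Build a text2image command line for rendering one training text file."""
    if not fonts_dir:
        # An empty value here is the suspected source of the
        # "empty font directory name ignored" warning.
        raise ValueError("fonts_dir must point at the folder containing the font file")
    return [
        "text2image",
        f"--text={text_file}",
        f"--outputbase={outputbase}",
        f"--font={font}",
        f"--fonts_dir={fonts_dir}",
        "--max_pages=1",
    ]
```

You would pass the resulting list to `subprocess.run(..., check=True)` for each line file your splitting script produces.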

So far, this is the only tutorial on Tesseract 5; the old bash-based training method has been abandoned since December 2022.

adityanjsg

The title says "new font". Can I apply the same steps to a new language, using TIFF images?

ganeshrajv

When Tesseract training starts, it shows the warning below (for Sindhi):
Can't encode transcription: 'पिए वई। ज़ख़मनि जो सूर वधंदो वियो हू चीखन्दो
How can I handle this problem?

DalvinderKaur-izsn
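
As I understand it, this warning means the line contains a character sequence that the model's character set could not encode. One thing worth checking is Unicode normalization. For example, the precomposed letter ज़ (U+095B) is excluded from composition, so under NFC it becomes ज (U+091C) plus the nukta sign (U+093C). Text typed one way and normalized another can then fail to match. This minimal sketch reports whether a transcription line is already NFC-normalized and lists each code point with its Unicode name, so you can spot the odd one out. `normalization_report` is a hypothetical helper.

```python
import unicodedata


def normalization_report(text):
    """Return (is_nfc, [(char, 'U+XXXX', unicode_name), ...]) for one line."""
    is_nfc = unicodedata.is_normalized("NFC", text)  # Python 3.8+
    chars = [
        (c, f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
        for c in text
    ]
    return is_nfc, chars
```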

Hi. I tried this on Colab. I installed Tesseract, then ran split_training_text.py and got this error: FileNotFoundError: [Errno 2] No such file or directory: 'text2image'. Is there a solution?

listentomusicfeellikehome
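
That FileNotFoundError comes from Python's subprocess module when it cannot find the `text2image` executable on the PATH. text2image is one of Tesseract's training tools, and a plain package install may not include those tools; building Tesseract from source with the training tools, as in the video linked in the description, is one way to get them. This minimal sketch checks which tools are missing before you start. The tool list is my assumption of what the tutorial's pipeline calls, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumption: executables the training pipeline expects on PATH.
TRAINING_TOOLS = ("text2image", "unicharset_extractor", "combine_tessdata", "lstmtraining")


def missing_tools(tools=TRAINING_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

If `missing_tools()` returns a non-empty list, fix the installation before running `split_training_text.py`.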
