Building an OCR Model to Crack Captchas: A Neural Network Tutorial with Keras and TensorFlow

In this video 📝 we will create an OCR Model To Read Captchas With Neural Networks In Keras And TensorFlow. We will first go over what a recurrent neural network is and why we are going to use that in this video to create an OCR model. We will talk about the CTC loss and at the end of the video, we will create the model, load in the dataset, preprocess it and train our neural network.

If you enjoyed this video, be sure to press the 👍 button so that I know what content you guys like to see.

Tags:
#OCR #NeuralNetwork #Captchas #NeuralNetworks #DeepLearning #NeuralNetworksPython #NeuralNetworksTutorial #DeepLearningTutorial #Keras #Tensorflow

Comments

Join My AI Career Program
Enroll in My School and Technical Courses

NicolaiAI

For anyone who is getting poor results:

1. With such a small dataset, a random split may not be representative of the problem. For example, the training set might contain a much higher percentage of one digit than another.

2. You can use OpenCV for preprocessing. Morphological transformations that remove noise can improve performance immensely.

3. To avoid overfitting, I found that a Gaussian noise layer can help. It makes the task harder to learn and therefore harder to overfit.

Hope this helps!
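
Point 1 can be checked before training. The following is a minimal sketch, not code from the video: it assumes the labels are the captcha strings, and the helper names `char_share` and `split_imbalance` as well as the sample labels are made up for illustration. It reports the largest difference in character share between the training and validation splits, so a lopsided random split shows up as a large number.

```python
from collections import Counter


def char_share(labels):
    """Return each character's share of all characters in `labels`."""
    counts = Counter("".join(labels))
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def split_imbalance(train_labels, val_labels):
    """Largest absolute difference in character share between two splits."""
    train, val = char_share(train_labels), char_share(val_labels)
    chars = set(train) | set(val)
    return max(abs(train.get(c, 0.0) - val.get(c, 0.0)) for c in chars)


# Hypothetical labels, only for illustration.
train_labels = ["22d5n", "2b827", "d22bd"]
val_labels = ["7n5n7", "n8n55"]

gap = split_imbalance(train_labels, val_labels)
print(f"largest character share gap: {gap:.2f}")
```

If the gap is large, reshuffling with a different seed or splitting so that each character appears in both sets are possible remedies.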

axelanderson

Where is the repository link?
I am not able to find it in the description.

adepusairahul

Do you think I can use your code to decode the digits of my water counter?

benoitd

Hey, thank you very much for this beautiful explanation of the code and the philosophy behind OCR with an LSTM and a CTC layer.
Can you please verify that the code still works? I was running it and it worked, but now it doesn't. I think there is a problem in the functions that map characters to numbers and numbers back to their original characters. I tried to run it in Google Colab, but when I visualize the data it doesn't show the correct label text.
I would be very thankful if you could verify it and suggest some solutions to fix the problem of mapping characters to numbers and numbers back to their original characters.

souhailel-ghayam

Dear Coding Lib! I'm here with the Captcha project! It seems that turning shuffle on breaks the splitting function and produces an incorrect split. I have yet to find a solution and would really appreciate it if you looked into it! With shuffle off, it works well. Another person pointed out the bug too: the labels end up on the wrong images.

GuyJustCool

Hi Nicolai. When I add the num_oov_indices=0 parameter to the StringLookup code, the model training code works, but the visualization step (before creating and training the model) shows labels on the wrong images. So I removed num_oov_indices, and now my training code with early stopping is not working: it stops in the very first epoch. Any solution for this?

syedmuzammilahmed

I'm trying to run this code but I'm getting an error like InvalidArgumentError: graph execution error.
Can anyone help with this?

omkarmestry

Hi Nicolai,
I was wondering whether there is a way to feed wider images with text into this kind of network, or to have a dynamic input size?

alexmoruz

By the way, can I extract text from images with this? My final project is to extract text from images, but I cannot code. I need help, please.

sule-yg

Can I use this model for license plates?

coconutnut

Is it suitable for text recognition tasks?

chelvanchelvam

Hey, I am not getting accurate results. I checked your GitHub, and for some reason the labels aren't matching the captchas during testing. What would you recommend?

prathamshah

Can you provide the library versions you used?

int-

I finally ended up with this working configuration:

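# These two lines were cut off in the comment. The glob call, data_dir (a pathlib.Path)
# and the file-name parsing are assumptions based on the Keras captcha OCR example; needs "import os".
images = sorted(map(str, data_dir.glob("*.png")))
labels = [os.path.basename(img).split(".png")[0] for img in images]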
vocab = sorted(set("".join(labels)))
max_length = max(len(label) for label in labels)

char_to_num = StringLookup(vocabulary=vocab, mask_token=None, num_oov_indices=0, oov_token="[UNK]")
num_to_char = StringLookup(vocabulary=char_to_num.get_vocabulary(), invert=True, mask_token=None, num_oov_indices=0,
oov_token="[UNK]")

The rest of the code is the same as in the video.
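
Several comments here report labels ending up on the wrong images or characters. One way this can happen is when the forward and inverse lookup tables do not agree on where index 0 starts. The sketch below uses a plain-Python stand-in for the two tables, not the Keras `StringLookup` layer itself; `build_lookup`, `encode`, `decode` and the sample labels are hypothetical names for illustration only.

```python
def build_lookup(vocab, num_oov_indices=0, oov_token="[UNK]"):
    """Plain-Python stand-in for a char <-> index table (not the Keras layer)."""
    tokens = [oov_token] * num_oov_indices + list(vocab)
    char_to_num = {ch: i for i, ch in enumerate(tokens) if ch != oov_token}
    return tokens, char_to_num


def encode(label, char_to_num):
    """Turn a label string into a list of integer indices."""
    return [char_to_num[ch] for ch in label]


def decode(indices, tokens):
    """Turn a list of integer indices back into a string."""
    return "".join(tokens[i] for i in indices)


labels = ["2b827", "d22bd"]  # hypothetical captcha labels
vocab = sorted(set("".join(labels)))

# Consistent tables: both directions come from the same token list.
tokens, char_to_num = build_lookup(vocab, num_oov_indices=0)
round_trip = decode(encode("2b827", char_to_num), tokens)

# Inconsistent tables: the forward map reserves index 0 for OOV,
# but the inverse table does not, so every index is off by one.
_, shifted_map = build_lookup(vocab, num_oov_indices=1)
mismatch = decode(encode("2b827", shifted_map), tokens)

print(round_trip, mismatch)
```

The configuration above avoids this kind of offset by passing the same `num_oov_indices=0` and `mask_token=None` to both layers.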

megistone

Hi Nicolai, thanks for the great explanation. Could you please explain how to measure accuracy?
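
Two measures are commonly used for this kind of sequence output: the share of captchas predicted exactly right, and the character error rate (edit distance divided by the number of target characters). The sketch below is one possible way to compute both on already decoded strings. It is not taken from the video, and the function names and sample strings are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def exact_match_accuracy(predictions, targets):
    """Fraction of predictions that equal their target string exactly."""
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)


def character_error_rate(predictions, targets):
    """Total edit distance divided by the total number of target characters."""
    errors = sum(edit_distance(p, t) for p, t in zip(predictions, targets))
    return errors / sum(len(t) for t in targets)


# Hypothetical decoded predictions and ground-truth labels.
targets = ["2b827", "d22bd", "7n5n7"]
predictions = ["2b827", "d22bb", "7n57"]

print(f"exact match: {exact_match_accuracy(predictions, targets):.3f}")
print(f"character error rate: {character_error_rate(predictions, targets):.3f}")
```

Exact match is the stricter number for captchas, since a single wrong character makes the whole answer unusable.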

ehsanroshan

## Preprocessing

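# The layer name was stripped from this comment; layers.StringLookup is assumed here
# (older TensorFlow versions expose it as layers.experimental.preprocessing.StringLookup).

# Mapping characters to integers
char_to_num = layers.StringLookup(
    vocabulary=list(characters), mask_token=None
)

# Mapping integers back to original characters
num_to_char = layers.StringLookup(
    vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
)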

hsnhsynglk

Hi, how can I handle captchas that have 6 digits per picture? Your example uses 5 digits. I know I need to change something in the model, but I can't seem to figure it out :( The error I keep getting is: cannot add tensor to batch. Number of elements does not match. Shapes are: [tensor]: [5] [batch]: [6]

What should I change, or how can I work out what I need to change?

tricialamjingyi

Hi, I am a beginner in this field. I watched your video and implemented this code. It's working fine, but I need to test a single captcha image. How can I do that? I tried, but the prediction was not good. Please help me out if you can. 🥺

abhisekseal

Excuse me bro, I have an issue when I run the build_model() function after the CTC loss.

It happens on line 43, at x = layers.Reshape(target_shape = new_shape, name='reshape')(x)

ValueError Traceback (most recent call last)
in <module>()
73
74 # Call the function to build the model
---> 75 model = build_model()
76 model.summary()
in build_model()
41 # floor division returns the quotient and discards the remainder
42 new_shape = ((img_width // 4), (img_height // 4) * 64)
---> 43 x = layers.Reshape(target_shape = new_shape, name='reshape')(x)
44 x = layers.Dense(64, activation='relu', name='dense1')(x)
45 x = layers.Dropout(0.2)(x)
in __call__(self, *args, **kwargs)
975 if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
976 return self._functional_construction_call(inputs, args, kwargs,
--> 977 input_list)
978
979 # Maintains info about the `Layer.call` stack.

in _functional_construction_call(self, inputs, args, kwargs, input_list)
1113 # Check input assumptions set after layer building, e.g. input shape.
1114 outputs =
-> 1115 inputs, input_masks, args, kwargs)
1116
1117 if outputs is None:

in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
846 return tf.nest.map_structure(keras_tensor.KerasTensor, output_signature)
847 else:
--> 848 return self._infer_output_signature(inputs, args, kwargs, input_masks)
849
850 def _infer_output_signature(self, inputs, args, kwargs, input_masks):
in _infer_output_signature(self, inputs, args, kwargs, input_masks)
886 self._maybe_build(inputs)
887 inputs =
--> 888 outputs = call_fn(inputs, *args, **kwargs)
889
890 self._handle_activity_regularization(inputs, outputs)

in call(self, inputs)
537 # Set the static shape for the result since it might lost during array_ops
538 # reshape, eg, some `None` dim in the result could be inferred.
--> 539
540 return result
541
in compute_output_shape(self, input_shape)
528 output_shape = [input_shape[0]]
529 output_shape += self._fix_unknown_dimension(input_shape[1:],
--> 530 self.target_shape)
531 return tf.TensorShape(output_shape)
532

in _fix_unknown_dimension(self, input_shape, output_shape)
516 output_shape[unknown] = original // known
517 elif original != known:
--> 518 raise ValueError(msg)
519 return output_shape
520

and this is the error message:
ValueError: total size of new array must be unchanged, input_shape = [50, 50, 64], output_shape = [50, 768]
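
The numbers in this error can be checked by hand: a Reshape must keep the total number of elements, and the two shapes in the message do not. The sketch below only does that arithmetic. `reshape_target` is a hypothetical helper, and the guess that the tensor reaching the Reshape is taller than the `img_height` used to compute `new_shape` is an assumption drawn from the message (768 = 12 * 64, while the tensor has height 50).

```python
from math import prod


def reshape_target(conv_output_shape):
    """Collapse (width, height, channels) into (width, height * channels)."""
    width, height, channels = conv_output_shape
    return (width, height * channels)


conv_output_shape = (50, 50, 64)  # input_shape from the error message
failing_target = (50, 768)        # output_shape from the error message

# Element counts differ, which is exactly what the ValueError reports.
print(prod(conv_output_shape), prod(failing_target))

# A target derived from the actual tensor shape keeps the count unchanged.
working_target = reshape_target(conv_output_shape)
print(working_target, prod(working_target))
```

In practice this means the shape passed to the Input layer and the `img_width` and `img_height` values used for `new_shape` have to describe the same image size.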

hendrywijaya
