Machine Learning Image Recognition

When it comes to machine learning, we can teach artificial intelligence (AI) to recognize images. In this case the images are handwritten digits, which makes this a common first project for beginners. We'll use handwritten digits from zero to nine, taken from the MNIST dataset. In the simplest terms, when we train our AI models, we're mainly working with mathematical objects known as matrices.

A matrix is a set of numbers arranged in a grid of rows and columns. Imagine a 28 by 28 grid filled with floating point values: that's a matrix. When we work with AI, we're always dealing with matrices, whether the data is images or text.
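To make the idea of a matrix concrete, here is a minimal NumPy sketch of a 28 by 28 grid of floating point values. The values are random stand-ins chosen for illustration, not a real image.

```python
import numpy as np

# A 28 x 28 grid of floating point values, standing in for one grayscale image.
# The values are random here; a real image would hold pixel intensities.
rng = np.random.default_rng(seed=0)
image = rng.random((28, 28), dtype=np.float32)

print(image.shape)  # (28, 28): rows, columns
print(image.dtype)  # float32
print(image.size)   # 784 values in total
```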

We take raw information, like text or images, and convert it into numbers arranged as matrices. Then we can perform mathematical operations on these matrices, like multiplication, to train our AI. The goal here is to help the AI identify patterns, rather than truly understand what it sees. Once we've trained our model with lots of data, we can give it new data and it will get us pretty close to the right answer, and for narrow tasks like this one it can often be faster than a human and sometimes more accurate.
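As a rough sketch of that conversion and of a matrix multiplication, the snippet below turns a stand-in image into floating point values and multiplies it by a weight matrix to get one score per digit. The pixel data and the `weights` matrix are hypothetical random values; in a real model the weights would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for a raw grayscale image: integer pixel intensities from 0 to 255.
raw_pixels = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Convert to floating point values in [0, 1].
image = raw_pixels.astype(np.float32) / 255.0

# Hypothetical weights: one column of 784 weights for each of the 10 digits.
weights = rng.normal(0.0, 0.01, size=(784, 10)).astype(np.float32)

# Matrix multiplication turns the 784 pixel values into 10 scores.
scores = image.reshape(1, 784) @ weights

print(scores.shape)          # (1, 10)
print(int(scores.argmax()))  # index of the highest score; meaningless until trained
```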

So let's create an AI model that can recognize images of handwritten digits, using the MNIST dataset that ships with most AI frameworks. To do this, we'll use a convolutional layer, a pooling layer, and a fully connected layer, among other things. We're using two Python libraries, Keras and NumPy, to help us do this.

In the code itself, we have to describe the input shape for the model, which in this case is a 28 by 28 image, just a standard matrix. The convolutional layers hold most of the weights, which are the variables that get adjusted during training. We also need to flatten the output of the convolutional layers into a single row of numbers so that it can be fed to the fully connected layer.
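To show what the weights of a convolutional layer do, here is a minimal NumPy sketch that slides one 3 by 3 filter over a 28 by 28 image and then flattens the result. The helper `convolve2d_valid` and the hand-written filter are illustrative assumptions; a framework like Keras learns many such filters and computes them far more efficiently.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    This is the cross-correlation that deep learning frameworks call convolution.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w), dtype=np.float32)
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            output[row, col] = np.sum(patch * kernel)
    return output

# Toy image: left half dark (0), right half bright (1).
image = np.zeros((28, 28), dtype=np.float32)
image[:, 14:] = 1.0

# A hand-written vertical-edge filter. In training, these 9 numbers would be learned weights.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)

# Flattening turns the grid into a single row of numbers for a fully connected layer.
flat = feature_map.reshape(-1)
print(flat.shape)         # (676,)
```

The feature map is nonzero only where the filter straddles the dark-to-bright edge, which is the sense in which a filter picks out a pattern.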

A max pooling layer reduces the number of values passed on to later layers by keeping only the highest value in each small region. Finally, a dropout layer randomly ignores some of the values during training, which helps the model avoid overfitting. The MNIST dataset contains 70,000 images, each showing a digit from zero to nine.
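Both ideas can be sketched in a few lines of NumPy. The helpers `max_pool_2x2` and `dropout` are hypothetical names written for this illustration. The dropout version rescales the values it keeps, which is how the Keras `Dropout` layer behaves during training.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 window."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    windows = trimmed.reshape(h // 2, 2, w // 2, 2)
    return windows.max(axis=(1, 3))

def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the values and rescale the rest (training only)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]], dtype=np.float32)

pooled = max_pool_2x2(feature_map)
print(pooled.tolist())  # [[4.0, 2.0], [2.0, 8.0]]

rng = np.random.default_rng(seed=0)
dropped = dropout(np.ones(10_000), rate=0.5, rng=rng)
print((dropped == 0).mean())  # roughly half of the values are zeroed
```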

This means we have 10 possible outcomes. The input shape is the resolution of each picture, 28 by 28, with one color channel; in other words, a grayscale image. With our training data in place, we can now build our model with Keras in Python.
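The shapes described above can be sketched with a small synthetic batch. The data below is random and only mimics the layout of MNIST; the batch size of 8 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for a small batch of MNIST-style data (not the real dataset).
num_images = 8
images = rng.integers(0, 256, size=(num_images, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=num_images)

# Scale to [0, 1] and add the single grayscale channel: (N, 28, 28) -> (N, 28, 28, 1).
x = (images.astype(np.float32) / 255.0)[..., np.newaxis]

# One-hot encode the labels: each digit becomes a row of 10 values with a single 1.
num_classes = 10
y = np.eye(num_classes, dtype=np.float32)[labels]

print(x.shape)  # (8, 28, 28, 1)
print(y.shape)  # (8, 10)
```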

We'll start by creating a sequential model, which is a model built as a simple stack of layers, one after another. We'll add an input layer that defines the shape of the data for our model. Then we'll add a convolutional layer, which holds the trainable weights, and a pooling layer to reduce the amount of data passed forward.

We'll repeat the convolution and pooling steps until we've built the body of the model, then add a flattening step and a dropout step, which helps avoid overfitting. No matter what type of image we're working with, a model like this repeats the filtering and pooling steps, then flattens the result before it arrives at the output layer.
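Put together, the steps above might look like the following Keras sketch. The filter counts (32 and 64), the 3 by 3 kernels, the dropout rate of 0.5, and the optimizer are assumptions taken from a commonly used small MNIST configuration, not values stated in the video. It also assumes the `keras` package is installed and importable on its own.

```python
import keras
from keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),           # 28 x 28 pixels, one grayscale channel
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),                         # grid of features -> single row of numbers
    layers.Dropout(0.5),                      # active during training only
    layers.Dense(10, activation="softmax"),   # one output per digit, zero to nine
])

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```

In this configuration, a bit more than half of the trainable weights sit in the two convolutional layers, and the rest are in the final fully connected layer.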
Comments

Nice explanation, but try explaining from scratch. It will help you grow...

Motabhairecaps

I'm not sure if you can explain it, but I've been trying to find an explanation for a transformer architecture based on just a single word as text corpus like the word "as". Kinda in the same way there's a video explaining neural networks using only 4 pixels. I'm not sure if it's possible, but I'd love to see it. Thanks for the great content!

denysolleik