Neural network for image classification | Computer Vision from Scratch series [Lecture 4]



Building a Two-Layer Neural Network from Scratch for Image Classification: A Step in the Computer Vision Journey

In this lecture of Computer Vision from Scratch, we take a meaningful leap from linear models to a slightly deeper neural network. The goal is to test whether adding hidden layers and non-linearity helps us improve image classification accuracy on the Five Flowers Dataset — which includes daisies, dandelions, roses, sunflowers, and tulips.

From Linear Models to Neural Networks
Previously, we used a simple linear model that flattened the image input and connected it directly to the output layer through a softmax function. While that allowed us to classify flower images to some extent, the model was limited in its capacity to learn complex, nonlinear patterns in image data. Training accuracy ranged between 0.4 and 0.6, and validation accuracy fluctuated heavily, a clear sign of instability and overfitting.
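For reference, that baseline fits in a few lines. This is a minimal sketch assuming a TensorFlow/Keras implementation (an assumption on my part, not the lecture's verbatim code):

```python
import tensorflow as tf

# Minimal sketch of the linear baseline: flatten the pixels and map them
# directly to the five class scores with a single softmax layer.
linear_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),      # RGB flower image
    tf.keras.layers.Flatten(),                       # 224*224*3 = 150,528 values
    tf.keras.layers.Dense(5, activation="softmax"),  # one score per flower class
])
```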

The Shift to a Two-Hidden-Layer Neural Network
This time, we built a neural network with two hidden layers (a code sketch follows the list):

Flatten Layer: Converts each 224x224 RGB image into a 1D vector of roughly 150k values. (The lecture counts this as the first hidden layer, even though it has no trainable weights.)

Dense Layer with 128 Neurons: Introduced as the second hidden layer.

ReLU Activation: Applied in the hidden dense layer to introduce non-linearity.

Output Layer: Consists of 5 nodes (one per flower class) with a softmax activation function.
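Put together, the stack looks roughly like this. It is a sketch under the same Keras assumption as above (framework and exact layer arguments are mine, not quoted from the lecture); the compile step uses the optimizer settings described under Training Observations below:

```python
import tensorflow as tf

# Sketch of the two-hidden-layer network described above (assumed Keras API).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),      # raw RGB flower image
    tf.keras.layers.Flatten(),                       # -> ~150k-dimensional vector
    tf.keras.layers.Dense(128, activation="relu"),   # dense hidden layer with ReLU
    tf.keras.layers.Dense(5, activation="softmax"),  # one output node per class
])

# Adam with learning rate 0.001, as used in the training runs below.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",  # assumes integer class labels
    metrics=["accuracy"],
)
model.summary()  # prints per-layer output shapes and parameter counts
```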

Why Activation Functions Matter
Without activation functions, a deep network remains equivalent to a single-layer linear model, as the multiple matrix multiplications collapse into one. This is why ReLU (Rectified Linear Unit) was introduced — to allow the model to learn nonlinear relationships in the image data.
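A one-line derivation makes the collapse concrete. Stacking two linear layers gives

```latex
y = W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2) = W' x + b'
```

which is just a single linear map with weights W' and bias b'. Inserting ReLU between the layers, h = max(0, W_1 x + b_1), breaks this factorization, so the stack can no longer be rewritten as one matrix multiplication.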

Parameter Explosion
The new architecture increased the number of trainable parameters significantly, from about 750,000 in the linear model to roughly 15 million in the two-layer network. In theory this adds capacity, but performance doesn't always scale with parameter count.
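The counts follow from the dense-layer formula params = (inputs + 1) × outputs, where the +1 is the bias per output neuron. The quick check below assumes the full 224x224x3 input; the exact total is sensitive to the image resolution actually fed to the network, which is why quoted figures can differ:

```python
def dense_params(n_in: int, n_out: int) -> int:
    # Weight matrix plus one bias per output neuron.
    return (n_in + 1) * n_out

flat = 224 * 224 * 3  # 150,528 flattened pixel values
print(dense_params(flat, 5))                           # linear model: 752,645
print(dense_params(flat, 128) + dense_params(128, 5))  # two-layer: 19,268,357
```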

Training Observations
Using the Adam optimizer with a learning rate of 0.001, we trained both models on batches of 16 images. While the loss dropped dramatically (from tens to single digits), classification accuracy did not improve significantly. This may seem counterintuitive, but here's why:

Lower Loss, Same Accuracy: Even though the model's predictions became more confident (lower cross-entropy loss), they weren't necessarily more accurate. If the model's confidence in an already-correct class increases (say, from 60% to 95%), the loss decreases, but accuracy remains unchanged.
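A two-line computation shows the effect: per-example cross-entropy is -log of the probability assigned to the true class, so raising that probability from 0.60 to 0.95 cuts the loss by roughly a factor of ten while the argmax prediction, and hence accuracy, is unchanged:

```python
import math

# Cross-entropy for a single correctly classified example.
for p_true in (0.60, 0.95):
    print(f"p(true class) = {p_true:.2f} -> loss = {-math.log(p_true):.3f}")
# p(true class) = 0.60 -> loss = 0.511
# p(true class) = 0.95 -> loss = 0.051
```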

Hyperparameter Experiments
To investigate further, we explored how varying image size and batch size impacts model performance:

Smaller images led to faster training but possibly reduced performance due to loss of detail. Larger batch sizes smoothed the loss curves but didn’t dramatically improve results.
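Sweeps like these are easy to script. The sketch below assumes the dataset sits on disk with one folder per class under a placeholder path "flowers/", and that Keras's image_dataset_from_directory utility is used for loading; the specific size/batch pairs are illustrative:

```python
import tensorflow as tf

# Try a few (image_size, batch_size) combinations and retrain for each.
for image_size, batch_size in [((224, 224), 16), ((112, 112), 16), ((224, 224), 64)]:
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "flowers/",                # placeholder path: one subfolder per class
        validation_split=0.2,
        subset="training",
        seed=42,
        image_size=image_size,     # smaller images train faster but lose detail
        batch_size=batch_size,     # larger batches smooth the loss curve
    )
    # rebuild the model for this configuration, then model.fit(train_ds, ...)
```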

Key Takeaways
Activation functions are crucial for introducing non-linearity and unlocking the power of deep neural networks.

Deeper isn’t always better — especially when limited by dataset size and computational resources.

Loss and accuracy are not the same — you can improve one without the other.

Hyperparameter tuning (batch size, image resolution, learning rate) plays a massive role in model performance and needs careful experimentation.

We are still learning from scratch — starting with modest accuracy is expected and acceptable.

What’s Next?
In the upcoming lecture, we’ll dive into regularization techniques like dropout to combat overfitting and experiment with deeper architectures. We'll also explore transfer learning, where we leverage pretrained models to improve performance without starting from zero.

If you’ve made it this far, give yourself a pat on the back. This was a dense but pivotal lecture. You’ve just built your first true neural network from scratch. Stick around — the journey is only beginning.
Comments

I have been following all the lectures, thanks for these 🙏. Kindly increase the pace of the lectures as everybody else is requesting 😊

Ankmehra

Timestamps
00:07 - Building a deep neural network with two hidden layers for image classification.
02:31 - Exploring classification accuracy of a deeper neural network for image classification.
07:10 - Exploring complex neural networks with two hidden layers for improved accuracy.
09:26 - Introducing a new hidden layer improves the neural network's predictive capabilities.
14:13 - Image tensors are transformed through layers to classify images.
16:31 - The second hidden layer adds no value without activation functions.
20:38 - Activation functions are essential for fitting nonlinear data in neural networks.
22:33 - Modifying the model introduces deeper layers for better learning of nonlinear patterns.
26:33 - Establishing training and validation data for image classification.
28:31 - Introduction to neural network architecture, training, and validation processes.
32:33 - Neural network training shows reduced loss but stagnant accuracy.
34:40 - Understanding cross-entropy loss in image classification predictions.
38:57 - Hyperparameter tuning is crucial for optimizing neural network performance.
40:54 - Hyperparameter tuning is crucial for optimal neural network performance.
44:44 - Reducing image size impacts training speed and classification accuracy.
46:40 - Choosing the right batch size is crucial for model training.
50:41 - Experiments with hyperparameters affect neural network accuracy and loss.
52:43 - Performance analysis of image classification model with varying batch sizes and image dimensions.
56:35 - Initial model accuracy is low; explore improvement strategies next.
58:27 - Encouragement and farewell from the course instructor.

alexramos

It looks like you guys are working very hard. How many hours do you guys work every day?

urviskumarbharti

Great lecture again, brother, thanks for it.

vinayppandey

Please continue with the Computer Vision playlist; the plan was well built, but the upload frequency is too low 😪

TabletAccount-jb

What if the image is not square by default? Is it okay to resize it to a square shape?

ashutoshdash

Sir, it is a great course, but if possible please increase the pace.

AMAN-csgf

Could you please do a video on point cloud data?

shilpavpurushothaman

I think you considered the input layer as the first hidden layer. I think that is a mistake. Please let me know if I am wrong. Thank you, sir.

AIinAgriculture