filmov
tv
How to Make Your Images Talk: The AI that Captions Any Image
Показать описание
Image captioning is the process of taking an image and generating a caption that accurately describes the scene. This is a difficult task for neural networks because it requires understanding both natural language and computer vision.
In this video, I discuss my complete approach to this problem. For visual understanding, we will use Inception V3, and for natural language understanding, we will first use RNN, but it will fail to generalize well on unseen data, therefore we will shift to Transformer. And as you will see, Transformer will nail it!
Source Code:
🔗 Social Media 🔗
Timestamps:
00:00 Introduction
00:16 Quick overview of Image Captioning
01:08 The Model Architecture (RNN)
01:56 Getting the Image feature vectors using Inception V3
04:39 What Attention Mechanism is doing?
05:10 Choosing the Dataset
05:56 Data Preprocessing
06:54 Training!!!
07:13 Checking the results
09:24 Over Dramatic Transformer Introduction
10:25 Why I used COCO Dataset
11:12 Side-by-side result of RNN and Transformer
11:59 Deploying model to HuggingFace so anyone can use it!
#artificialintelligence #ai #deeplearning #machinelearning #transformer #transformers
Thank You,
Pritish Mishra
Комментарии