How to Make Your Images Talk: The AI that Captions Any Image


Image captioning is the process of taking an image and generating a caption that accurately describes the scene. This is a difficult task for neural networks because it requires both computer vision and natural language understanding.

In this video, I discuss my complete approach to this problem. For visual understanding, we will use Inception V3. For natural language understanding, we will first use an RNN, but it fails to generalize well to unseen data, so we will switch to a Transformer. And as you will see, the Transformer will nail it!
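
As a rough illustration of the pipeline described above (not the code from the video), the sketch below shows how a decoder can turn image features into a caption one word at a time. It assumes greedy decoding, which the description does not confirm. The names `greedy_caption`, `toy_decoder`, and the `[start]` / `[end]` tokens are hypothetical. The toy decoder simply replays a fixed script in place of a trained RNN or Transformer.

```python
from typing import Callable, Dict, List

# Hypothetical marker tokens; a real tokenizer may use different ones.
START, END = "[start]", "[end]"


def greedy_caption(
    image_features: List[float],
    next_word_scores: Callable[[List[float], List[str]], Dict[str, float]],
    max_len: int = 10,
) -> str:
    """Build a caption word by word, always taking the top-scoring word."""
    words = [START]
    for _ in range(max_len):
        scores = next_word_scores(image_features, words)
        best = max(scores, key=scores.get)
        if best == END:
            break
        words.append(best)
    return " ".join(words[1:])  # drop the start token


def toy_decoder(image_features: List[float], words: List[str]) -> Dict[str, float]:
    """Stand-in for a trained decoder: replays a fixed script, ignores the image."""
    script = ["a", "dog", "runs", END]
    target = script[min(len(words) - 1, len(script) - 1)]
    return {w: (1.0 if w == target else 0.0) for w in script}


caption = greedy_caption([0.1, 0.7, 0.2], toy_decoder)
print(caption)  # a dog runs
```

In a real model, `image_features` would come from the vision network (Inception V3 in the video), and `next_word_scores` would be the trained decoder's output over the full vocabulary.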

Source Code:

🔗 Social Media 🔗

Timestamps:
00:00 Introduction
00:16 Quick overview of Image Captioning
01:08 The Model Architecture (RNN)
01:56 Getting the Image feature vectors using Inception V3
04:39 What is the Attention Mechanism doing?
05:10 Choosing the Dataset
05:56 Data Preprocessing
06:54 Training!!!
07:13 Checking the results
09:24 Over Dramatic Transformer Introduction
10:25 Why I used COCO Dataset
11:12 Side-by-side result of RNN and Transformer
11:59 Deploying model to HuggingFace so anyone can use it!
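
For readers curious about the attention step listed at 04:39, here is a minimal sketch of the general idea. It assumes plain dot-product attention over a few image-region vectors, which may differ from the attention variant used in the video. The function names and the toy numbers are illustrative only.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float], regions: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Score each image region against the decoder query, then average the regions."""
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return weights, context


# Three toy "image regions" with 2-dimensional features (made-up values).
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# A decoder state that happens to align with the first region.
query = [4.0, 0.0]

weights, context = attend(query, regions)
print(weights)  # the largest weight falls on the first region
```

The weights are recomputed at every decoding step from the current decoder state, so the region that gets the most weight can change from word to word.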

#artificialintelligence #ai #deeplearning #machinelearning #transformer #transformers

Thank You,
Pritish Mishra
Comments
Author

I can't believe your view count, because your explanation is on the next level, dude. I thought it must have crossed at least 1 lakh views, but I hope it will soon.

yashwantrana
Author

It's not a tutorial, it's a movie. I really enjoyed it 💙

محمدالفقى-يب
Author

Amazing video! You made it interesting and practical. The memes and effects were lit.

gabip
Author

Awesome video, bro!! You explained image captioning in a simple and fun way.

hugehammer
Author

I just started to realize the potential of AI, and I already feel behind with all these new tools. Would love to see another video in the future about BlueWillow, which is completely free.

GolpokothokRaktim
Author

Bro, I'm unable to get Image Captioning using RNN. The link is not working. Can you please check?

EM-nrhj
Author

Awesome video 🔥 and nice animation as always (or not, it was more dramatic 😂😂😂). Way to go 👍🏻👍🏻👍🏻

dhiraj
Author

It’s a good tutorial. But I have a question regarding the attention mechanism. At 4:50, how does it know to focus on the dog when it gets the word "dog" as input? If it knows by detecting the object, then how does it know to focus somewhere else when it receives Please make it clear.

Waliul_The_Wall-E
Author

Just amazing! Loved this video. Keep more coming!

RudranilBhattacharjee
Author

Learned so many new things. Thanks for making the video.

sapnilpatel
Author

When I run the code on Streamlit, it shows two errors:
1. ValueError: axes don't match array.
2. ValueError: The name "conv2d" is used 2 times in the model. All layer names should be unique.
How can I solve the problem?

rasilmaharjan
Author

How can I use the saved model weights (model.h5) in another file to run inference on new images?

venkatavivek
Author

Hey, I looked into your Kaggle notebook for the Transformer model with the COCO dataset. You mentioned that you only trained the model on 14k images from COCO. I'm a beginner in ML, so can you tell me what I should change in your code to increase the training dataset size beyond 14k?

hellotherethere
Author

Hey, great lecture! I just need some help: the link to the Google Colab for image captioning with RNN isn't working. It would be a great help if you could provide a new link. Thank you!!

dishadubey
Author

How can I save the model and run it in Android Studio?

joycemalubay
Author

Hi Pritish, amazing tutorials. Thank you. While running the Transformer Colab notebook, I'm getting an error at:
----> 4 pred_caption = generate_caption(img_path). TypeError: `x` and `y` must have the same dtype, got tf.uint8 != tf.float32.

Can you please help!

aady
Author

Nice video. How long does it take you to train the transformer model?

drafatkarim
Author

The link isn't working for "Image Captioning with RNN". @PritishMishra, can you please share the code?

akashbhavsar
Author

The Image Captioning with RNN source code is not opening, dude. Please upload it 😊.

GANGADHARTHOTAKURA
Author

Amazing video! Where did you learn all of this? OMG, it just saved me so much time. Lifesaver.

BoloFofoPT