OpenAI's CLIP Explained and Implemented | Contrastive Learning | Self-Supervised Learning

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a large variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet for a given image, without being directly optimized for that task, similar to the zero-shot capabilities of GPT-2 and GPT-3. OpenAI found that CLIP matches the performance of the original ResNet-50 on ImageNet zero-shot, without using any of the 1.28 million labeled ImageNet examples, overcoming several major challenges in computer vision.
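
For a concrete picture of that zero-shot usage, here is a minimal sketch in PyTorch. It assumes OpenAI's open-source clip package (from the openai/CLIP GitHub repository) and a hypothetical local image file photo.jpg; the model scores the image against each candidate caption, and a softmax turns the scores into class probabilities.

import torch
import clip
from PIL import Image

# Load a pretrained CLIP model and its matching image preprocessor
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "photo.jpg" is a hypothetical example image; the captions act as class labels
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a dog", "a photo of a cat", "a photo of a car"]).to(device)

with torch.no_grad():
    # logits_per_image has shape (1, num_captions): scaled cosine similarities
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # the caption that best matches the image gets the highest probability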


#datascience #nlp #deeplearning #ecommerce
Comments

Really helpful for understanding the CLIP concept. Thank you!

muhammadumer-qktx

Excellent presentation! Very well simplified.

anuradhabalasubramanian

Hi! Can you give me the loss function formula?

lakatosgabor
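
For reference on the question above: the CLIP paper trains with a symmetric cross-entropy (InfoNCE-style) loss over the scaled cosine-similarity matrix. With N L2-normalized image embeddings I_i, text embeddings T_i, and learned temperature \tau, it can be written in LaTeX as

\mathcal{L} = \frac{1}{2N} \sum_{i=1}^{N} \left[ -\log \frac{\exp(\langle I_i, T_i \rangle / \tau)}{\sum_{j=1}^{N} \exp(\langle I_i, T_j \rangle / \tau)} - \log \frac{\exp(\langle I_i, T_i \rangle / \tau)}{\sum_{j=1}^{N} \exp(\langle I_j, T_i \rangle / \tau)} \right]

i.e., the average of an image-to-text and a text-to-image cross-entropy, each treating the matching pair on the diagonal as the correct class.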

The formula for the targets is incorrect. It should be torch.arange(n). The goal is to make the similarity matrix converge toward an identity matrix.

dunghuy
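
To make the comment above concrete, here is a minimal sketch of that loss in PyTorch. image_embeds and text_embeds are hypothetical (n, d) tensors from the two encoders; the targets torch.arange(n) mark the diagonal entries as the matching pairs, which pushes the row- and column-softmaxed similarity matrix toward the identity.

import torch
import torch.nn.functional as F

def clip_loss(image_embeds, text_embeds, temperature=0.07):
    # L2-normalize so dot products are cosine similarities
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # (n, n) similarity matrix: entry (i, j) compares image i with text j
    logits = image_embeds @ text_embeds.t() / temperature

    # Image i matches text i, so the targets are the diagonal indices
    n = logits.size(0)
    targets = torch.arange(n, device=logits.device)

    # Symmetric cross-entropy: rows = image-to-text, columns = text-to-image
    loss_images = F.cross_entropy(logits, targets)
    loss_texts = F.cross_entropy(logits.t(), targets)
    return (loss_images + loss_texts) / 2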

Nice video about training CLIP. I tried to open your Kaggle notebook, but it says it is no longer there (404 error). Is there another link where we could access the notebook from the video?

robboswell