CLIP: Connecting Text and Images

This video explains how CLIP from OpenAI transforms image classification into a text-image similarity matching task. This is done with contrastive training and zero-shot Pattern-Exploiting Training.
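The zero-shot step described above can be sketched in a few lines: encode the image and one text prompt per candidate class, then pick the prompt whose embedding is most similar to the image's. This is a toy numpy sketch with made-up feature vectors, not real CLIP encoders; the function name and the embeddings are illustrative only.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Pick the class whose text embedding best matches the image.

    image_emb: (d,) image feature vector
    text_embs: (k, d) one text feature per class prompt,
               e.g. "a photo of a dog", "a photo of a truck"
    Returns the index of the best-matching prompt.
    """
    # L2-normalize so dot products become cosine similarities
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # one cosine similarity per class prompt
    return int(np.argmax(sims))

# Toy embeddings (not real CLIP features)
image = np.array([0.9, 0.1, 0.0])
prompts = np.array([
    [1.0, 0.0, 0.0],  # "a photo of a dog"
    [0.0, 1.0, 0.0],  # "a photo of a truck"
])
print(zero_shot_classify(image, prompts))  # → 0 (the "dog" prompt)
```

Because the classes are just text prompts, the same trained model can be pointed at a new label set without any fine-tuning, which is what makes the transfer "zero-shot".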

Paper Links:

Thanks for watching! Please Subscribe!
Comments

OpenAI is doing some amazing work, Love it. Also, great analysis man 👍🏽

bakrx

Really impressive results from OpenAI and nice review! I wish my lab had that much compute power :')

Stwinky

Amazing work from openai and very nice review from Henry :-)

amrahmed

I'm not knowledgeable in this field, so some of the technical aspects get away from me — please correct me. The images are scraped from the internet. Those images come with text attached, known as alt text. That text is used to identify the contents of the image during the initial training of the model. Then zero-shot is just recognizing patterns and assigning text or categorizing the image.

That's probably not entirely right, but my main question is: if the image that was scraped has inaccurate text attached — for example, a picture of a dog whose annotation says "truck driving down a hill" — will this result in inaccurate training? Or can CLIP identify through zero-shot that the image is that of a dog and thus assign it a new text pair based on previous training?

The text that comes with the image when it is scraped is the key factor for accurate training. The model doesn't inherently know the difference between a dog and a truck; it has to learn that through the image-text pairs, and it's possible to train the model to think that a dog is a truck and vice versa.
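The training objective this thread is discussing can be sketched as CLIP's symmetric contrastive loss: in a batch, the i-th image and i-th caption are the positive pair and every other pairing is a negative. This is a toy numpy version under my own naming, not OpenAI's implementation; note that a mislabeled pair (the "truck" caption on a dog photo) is still treated as a positive, so noisy alt text does push the model the wrong way — the sheer scale of the data is what lets the correct signal win on average.

```python
import numpy as np

def clip_contrastive_loss(img_embs, txt_embs, temperature=0.07):
    """Symmetric cross-entropy over an image-text similarity matrix.

    img_embs, txt_embs: (n, d) batch of paired embeddings; row i of
    each is one scraped (image, alt-text) pair. The loss pulls each
    matched pair together and pushes all mismatched pairs apart.
    """
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    txt = txt_embs / np.linalg.norm(txt_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (n, n) similarity matrix
    labels = np.arange(len(img))        # correct pairs sit on the diagonal

    def xent(lg):
        # cross-entropy of each row against its diagonal "true" column
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly matched embeddings the loss is near zero; shuffling the captions against the images drives it up, which is exactly why systematically wrong alt text would teach the model the wrong associations.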

lusterdog

Hello Henry,
Is it recommended to code all the machine learning algorithms from scratch so that I can learn the math behind them, or just understand them and start to code?

subhanbasha

Why call it a zero-shot model? Isn't the downstream task basically the same as a test set?

imranq