Fast intro to multi-modal ML with OpenAI's CLIP

OpenAI's CLIP is a "multi-modal" model capable of understanding the relationships and concepts shared between text and images. As we'll see, CLIP is very capable and, when used via the Hugging Face library, could not be easier to work with.
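Below is a minimal sketch of the kind of workflow covered in the video, assuming the Hugging Face transformers CLIPModel/CLIPProcessor API and the openai/clip-vit-base-patch32 checkpoint; the image URL is just a placeholder example, not one used in the video.

# embed text and an image with CLIP, then score their similarity
from PIL import Image
import requests
import torch
from transformers import CLIPProcessor, CLIPModel

model_id = "openai/clip-vit-base-patch32"  # assumed checkpoint; any CLIP variant works
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# placeholder image; any RGB image will do
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

text_emb = outputs.text_embeds    # shape (2, 512), one vector per caption
image_emb = outputs.image_embeds  # shape (1, 512)

# cosine similarity between the image and each caption
sims = torch.cosine_similarity(image_emb, text_emb)
print(sims)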

📕 Article:

📖 Friend Link (free access):

🤖 70% Discount on the NLP With Transformers in Python course:

🎉 Subscribe for Article and Video Updates!

👾 Discord:

00:00 Intro
00:15 What is CLIP?
02:13 Getting started
05:38 Creating text embeddings
07:23 Creating image embeddings
10:26 Embedding a lot of images
15:08 Text-image similarity search
21:38 Alternative image and text search
Comments

This was sick. Thank you for so patiently explaining each step. You could have just run a bunch of stuff you pre-wrote in a notebook. Doing it this way instead makes it an accessible entry point for people who might be interested in getting into ML in a more serious way. Very humbled.

chanm

I am blown away by your videos and am learning every second. You are simply the best out here in this area of computing. I may be starting academic research in computational linguistics regarding semantic change in loanwords. I would love to get in touch with you.

leonardvanduuren

What would be outputted if you were to manually select a random point within the vector space? Would it return an incoherent image? Or would it throw an error?

lee

Thanks for the great video. I am curious as to what kind of performance you get. Obviously the hardware makes a difference, but in general how long does it take to get your results?

dontolley

When implementing this I got an error saying the images are on the CPU, so embedding them will not be possible; I was embedding images from my Google Drive with CLIP embeddings.
Has anyone reading my comment tried this? Please respond, thanks in advance.

avbendre
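For the CPU error above, here is a sketch of the usual fix, assuming the Hugging Face transformers CLIP API: keep the model and the processed inputs on the same device before calling the model. The file path is a hypothetical placeholder (e.g. a file copied from Google Drive).

import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id).to(device)  # move the model to the GPU if available
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("my_image.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt").to(device)  # move inputs to the same device

with torch.no_grad():
    image_emb = model.get_image_features(**inputs)  # (1, 512) tensor on `device`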

What are you using as the IDE, since it suggests auto-completion? Does it use GitHub Copilot?

ayushranjan

What website or app are you using in the getting started section? I'm very new to coding and stuff.

smoreshark

Great video, thank you. Have you ever tried image+text semantic search on an image+text dataset? Is that a good way to interpret the combination of these embeddings? E.g. with image = 512 dim + text = 512 dim, which is the better way to combine the two embeddings? Can I just concatenate them and search the database with this concatenated vector embedding?

basi
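For the question above, a hedged sketch of two common ways to combine a 512-dim image embedding with a 512-dim text embedding; the arrays here are random placeholders, and neither option is claimed to be what the video uses.

import numpy as np

image_emb = np.random.rand(512)  # placeholder CLIP image embedding
text_emb = np.random.rand(512)   # placeholder CLIP text embedding

# normalise each modality first so neither dominates the distance
image_emb = image_emb / np.linalg.norm(image_emb)
text_emb = text_emb / np.linalg.norm(text_emb)

# option 1: concatenate -> 1024-dim record; the query vector must then also be 1024-dim
combined_concat = np.concatenate([image_emb, text_emb])

# option 2: average -> stays 512-dim (possible because CLIP puts both modalities
# in the same space), so a plain text-only or image-only query can still be used
combined_mean = (image_emb + text_emb) / 2
combined_mean = combined_mean / np.linalg.norm(combined_mean)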

Time so well spent with this video, thank you so much.

antonispolykratis

This was awesome. How can I get the code please? Thanks.

avbendre

Thank you so much!! This is exactly what I need.

xiaozaowang

Thanks for the valuable videos. I have some doubts, kindly reply. 1. Can NER tags be used in semantic search or in search engine / information retrieval tasks? Any links would be useful. 2. I have experience using sentence transformers; are the OpenAI models too heavy, or their vectors too high-dimensional, for similarity search? 3. Can we apply this CLIP approach to mapping a text query to images (like bill images containing text), assisted by OCR results? Thanks in advance.

venkatesanr

Why the +1 in (0, len(imagenette) + 1)?

antonispolykratis
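On the +1 question above, a small sketch, assuming (without the original notebook to confirm) that the loop batches over the dataset with range(). Python's range() excludes its stop value, so range(0, len(data)) already covers every index; whether a +1 on the stop is needed depends entirely on how the batch slicing is written.

batch_size = 16
data = list(range(100))  # stand-in for the imagenette images

# starts at 0, 16, 32, ... and the final partial batch is still included
for i in range(0, len(data), batch_size):
    batch = data[i : i + batch_size]  # slicing past the end is safe in Python
    # ... embed `batch` with CLIP here ...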