Fast intro to multi-modal ML with OpenAI's CLIP

OpenAI's CLIP is a "multi-modal" model capable of understanding the relationships and concepts shared between text and images. As we'll see, CLIP is very capable and, when used via the Hugging Face library, could not be easier to work with.
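Below is a minimal sketch of the kind of workflow covered in the video, assuming the Hugging Face transformers CLIPModel/CLIPProcessor API and the openai/clip-vit-base-patch32 checkpoint; the image URL is just a placeholder example, not one used in the video.

# embed text and an image with CLIP, then score their similarity
from PIL import Image
import requests
import torch
from transformers import CLIPProcessor, CLIPModel

model_id = "openai/clip-vit-base-patch32"  # assumed checkpoint; any CLIP variant works
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# placeholder image; any RGB image will do
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

text_emb = outputs.text_embeds    # shape (2, 512), one vector per caption
image_emb = outputs.image_embeds  # shape (1, 512)

# cosine similarity between the image and each caption
sims = torch.cosine_similarity(image_emb, text_emb)
print(sims)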

📕 Article:

📖 Friend Link (free access):

🤖 70% Discount on the NLP With Transformers in Python course:

🎉 Subscribe for Article and Video Updates!

👾 Discord:

00:00 Intro
00:15 What is CLIP?
02:13 Getting started
05:38 Creating text embeddings
07:23 Creating image embeddings
10:26 Embedding a lot of images
15:08 Text-image similarity search
21:38 Alternative image and text search
Comments

This was sick. Thank you for so patiently explaining each step. You could have just run a bunch of stuff you pre-wrote in a notebook. Doing it this way instead makes it an accessible entry point for people who might be interested in getting into ML in a more serious way. Very humbled.

chanm

I am blown away by your videos and am learning every second. You are simply the best out here in this area of computing. I may be starting academic research in computational linguistics regarding semantic change in loanwords. I would love to get in touch with you.

leonardvanduuren

What would be outputted if you were to manually select a random point within the vector space? Would it return an incoherent image? Or would it throw an error?

lee

Thanks for the great video. I am curious as to what kind of performance you get. Obviously the hardware makes a difference, but in general how long does it take to get your results?

dontolley

When implementing this I got an error saying the images are on the CPU, so embedding them will not be possible; I was embedding images from my Google Drive with CLIP embeddings.
Has anyone reading my comment tried this? Please respond, thanks in advance.

avbendre
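For the CPU error above, here is a sketch of the usual fix, assuming the Hugging Face transformers CLIP API: keep the model and the processed inputs on the same device before calling the model. The file path is a hypothetical placeholder (e.g. a file copied from Google Drive).

import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id).to(device)  # move the model to the GPU if available
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("my_image.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt").to(device)  # move inputs to the same device

with torch.no_grad():
    image_emb = model.get_image_features(**inputs)  # (1, 512) tensor on `device`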

What are you using as the IDE, since it suggests auto-completion? Does it use GitHub Copilot?

ayushranjan

What website or app are you using in the getting started section? I'm very new to coding and stuff.

smoreshark

Great video, thank you. Have you ever tried image+text semantic search on an image+text dataset? Is that a good way to interpret the combination of these embeddings? E.g. with image = 512 dim + text = 512 dim, which is the better way to combine the two embeddings? Can I just concatenate them and search the database with this concatenated vector embedding?

basi
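For the question above, a hedged sketch of two common ways to combine a 512-dim image embedding with a 512-dim text embedding; the arrays here are random placeholders, and neither option is claimed to be what the video uses.

import numpy as np

image_emb = np.random.rand(512)  # placeholder CLIP image embedding
text_emb = np.random.rand(512)   # placeholder CLIP text embedding

# normalise each modality first so neither dominates the distance
image_emb = image_emb / np.linalg.norm(image_emb)
text_emb = text_emb / np.linalg.norm(text_emb)

# option 1: concatenate -> 1024-dim record; the query vector must then also be 1024-dim
combined_concat = np.concatenate([image_emb, text_emb])

# option 2: average -> stays 512-dim (possible because CLIP puts both modalities
# in the same space), so a plain text-only or image-only query can still be used
combined_mean = (image_emb + text_emb) / 2
combined_mean = combined_mean / np.linalg.norm(combined_mean)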

Time so well spent with this video, thank you so much.

antonispolykratis

This was awesome. How can I get the code please? Thanks.

avbendre

Thank you so much!! This is exactly what I need.

xiaozaowang

Thanks for the valuable videos. I have some doubts, kindly reply. 1. Can NER tags be used in semantic search or in search engine / information retrieval tasks? Any links would be useful. 2. I have experience using sentence transformers; are the OpenAI models too heavy, or their vectors too high-dimensional, for similarity search? 3. Can we apply this CLIP approach to mapping a text query to images (like bill images containing text), assisted by OCR results? Thanks in advance.

venkatesanr

Why the +1 in (0, len(imagenette) + 1)?

antonispolykratis
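On the +1 question above, a small sketch, assuming (without the original notebook to confirm) that the loop batches over the dataset with range(). Python's range() excludes its stop value, so range(0, len(data)) already covers every index; whether a +1 on the stop is needed depends entirely on how the batch slicing is written.

batch_size = 16
data = list(range(100))  # stand-in for the imagenette images

# starts at 0, 16, 32, ... and the final partial batch is still included
for i in range(0, len(data), batch_size):
    batch = data[i : i + batch_size]  # slicing past the end is safe in Python
    # ... embed `batch` with CLIP here ...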