Supercharge eCommerce Search: OpenAI's CLIP, BM25, and Python

We build a multi-modal hybrid search engine for e-commerce using OpenAI's CLIP, BM25, the Pinecone vector database, and Python. The search engine processes both text and image queries and can produce better results than traditional methods.

The search engine allows users to search and retrieve data using both text and visual queries, which is especially useful in e-commerce domains where users have a range of search queries, from specific product searches to image-based searches for related items.

By combining CLIP and BM25, the search engine handles both text and image-based queries in a single search experience. The Pinecone vector database and Python take care of indexing, storage, and retrieval, making it possible to search large volumes of data in real time.
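
As a rough illustration of the BM25 half of the pipeline, here is a minimal Okapi BM25 scorer in plain Python. It is a sketch under assumptions, not the code from the video: the function name `bm25_scores`, the whitespace tokenizer, the sample product titles, and the `k1` and `b` defaults are all illustrative. The video builds BM25 sparse vectors for Pinecone rather than scoring documents directly, but the term weighting idea is the same.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term frequency saturates via k1; b normalizes for document length.
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical product titles, for illustration only.
docs = [
    "dark blue french connection jeans for men",
    "red cotton t-shirt for women",
    "blue denim jacket",
]
scores = bm25_scores("blue jeans", docs)
```

With these sample titles, the first document matches both query terms and scores highest, the third matches only "blue", and the second matches nothing and scores zero. A dense CLIP embedding would complement this by also matching items that share meaning or appearance without sharing exact words.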

00:00 Multi-modal hybrid search
01:05 Multi-modal hybrid search in e-commerce
05:14 How do we construct multi-modal embeddings
07:05 Difference between sparse and dense vectors
09:43 E-commerce search in Python
11:11 Connect to Pinecone vector db
12:04 Creating a Pinecone index
13:45 Data preparation
16:32 Creating BM25 sparse vectors
19:33 Creating dense vectors with sentence transformers
20:26 Indexing everything in Pinecone
24:41 Making hybrid queries
26:01 Mixing dense vs sparse with alpha
32:11 Adding product metadata filtering
34:13 Final thoughts on search
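
The "Mixing dense vs sparse with alpha" chapter is about weighting the two vector types against each other. One common way to do this, sketched here under assumptions, is a convex combination: scale the dense vector by alpha and the sparse values by (1 - alpha) before querying. The helper name `hybrid_scale`, the `indices`/`values` dictionary layout for the sparse vector, and the convention that alpha = 1 means dense only are assumptions for illustration, not confirmed details of the video's code.

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight a dense and a sparse query vector by alpha and (1 - alpha).

    alpha = 1.0 keeps only the dense (semantic) signal,
    alpha = 0.0 keeps only the sparse (keyword) signal.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Dense vector: a plain list of floats.
    scaled_dense = [value * alpha for value in dense]
    # Sparse vector: positions stay the same, only the weights are scaled.
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


# Toy vectors, for illustration only.
dense_query = [0.5, 1.0]
sparse_query = {"indices": [3, 7], "values": [2.0, 4.0]}
scaled_dense, scaled_sparse = hybrid_scale(dense_query, sparse_query, alpha=0.25)
```

Trying a few alpha values on real queries is usually the practical way to choose one: lower values favor exact keyword matches such as brand names, higher values favor semantic and visual similarity.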
Comments

A demo at the beginning of the video of what we are about to learn would greatly help a beginner in this field such as myself.

yamani

This channel is shockingly good for its subscriber count. Lucky I found you. Thanks!

iknowsolittle

Very nice, the sparse and dense vector mix can apply to many scenarios.

adamswang

This video is great! Instead of running on Colab, could you make a video that shows a round-trip connection from an HTML front end to the Pinecone database, specifically uploading a PDF, vectorizing it, querying, and displaying the results back through HTML? I also emailed you for some consulting work on a project. Thanks for the videos!

JasonMelanconEsq

I'm using an s1 pod and trying to create a hybrid index with 10k vectors.
Will there be any pricing difference on Pinecone's side between using a dense vector index alone and using a dense+sparse vector index?

gowthamkrish

This demo is fascinating. I would love to learn what technology to add to extend the demo, to maintain context between queries.

chrismaley

Amazing content as always. I was wondering, is it recommended to use embeddings such as the ones from OpenAI or Cohere instead of BM25?

JuanLopez-ocyv

Hello James, great content. I have one question: when building a search engine, how do we handle the "under $50" part of a query like "show me blue jeans under $50"? If you can guide me, I would much appreciate it, thank you.

hemanshupan

Is there a reason why you didn't use CLIP to generate both image and text embeddings?

JohnKing

Hi, thanks for sharing the video, it is really useful. For this type of usage, other than Pinecone, are there any other vector DBs that run offline on a local machine?

atomhero