Choosing Indexes for Similarity Search (Faiss in Python)

Facebook AI Similarity Search (Faiss) is a library that has changed how we approach search. It lets us efficiently search a huge range of media, from GIFs to articles, with high accuracy in sub-second timescales, even for datasets of a billion or more items.

The success of Faiss has many causes. One in particular is its flexibility. Faiss recognizes that there is no 'one-size-fits-all' in similarity search.

Instead, Faiss comes with a wide range of search indexes - which we can mix and match to our choosing.

However, this flexibility raises a question: how do we know which index fits our use case?

Which index do we choose? Should we use multiple indexes, or is one enough?

This video explores the pros and cons of some of the most important indexes - Flat, LSH, HNSW, and IVF. We will learn how to decide which one to use, and how the parameters of each index affect search speed, accuracy, and memory use, so that we can build indexes well suited to semantic search.
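
To make the trade-off between a Flat index and an IVF index concrete, here is a minimal pure-Python sketch. It is not Faiss code and not how Faiss is implemented: the helper names (`flat_search`, `build_ivf`, `ivf_search`) are hypothetical, the data is random, and the centroids are simply sampled from the data, whereas Faiss trains its IVF centroids with k-means. The parameter names `nlist` and `nprobe` mirror the ones Faiss uses for IVF indexes.

```python
import random


def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, k):
    """Flat (exhaustive) search: compare the query against every vector."""
    order = sorted(range(len(vectors)), key=lambda i: l2_sq(vectors[i], query))
    return order[:k]


def build_ivf(vectors, nlist, seed=0):
    """Partition vectors into nlist cells, each owned by its nearest centroid.

    Assumption for brevity: centroids are sampled from the data rather
    than trained with k-means.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    cells = [[] for _ in range(nlist)]
    for i, v in enumerate(vectors):
        nearest = min(range(nlist), key=lambda j: l2_sq(centroids[j], v))
        cells[nearest].append(i)
    return centroids, cells


def ivf_search(vectors, centroids, cells, query, k, nprobe):
    """IVF-style search: scan only the nprobe cells closest to the query."""
    by_distance = sorted(range(len(centroids)),
                         key=lambda j: l2_sq(centroids[j], query))
    candidates = [i for c in by_distance[:nprobe] for i in cells[c]]
    candidates.sort(key=lambda i: l2_sq(vectors[i], query))
    return candidates[:k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]

centroids, cells = build_ivf(data, nlist=16)
exact = flat_search(data, query, k=5)
approx = ivf_search(data, centroids, cells, query, k=5, nprobe=4)
full = ivf_search(data, centroids, cells, query, k=5, nprobe=16)

print("flat:", exact)
print("ivf, nprobe=4:", approx)
print("ivf, nprobe=16:", full)
```

With `nprobe` equal to `nlist`, every cell is scanned and the result matches the flat search. A smaller `nprobe` scans fewer vectors, which is faster but can miss true neighbors that sit in a cell that was not probed. That speed-versus-recall dial is the kind of parameter choice the video discusses.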

🌲 Pinecone Article:

Download script for Sift1M dataset:

Similarity Search Series:

Mining Massive Datasets Book (Similarity Search):

Comments

Thanks a bunch for this, James! It would be really great to see a couple of them explored in depth. Also, if you could benchmark Faiss against ScaNN, it would help a few of us noobs a hell of a lot.

Great content! Lovely command over your content. Really need more of this.

narayansharma

Great explanations, especially for IVF - it's probably the best explanation for how it works that I've seen.

Nick-vszp

Super Informative Content!
Thank you so much for this.

harshitjaitly

Hi James.

Thanks for such a wonderful tutorial. Really useful. A quick question: for a new query vector, is it possible to return the IVF cell/partition that it belongs to, instead of returning the neighbors? I think I can measure the distances to the centroids and return the closest centroid. However, I was wondering whether there is a built-in way.

grayrigel

Thank you for your video. Most valuable channel. Do you use a GPU for indexing in these projects?

katehan

Thanks for the amazing video! Do you know why simple K-means is not used for these MIPS problems?

haneulkim

Does the IVF algorithm work with high-dimensional data, for example 100 dimensions?

mohammadyahya

Super useful! Thanks for this video James. For IVF, can we retrieve the clusters that each datapoint belongs to after training (also cluster centroids)?

itheenigma

Could you share a video on this? Assume I have binary train and test data and need to calculate the Hamming distance. I haven't found any videos on this using Faiss, so sharing one would be very helpful.

nareshsandrugu
