HNSW for Vector Search Explained and Implemented with Faiss (Python)

Hierarchical Navigable Small World (HNSW) graphs are among the top-performing indexes for vector similarity search. HNSW is a hugely popular technology that consistently delivers state-of-the-art performance, combining very fast search speeds with excellent recall - HNSW is not to be missed.

HNSW is a popular and robust algorithm for approximate nearest neighbor (ANN) search, but understanding how it works is far from easy.

This video helps demystify HNSW and explains this intelligent algorithm in an easy-to-understand way. Towards the end of the video, we'll look at how to implement HNSW using Faiss and which parameter settings give us the performance we need.

🌲 Pinecone article:

🤖 70% Discount on the NLP With Transformers in Python course:

🎉 Sign-up For New Articles Every Week on Medium!

👾 Discord:

00:00 Intro
00:41 Foundations of HNSW
08:41 How HNSW Works
16:38 The Basics of HNSW in Faiss
21:40 How Faiss Builds an HNSW Graph
26:49 Building the Best HNSW Index
33:33 Fine-tuning HNSW
34:30 Outro
Comments

Man, I have been sleeping on this channel; your stuff is awesome!

f.b.

Tried to read the paper, was super perplexed, this was awesome!!!

banbiossa

I was trying to read and understand the paper, as I have to use HNSW kNN in one of my projects. Demystifying topics through videos helps me understand the paper more easily. Thanks for this video!

parthshah

Good video overall, but at 3:07 you don't go to the start block again (that would yield O(N) average search time complexity). You instead go down the last column that wasn't higher than the search key (which in this case is the 5 column), and this gives you O(log(N)) average search time complexity.

kristoferkrus

It's one of the best videos I have seen on HNSW. Somehow I was not able to understand the importance of the efConstruction & efSearch values. Is there an article where I can learn more about those parameters?

ae

Looking at the C++ code in the video, the defaults are set to efSearch=16 and efConstruction=40 which is in line with what is shown in the graphs. M between 32 and 50 seems to be the sweet spot to maximize recall, minimize search time and have a memory usage below 1GB. Maybe efSearch can be pushed to 50-60 to increase recall with a small hit on search time. Thanks for the great video.

gerardorosiles

21:19 Why aren't there 1,000,000 values in the lowest level, since the dataset is 1 million? What happened to the rest?

banxt

Thanks. How are links created? Why are some links present at layer 1 but not at layer 0, or vice versa?

pigrebanto

Sir, I have a question. Why do nodes in the top layer have a higher degree? "During the search, we enter the top layer, where we find the longest links. These vertices will tend to be higher-degree vertices (with links separated across multiple layers), meaning that we, by default, start in the zoom-in phase described for NSW." I found this in the doc, but I don't understand why.

xuantungnguyen

"If you need it to be faster, you can use an IVF index on top of HNSW to improve search time" - I don't understand this statement. Is there any article you can refer me to?

lewis_chan

I understand, but if I implement this, how can I retrieve the top k most similar results?

raulmc

Is it not the case that when we move from layer 3 to layer 2, we do not need to go back to the very beginning? We go to node 5 in layer 2, and from node 19, when it jumps to layer 1, it should go to node 11 instead of going back to the start node. Otherwise, the search efficiency goes back to O(n).

deepaksaini

If I understood correctly, the size of the graph does not depend on the number of dimensions, i.e. 128-dimensional vectors would result in the same space cost as 1024-dimensional vectors. For reasonable M, this space should probably be smaller than the space used by the vectors themselves. However, I saw threads (unfortunately, I didn't keep the links) where they complain that using HNSW results in using several times more space than the space used by the original vectors and hence requires a lot of RAM and disk space. How could that be explained?

meirgoldenberg

Is it possible to identify which combination of elements caused the outlier? I am using the HNSW algorithm on Github.

vanguard

What does a link represent? Closeness? And what if a node is isolated?

pigrebanto

I would like to draw contour lines based on the degree of anomaly when representing the plot on a two-dimensional graph. If there is a method, please instruct me.

vanguard

So it's basically a skip list, but in vector space?

EvgeniyDolzhenko
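The skip-list analogy above is apt: like a skip list, HNSW assigns each inserted element a top layer drawn from an exponentially decaying distribution, so higher layers hold exponentially fewer nodes. A pure-Python sketch of that level-generation rule (the constants follow the HNSW paper's recommendation; everything else is illustrative):

```python
# Sketch: skip-list-style layer assignment as used by HNSW.
import math
import random

def random_level(m_l: float, rng: random.Random) -> int:
    # floor(-ln(U) * mL), the level-generation rule from the HNSW paper.
    # 1 - rng.random() lies in (0, 1], so the log never sees zero.
    return int(-math.log(1.0 - rng.random()) * m_l)

M = 32
m_l = 1 / math.log(M)    # the paper's recommended normalization factor
rng = random.Random(0)

levels = [random_level(m_l, rng) for _ in range(100_000)]
counts = {}
for lvl in levels:
    counts[lvl] = counts.get(lvl, 0) + 1
print({lvl: counts[lvl] for lvl in sorted(counts)})  # layer 0 dominates
```

With `m_l = 1/ln(M)`, roughly a 1-in-M fraction of nodes reaches each successive layer, which is what makes the top-down descent logarithmic, just as in a skip list.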

Um, how do you build the index? Is there a step-by-step explanation?

yinghaohu

Can this method be used for searching non-vector data in a non-metric space? I'm thinking of variable length timeseries with dynamic time warping as a distance measure.

atomscott

Any idea how to add IDs to the vectors for later integration with a database? Excellent explanation!

wilfredomartel