Beyond Keywords: Image similarity search in Azure Cosmos DB for PostgreSQL | Python Data Science Day

Vector search, also known as vector similarity search, finds similar items based on their content rather than on exact matches against properties such as keywords, tags, or other metadata, which is how keyword-based search systems work. It uses machine learning to capture the meaning of data. The key idea is to translate unstructured data, such as text, images, videos, and audio, into high-dimensional vectors (also known as embeddings) and then apply nearest neighbor algorithms to find similar data.
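
As a rough illustration of the nearest neighbor idea described above, the following minimal sketch ranks a few items by cosine similarity to a query vector. The three-dimensional vectors and file names are made up for the example; real embeddings have hundreds or thousands of dimensions and would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, items, k=2):
    """Exact search: score every item, return the k most similar names."""
    ranked = sorted(
        items.items(),
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "kitten.jpg": [0.8, 0.2, 0.1],
    "truck.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

print(nearest_neighbors(query, embeddings, k=2))
```

Scoring every stored vector like this is exact but scales linearly with the number of items, which is why approximate methods are used on larger collections.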

In this quickstart session, we will build an image similarity search system using Python, Azure Cosmos DB for PostgreSQL, and pgvector, an open-source vector similarity search extension for PostgreSQL. We will generate vector embeddings with the Azure AI Vision multi-modal embeddings API and enable the pgvector extension. We will then discuss exact and approximate nearest neighbor search and use Azure Cosmos DB for PostgreSQL to store and query vector data.
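
The pgvector steps mentioned above can be sketched as a handful of SQL statements prepared from Python. This is not the code shown in the session: the table name `images`, the column names, the 1024-dimension size, and the `lists = 100` setting are all assumptions chosen for illustration, and a managed service may require its own way of enabling extensions.

```python
# Assumed embedding size; check the dimension your embedding model returns.
EMBEDDING_DIM = 1024

def to_pgvector_literal(vec):
    """Format a Python sequence as pgvector's text form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"

# Enable the extension (standard PostgreSQL syntax).
ENABLE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

# Hypothetical table: one row per image with its embedding.
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS images ("
    "id bigserial PRIMARY KEY, "
    "file_name text, "
    f"embedding vector({EMBEDDING_DIM}));"
)

# Exact nearest neighbor search; <=> is pgvector's cosine distance operator.
SEARCH = (
    "SELECT file_name FROM images "
    "ORDER BY embedding <=> %s::vector LIMIT %s;"
)

# Approximate search: an IVFFlat index over cosine distance.
# lists = 100 is an illustrative value, not a recommendation.
CREATE_IVFFLAT_INDEX = (
    "CREATE INDEX ON images "
    "USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);"
)

print(to_pgvector_literal([0.5, 1, -2]))
```

With a PostgreSQL driver that uses `%s` placeholders, the search would then be run along the lines of `cursor.execute(SEARCH, (to_pgvector_literal(query_embedding), 5))`, where `query_embedding` is the embedding of the query image or text.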

Chapters:
00:00 Image similarity search in Azure Cosmos DB for PostgreSQL
00:56 Why vector search?
01:49 Agenda
02:14 Turn data into vectors
03:02 Project the vectors onto the 2D vector space
03:37 How to measure whether two vectors are similar
03:56 Vector search workflow
04:34 Vector search in PostgreSQL
05:01 Create a table to store embeddings
05:34 Query embeddings
06:01 Demo
07:00 Vector search strategies
08:21 Create an IVFFlat index in pgvector
09:30 Demo
10:01 Resources

Resources:
Cloud Skills Challenge - through April 15, 2024