OpenAI's New GPT 3.5 Embedding Model for Semantic Search

In this video, we'll learn how to use OpenAI's new embedding model text-embedding-ada-002.

We will learn how to use the OpenAI Embedding API to generate language embeddings and then index those embeddings in the Pinecone vector database for fast and scalable vector search.

This is a powerful and common combination for building semantic search, question-answering, threat detection, and other applications that rely on NLP and search over a large corpus of text data.

Everything will be implemented with OpenAI's new GPT 3.5 class embedding model, text-embedding-ada-002: their latest embedding model, which is 10x cheaper than earlier OpenAI embedding models, more performant, and capable of encoding roughly 10 pages of text into a single vector embedding.
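
For anyone following along, here is a minimal sketch of the embedding call, assuming the pre-1.0 openai Python client that was current around this video's release (the API key is a placeholder):

    import openai

    openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder, set your own key

    # Create an embedding with the new GPT 3.5 class model
    res = openai.Embedding.create(
        input=["Sample passage to embed"],
        engine="text-embedding-ada-002",
    )
    embedding = res["data"][0]["embedding"]
    print(len(embedding))  # ada-002 returns 1536-dimensional vectors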

🌲 Pinecone docs:
Colab notebook:

🎙️ Support me on Patreon:

👾 Discord:

🤖 AI Dev Studio:

🎉 Subscribe for Article and Video Updates!

00:30 Semantic search with OpenAI GPT architecture
03:43 Getting started with OpenAI embeddings in Python
04:12 Initializing connection to OpenAI API
05:49 Creating OpenAI embeddings with ada
07:24 Initializing the Pinecone vector index
09:04 Getting dataset from Hugging Face to embed and index
10:03 Populating vector index with embeddings
12:01 Semantic search querying
15:09 Deleting the environment
15:23 Final notes
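
As a rough end-to-end sketch of the steps in the chapter list above, assuming the pinecone-client 2.x API of the same era (keys and the index name are placeholders; the Colab notebook remains the authoritative version):

    import openai
    import pinecone

    openai.api_key = "YOUR_OPENAI_API_KEY"
    pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-east1-gcp")

    # Initialize the Pinecone vector index (ada-002 vectors are 1536-dim)
    index_name = "openai-ada-demo"  # hypothetical index name
    if index_name not in pinecone.list_indexes():
        pinecone.create_index(index_name, dimension=1536, metric="cosine")
    index = pinecone.Index(index_name)

    # Populate the index: embed passages and upsert them with text as metadata
    passages = ["First document text", "Second document text"]
    res = openai.Embedding.create(input=passages, engine="text-embedding-ada-002")
    index.upsert(vectors=[
        (str(i), record["embedding"], {"text": passages[i]})
        for i, record in enumerate(res["data"])
    ])

    # Semantic search: embed the query, then retrieve nearest neighbors
    xq = openai.Embedding.create(
        input=["What does the first document say?"],
        engine="text-embedding-ada-002",
    )["data"][0]["embedding"]
    matches = index.query(vector=xq, top_k=2, include_metadata=True)
    for m in matches["matches"]:
        print(m["score"], m["metadata"]["text"])

    # Delete the index when finished to free resources
    pinecone.delete_index(index_name)
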
Comments

If you see "401: API Key is invalid" when initializing the Pinecone index, switch your environment from "us-west1-gcp" to "us-east1-gcp".

The reason for this is that as of 23 Jan 2023 the default environment in Pinecone changed from us-west1-gcp to us-east1-gcp, so newly initialized default projects (for new users) will be using the new default environment.
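
In code, this just means changing the environment argument when initializing the client; a minimal sketch, assuming the pinecone-client 2.x API:

    import pinecone

    # Newly created default projects use us-east1-gcp; older ones may still
    # be on us-west1-gcp, so match whatever your Pinecone console shows
    pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-east1-gcp")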

jamesbriggs

This one video is more useful than 99% of the ChatGPT videos created by influencers these days

ChocolateMilkCultLeader

I really appreciate you taking the time to draw the diagram to explain the process. Pictures do embed themselves in our minds, and for me it really helped me understand much better. Thank you.

carlhlazo

I have to say you are my go-to person when it comes to NLP stuff. I love your work, man. Please don't stop.

MightyMoud

Wow. I've watched maybe 7 other videos about embedding models and example use cases before your video... this one here is by far the best. Well explained; I walked away with a much better understanding of embeddings than from the previous 7. Will definitely come back to your video for reference. THANK YOU!!!!

Legotron

Searching Google for this exact topic led me here, and to my surprise this video was just released today!

beecee

I just arrived at this channel and I can't believe how good it is! Keep going, my friend. I can't wait to see the new videos to come 🤩

zevictor

Embeddings should give this video the highest score relative to this topic

slayermm

The amount of value bombs you dropped in this video is insane. Thanks for sharing this video.

TheRonellCross

Thinking of adding this stuff to my experimental chatbot based on the flan-t5 model. Continuously store dialogs in a vector database, and hopefully it will forever remember various facts you told it, which is not possible with the vanilla model because of its context length.
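
A rough sketch of the pattern this comment describes, under the same pinecone-client 2.x assumptions as above and with hypothetical helper names: embed each dialog turn as it arrives, and retrieve similar past turns to prepend to the prompt.

    import openai
    import pinecone

    openai.api_key = "YOUR_OPENAI_API_KEY"
    pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-east1-gcp")
    memory = pinecone.Index("chat-memory")  # hypothetical pre-created index

    def embed(text):
        res = openai.Embedding.create(input=[text], engine="text-embedding-ada-002")
        return res["data"][0]["embedding"]

    def remember(turn_id, text):
        # Store every dialog turn so facts outlive the model's context window
        memory.upsert(vectors=[(turn_id, embed(text), {"text": text})])

    def recall(query, k=3):
        # Retrieve the k most similar past turns to feed back into the prompt
        res = memory.query(vector=embed(query), top_k=k, include_metadata=True)
        return [m["metadata"]["text"] for m in res["matches"]]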

constantinegeist

100% subscribed after finishing the video. Wow, you're tops.

UserCommenter

If I have several mailing addresses with the name of the person, can this be used for matching & deduplication purposes? Any suggestions?

dbiswas

I would be highly interested to see the performance of this model against other open-source models, to see if it's worth the price. Perhaps a good idea for your next video: how to evaluate language models. Do some kind of comparison against other known SOTA models.

kylespindler

Great vid. Remaining questions I have:

1) How do you determine the optimal text input length/type? E.g., when splitting up text content/data at 6:06.

2) If you make an embedding DB for one model, is it [ever/under what conditions is it] transferable to another model? ada-002 -> ada-003?

iclick

Hi, great video! I learned interesting and cool stuff. Thank you!

viemingtan

Nice demo! When the search results ranking is not what we want, how can we give feedback to the model to improve the ranking?

chenpaul

Great video. I was able to prototype a small project.

mrchongnoi

Every time I run openai.Embedding.create(input=[...], engine=MODEL) I get a different vector representation. Why? Isn't it supposed to be deterministic?

veliea

Is there a self-hosted solution to this indexing process? I don't want to send data to OpenAI to do the embedding. Can I have Pinecone running locally?

lionardo

Why would you pass multiple strings in each embedding creation when this returns one output? What is the meaning of this, please? I am trying to understand storage practices for Pinecone, and this has confused the process a little bit for me.

mitchellstewart