Cohere vs. OpenAI embeddings — multilingual search

In this video, we work through a multilingual semantic search example using Cohere's new multilingual model. I also expect many of you will be curious about how it stacks up against OpenAI's text-embedding-ada-002 model, so we cover that too.

Big thanks to @NilsReimersTalks — he basically wrote all of the code here and explained a ton of things to me. You should go look at his channel. He has a ton of useful content on semantic search.

📌 Notebook:

🎙️ AI Dev Studio:

👾 Discord:

🤖 70% Discount on the NLP With Transformers in Python course:

🎉 Subscribe for Article and Video Updates!

00:00 What are Cohere embeddings
00:46 Cohere v OpenAI on cost
04:37 Cohere v OpenAI on performance
06:37 Implementing Cohere multilingual model
07:55 Data prep and embedding
10:45 Creating a vector index with Pinecone
14:07 Embedding and indexing everything
17:24 Making multilingual queries
21:55 Final thoughts on Cohere and OpenAI
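
The chapters above compress to a short end-to-end flow. Here is a minimal sketch of that flow, assuming the cohere and pinecone-client Python SDKs as they existed around the video's release; the API keys, environment, and index name are placeholders, not values from the video.

```python
import cohere
import pinecone

co = cohere.Client("COHERE_API_KEY")  # placeholder key
pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")  # placeholder env

# Embed documents with the multilingual model (768-dimensional vectors).
docs = ["A first passage.", "Un segundo pasaje.", "Ein dritter Absatz."]
embeds = co.embed(texts=docs, model="multilingual-22-12").embeddings

# Create a Pinecone index whose dimension matches the embeddings, then upsert.
# Cohere recommended dot product similarity for this model at the time.
index_name = "multilingual-demo"  # placeholder name
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=len(embeds[0]), metric="dotproduct")
index = pinecone.Index(index_name)
index.upsert(vectors=[
    (str(i), emb, {"text": doc}) for i, (emb, doc) in enumerate(zip(embeds, docs))
])

# Queries can be in any language the model covers, but must use the same model.
xq = co.embed(texts=["второй отрывок"], model="multilingual-22-12").embeddings[0]
print(index.query(vector=xq, top_k=2, include_metadata=True))
```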
Comments

Hey James, thanks again for the great video. I'm interested in the "on-prem" co:here solution via AWS. Can you provide a link to somewhere I can read more about that (i.e., wherever the table you showed came from)? I'm having trouble finding it myself.

Truizify

Hi James, thanks for the valuable share. How did you rate Cohere as high on safety for sensitive data compared to OpenAI @ 3:35? Can you provide input on this? I think both use endpoints for our requests in a similar way...

venkatesanr

How's that arXiv bot coming along, James? I noticed it in your Pinecone index list, and the dimension seems to be 1536, which is ada-002's vector length🤔😉 Great presentation yesterday, and I love that OpenAI has some competition in the space! I think we builders will all benefit from the competition, and the multilingual support is a game changer IMHO! Keep up the great work!🥳💪🏼😎

klammer

Nice presentation! Some remarks about the table shown at 03:09: OpenAI's text-embedding-ada-002 does support multiple languages. It behaves very similarly to the LaBSE (Language-Agnostic BERT Sentence Embedding) model. And co:here can be more expensive, as you must compute an embedding each time you make a query.

Lemure_Noah

Pretty useful info, especially the token sizes, typical embedding sizes, etc.

cloudshoring

Also, ada-002 can be applied to text of up to 8191 tokens, against Cohere's 512 tokens, as with some other sentence encoder models. Of course, 8191 tokens is a lot of text, and maybe we should use more fine-grained text chunks, such as 4096 tokens or even less. But this is something to take into account.

Lemure_Noah
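
A minimal sketch of the chunking idea from the comment above, using OpenAI's tiktoken tokenizer (cl100k_base is ada-002's encoding). Note this only approximates Cohere's own token counts, and the window and overlap sizes are illustrative, not recommendations from the video.

```python
import tiktoken

def chunk_text(text: str, max_tokens: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping token windows that fit a model's limit."""
    enc = tiktoken.get_encoding("cl100k_base")  # ada-002's encoding
    tokens = enc.encode(text)
    step = max_tokens - overlap
    return [
        enc.decode(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), step)
    ]
```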

What model or endpoint are you using to get 768-dimensional vectors from Cohere? Medium gives me 2048 and large gives me 4096.

Aquaritek
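
On the question above: the 768-dimensional vectors come from the multilingual model specifically; the English medium/large models return wider vectors, which matches the 2048/4096 numbers mentioned. A minimal sketch, with a placeholder key:

```python
import cohere

co = cohere.Client("COHERE_API_KEY")  # placeholder key

resp = co.embed(texts=["hello world"], model="multilingual-22-12")
print(len(resp.embeddings[0]))  # 768 for the multilingual model
```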

Do you know if there's any new free solution for multilingual embeddings or is it still the

jkezlnb

Do I understand correctly: if we used the ada-002 model to index the knowledge base, should we also use ada-002 to embed the search queries?
And when a new model appears, for example ada-003, and we want to use it, will we need to re-index the knowledge base with ada-003?

dtaylor
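
Yes on both counts: embeddings from different models live in different vector spaces, so queries must be embedded with the same model that built the index, and switching models means re-embedding the whole knowledge base. A minimal guard, sketched with the pre-1.0 openai Python client; the index-to-model mapping is illustrative.

```python
import openai

openai.api_key = "OPENAI_API_KEY"  # placeholder key

# Keep the index -> embedding-model pairing explicit so queries can never
# be embedded with a different model than the one that built the index.
INDEX_MODEL = {"kb-ada-002": "text-embedding-ada-002"}  # illustrative mapping

def embed_query(index_name: str, query: str) -> list[float]:
    model = INDEX_MODEL[index_name]  # KeyError = unknown index, fail loudly
    resp = openai.Embedding.create(input=[query], model=model)
    return resp["data"][0]["embedding"]
```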

Hey James, I actually ran into an issue with Cohere embeddings. They've revised the max token length for an embedding; it's 512 now. They recommend truncating the text to fit into this. This was, I think, this month itself, maybe last week or so. The quality is still pretty good, but they advise using the truncation parameter.

averma
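
A minimal sketch of the truncation parameter mentioned above, assuming the Cohere Python SDK of that period; the accepted values shown here ("NONE", "START", "END") are from Cohere's docs at the time and worth re-checking.

```python
import cohere

co = cohere.Client("COHERE_API_KEY")  # placeholder key

# truncate="END" asks the API to clip inputs that exceed the 512-token
# limit from the end rather than reject them; "START" clips from the
# front, and "NONE" returns an error for over-length inputs.
resp = co.embed(
    texts=["some very long passage ..."],
    model="multilingual-22-12",
    truncate="END",
)
```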

Wanted your thoughts on AI-powered programming languages.
Can you make a video on it? I think it sounds super interesting. Would like another person's perspective.
Great video as always 🙌🙌👍👍👍

minfuel

Can you link the sources? I'd like to look into this in more detail

ChocolateMilkCultLeader

What happened to Cohere's extension for Visual Studio Code? Did they delete it?

PizzaLord

Thanks for the comparison👍 Is it more for a search use case than for a chatbot one?
Can you make a video on Meta's new model LLaMA?🤔😉

henkhbit

Any details about how the multilingual model is trained?

soumyasarkar

James, can you better lay out for us why this works? It seems like using an NLP model (natural language processing) as your front end makes it the only part that needs to know the language? Does this have to do with the vectorization process? I think I'm getting confused between the terms LLM, NLP, and databases, and might just be trying to reinvent the wheel. Does this make sense?

lutune
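
The short answer to the comment above: the multilingual embedding model is the only language-aware component; it maps text from many languages into one shared vector space, so the vector database just compares numbers. A minimal sketch of that property, assuming the Cohere SDK and numpy, with a placeholder key:

```python
import numpy as np
import cohere

co = cohere.Client("COHERE_API_KEY")  # placeholder key

texts = ["I love dogs", "J'adore les chiens", "The stock market fell today"]
embs = np.array(co.embed(texts=texts, model="multilingual-22-12").embeddings)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A paraphrase in another language should score far higher than an
# unrelated sentence in the same language.
print(cosine(embs[0], embs[1]))  # EN vs. FR paraphrase: high
print(cosine(embs[0], embs[2]))  # same language, different topic: lower
```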

Hey man, I think you leaked the Cohere key. I would change it ASAP.

thomasmeta