Text Embeddings, Classification, and Semantic Search (w/ Python Code)


In this video, I introduce text embeddings and describe how we can use them for 2 simple yet high-value use cases: text classification and semantic search.
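The core idea behind both use cases is that once texts are mapped to vectors, "similar meaning" becomes "small angle between vectors." Below is a minimal, self-contained sketch of cosine similarity using made-up 3-dimensional vectors in place of real model outputs; a real pipeline would get its vectors from an embedding model such as sentence-transformers or the OpenAI embeddings API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d vectors standing in for real model embeddings.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
banana = [0.1, 0.05, 0.95]

print(cosine_similarity(king, queen))   # close to 1.0 (similar meaning)
print(cosine_similarity(king, banana))  # much lower (unrelated meaning)
```

Cosine similarity is the workhorse comparison for both use cases: classification compares a document's vector against labeled examples, and search compares a query's vector against every document's vector.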

More Resources:

[2] R. Patil, S. Boit, V. Gudivada and J. Nandigam, “A Survey of Text Representation and Embedding Techniques in NLP,” in IEEE Access, vol. 11, pp. 36120–36146, 2023, doi: 10.1109/ACCESS.2023.3266377.

--

Socials

The Data Entrepreneurs

Support ❤️

Intro - 0:00
Problem: Text isn't computable - 0:42
Text Embeddings - 1:42
Why should I care? - 3:15
Use Case 1: Text Classification - 5:49
Use Case 2: Semantic Search - 12:40
Free gift for watching - 23:50
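The semantic-search use case boils down to embedding the query and every document with the same model, then ranking documents by similarity. A toy sketch of that loop, with a hand-rolled word-count "embedding" standing in for a real model (the vocabulary and documents are invented for illustration):

```python
import math
from collections import Counter

# Toy word-count "embeddings"; a real system would use a model such as
# sentence-transformers or the OpenAI embeddings API instead.
VOCAB = ["data", "science", "cooking", "pasta", "machine", "learning"]

def embed(text):
    """Map text to a vector of per-word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "machine learning for data science",
    "cooking pasta at home",
    "data science career tips",
]
doc_vecs = [embed(d) for d in docs]  # embed the corpus once, up front

def search(query):
    """Rank all documents by similarity to the query embedding."""
    q = embed(query)
    order = sorted(range(len(docs)), key=lambda i: cosine(q, doc_vecs[i]),
                   reverse=True)
    return [docs[i] for i in order]

print(search("learning data science")[0])  # most similar document first
```

The key design point, regardless of the embedding model used: documents are embedded once ahead of time, and only the query is embedded at search time.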
Comments



ShawhinTalebi

Love that you’re bringing real knowledge, insights and code here! So many AI YouTubers are just clickbaiting their way through the hype cycle by reading the same SHOCKING news as everyone else.

ccapp

I have learnt so much by watching the entire series. Thank you so much Shaw! I think this is one of the best playlists out there for anyone looking to get into the field of LLMs and GenAI.

krishnavamsiyerrapatruni

Clear and understandable explanation of these concepts. Thanks, really enjoyed it!

obaydmir

SEO here, enjoyed your examples of semantic search and explanation of hybrid search. Great vid and easy to follow. Will explore your channel. Cheers!

ethanlazuk

Congrats man! Keep going with more real examples and shared code!

PRColacino

Great video. The practical use cases for embeddings themselves are undervalued IMHO and this video is fantastic for showing ways to use embeddings. Even if you use OpenAI embeddings, they are dirt cheap, and can provide fantastic vectors for further analysis, manipulation, and comparison.

BrandonFoltz

Wow! Thank you for breaking this down, been trying to figure it out!

ifycadeau

You're the real one to subscribe to and learn from

pramodkumarsola

I discovered this concept and it is so useful

superresistant

Thank you, very good information. I'll try to build a database for audio sound effects using vector databases (text-to-audio).

cinematicsounds

I've been using embeddings for a while, but I find that agents can call specialized tools, which can be very useful depending on the application.

eliskucevic

I have watched most of the videos in this series and found them really helpful. Something I'm looking for that I haven't seen you cover yet is more guidance on preparing data for either RAG or fine-tuning; I'm sure you have practical tips you can give. I have a large old codebase with loads of documentation, tutorials, etc., but it's a lot for someone to pick up, and this new world of GPTs seems perfect for building an assistant. I'll be able to work through it OK, but I suspect there are learned best practices and pitfalls to avoid that are a bit more subtle. For example, I'm looking through our support emails/tickets: lots of them start with "please send logs" :) and after a load of back and forth we have the info, much like a conversation with ChatGPT. For fine-tuning, is it best to fine-tune on a whole thread, or on each chunk of the conversation?

KrisTC

Hey Shaw, thanks for this wonderful series. I have completed it and learned so many new things, but one thing I felt is that the code is very high-level, and it feels like I have to memorize most of it while practicing with those Hugging Face models. Do you have any suggestions for that?

uzairmalik

Many thanks for the video, Shaw. Great content!
One simple question: when using OpenAI's embedding model, each resume is represented by an embedding vector. Is this embedding computed as the average of all word vectors?
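For context: how OpenAI's embedding models pool tokens into a single document vector isn't publicly documented, so it isn't necessarily a simple average. Mean pooling over per-token vectors is, however, a common strategy in open embedding models. A toy sketch with invented 3-d token vectors:

```python
def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size document vector."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]

# Made-up 3-d token vectors for a short "document".
tokens = [[1.0, 0.0, 2.0],
          [3.0, 2.0, 0.0],
          [2.0, 1.0, 1.0]]
print(mean_pool(tokens))  # [2.0, 1.0, 1.0]
```

Whatever the pooling strategy, the result is one fixed-size vector per document, which is what makes whole-resume comparison possible.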

pepeballesteros

LDA (Latent Dirichlet Allocation) is kinda trivial these days… MATLAB's Text Analytics Toolbox works great on PDFs with bi-grams, a la bag-of-n-grams. Cool… thanks…

Whysicist

Is it possible to extract software names from the query with a text classifier and apply only, e.g., Apache Airflow to the keyword search? Also, what DB do you suggest? Is Postgres with a vector DB extension good?

sherpya

Finally, someone who speaks with their hands more than I do, lol…

avi

Hi Shawhin, thanks. I ran into a problem: I tried to use a sentence-transformers model after installing the library, but it always gives a "no file found" error for the .cache/huggingface/... folder. Your help is appreciated.

tamilinfomite

Can only two classes be used? If I have lots of types, for example in product classification, can it still be applied?
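For context: embedding-based classification isn't limited to two classes. One simple multi-class approach is nearest centroid, where each category gets the average vector of its labeled examples and a new item takes the label of the closest centroid. A toy sketch with invented 2-d embeddings and three product categories:

```python
import math

def centroid(vectors):
    """Average a list of vectors into one centroid vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Toy 2-d "embeddings" of labeled product descriptions (invented values).
training = {
    "electronics": [[0.9, 0.1], [0.8, 0.2]],
    "clothing":    [[0.1, 0.9], [0.2, 0.8]],
    "groceries":   [[0.5, 0.5], [0.6, 0.4]],
}
centroids = {label: centroid(vecs) for label, vecs in training.items()}

def classify(vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

print(classify([0.85, 0.15]))  # electronics
```

The same pattern scales to as many categories as there are labeled groups; with enough labeled data, a logistic regression or similar classifier trained on the embeddings is another common choice.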

AlexandreMarr-uqpw