True Multimodal RAG - Audio/Image/Video/Text

Everyone knows general text-based vector databases and text-based RAG for LLM applications, but as it turns out, that's just the beginning! Taking advantage of CLIP and CLAP models along with some fancy tricks, we embed 25,000 text entries, 1999 pictures, 2000 audio files, and 99 videos into a single vector database, allowing us to run direct text-to-text/audio/image/video retrieval!
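
The single-database idea above can be illustrated with a minimal sketch. It assumes every item has already been turned into a vector that is comparable with the text query vector; the CLIP/CLAP encoders themselves are not called here. The `SharedIndex` class, the toy 3-dimensional vectors, and the item names are hypothetical placeholders, not the video's code or any real vector database API.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SharedIndex:
    """Hypothetical in-memory index holding vectors from several modalities."""
    entries: list = field(default_factory=list)

    def add(self, item_id, modality, vector):
        # Store the modality tag next to the vector so results can be filtered later.
        self.entries.append((item_id, modality, list(vector)))

    def search(self, query_vector, top_k=3, modality=None):
        # Optionally restrict the search to one modality, then rank by similarity.
        candidates = [e for e in self.entries if modality is None or e[1] == modality]
        scored = [
            (cosine_similarity(query_vector, vec), item_id, mod)
            for item_id, mod, vec in candidates
        ]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:top_k]


# Toy vectors standing in for real embeddings (assumption: all share one space).
index = SharedIndex()
index.add("dog_bark.wav", "audio", [0.9, 0.1, 0.0])
index.add("dog_photo.jpg", "image", [0.8, 0.2, 0.1])
index.add("piano_clip.mp4", "video", [0.0, 0.1, 0.9])
index.add("Article about cats", "text", [0.1, 0.9, 0.1])

# A stand-in for an embedded text query; results mix modalities.
results = index.search([1.0, 0.0, 0.0], top_k=2)
for score, item_id, modality in results:
    print(f"{modality:>5}  {item_id}  {score:.3f}")
```

Keeping the modality as metadata beside each vector lets one query return mixed results or be narrowed with `modality="image"`, which is one plausible way to organize such an index.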

Resources:
Multimodal Image RAG Video:

Chapters:
00:00 - Intro
01:04 - CLIP Model Review
02:08 - CLAP Model Overview
02:35 - Modality 1: Audio Setup & Dataset
03:45 - Modality 1: Custom Audio Embedding & Loader Functions
05:40 - Modality 1: Audio Embedding & Testing Retrieval
07:38 - Modality 2: Image Setup & Dataset
08:52 - Modality 2: Image Embedding & Testing Retrieval
09:46 - Modality 3: Text Setup & Dataset
10:24 - Modality 3: Text Embedding
12:22 - Modality 3: Testing Text Retrieval
13:06 - Modality 4: Video Setup & Methodology
15:06 - Modality 4: Video Dataset & Embedding
16:22 - Modality 4: Testing Video Retrieval
17:10 - Full Multimodal Retrieval!
18:34 - RAG: Setup
19:26 - RAG: Prompt Setup
20:25 - RAG: Full Multimodal Retrieval Augmented Generation
21:15 - Outro

#ai #coding #generativeai
Comments

This is really nice content! Keep up the good work brother ✨✨

anasaberchih

Very interesting for the text, audio, and images. For the video, I think I will try something a bit more convoluted: identifying the bigger frame changes (where the distance between frame n and frame n+1 is greater) to find key points in the video. Maybe I will also try to rely on audio for the video part.

jean-baptistedelabroise
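
The frame-change idea in the comment above can be sketched roughly as follows. This is only an illustration under assumptions: each frame is taken to be already embedded as a vector, the toy 2-dimensional vectors are made up, and `find_key_frames`, the cosine-distance measure, the `0.5` threshold, and always keeping frame 0 are hypothetical choices, not the video's method.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def find_key_frames(frame_vectors, threshold=0.5):
    """Return frame 0 plus every index n+1 whose distance from frame n exceeds threshold."""
    key_frames = [0] if frame_vectors else []
    for n in range(len(frame_vectors) - 1):
        if cosine_distance(frame_vectors[n], frame_vectors[n + 1]) > threshold:
            key_frames.append(n + 1)
    return key_frames


# Toy per-frame vectors: two abrupt changes, at frame 2 and at frame 4.
frames = [
    [1.0, 0.0],
    [0.99, 0.05],
    [0.0, 1.0],
    [0.05, 0.99],
    [1.0, 0.0],
]
key_frames = find_key_frames(frames, threshold=0.5)
print(key_frames)
```

Only the selected frames would then need to be stored in the index, which might cut down on near-duplicate entries for static scenes.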

Thanks for the video. Wanna make a similar vid with ColPali?

sandorkonya

Wow.... just wow. Native audio RAG!!! Keep up the amazing vids.

IdPreferNot

ever listen to the first 30 seconds and go, this is perfect... for when i know what tf is up lolol, tysm saving for later!

scartheredd

Love this, keep it up man. I really appreciate you mainly using open-source solutions.

Kalebryee

awesome video this is amazing content!

Elisechingstah

Really good content!! BTW, Gemini 1.5 can process audio.

nivimg