Multi-Vector Retriever for RAG on Tables + Texts Using LANGCHAIN & UNSTRUCTURED


In this video, I will show you how to chat with a PDF that contains both text and tables. We will be using LangChain, OpenAI, ChromaDB and Unstructured.

Happy Learning 😎
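A multi-vector retriever searches over one representation of each item, such as an LLM-written summary of a table, and returns a different one, such as the raw table. In LangChain, `MultiVectorRetriever` links a vector store of summaries to a separate docstore of raw content through a shared ID key. The sketch below is library-free and only illustrates that idea. The word-count "embedding" is a toy stand-in for OpenAI embeddings, and `ToyMultiVectorRetriever` is a hypothetical class, not LangChain's API.

```python
import math
import re
import uuid
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real pipeline would call an embedding model.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyMultiVectorRetriever:
    def __init__(self) -> None:
        self.index: List[Tuple[str, Counter]] = []  # (doc_id, summary vector): what gets searched
        self.docstore: Dict[str, str] = {}          # doc_id -> raw content: what gets returned

    def add(self, raw: str, summary: str) -> str:
        doc_id = str(uuid.uuid4())                  # shared key linking summary and raw content
        self.docstore[doc_id] = raw
        self.index.append((doc_id, embed(summary)))
        return doc_id

    def invoke(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.index, key=lambda item: cosine(q, item[1]), reverse=True)
        return [self.docstore[doc_id] for doc_id, _ in ranked[:k]]


retriever = ToyMultiVectorRetriever()
retriever.add(raw="Year | Revenue\n2022 | 10\n2023 | 12",
              summary="Table of annual revenue for 2022 and 2023")
retriever.add(raw="The company was founded in 1999 in Berlin.",
              summary="Company founding history")
print(retriever.invoke("What was the revenue in 2023?"))
```

The summary is short and embeds well, so it is what gets matched. The model still receives the full table.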

👉🏼 Links:

------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------
🔗 🎥 Other videos you might find helpful:

------------------------------------------------------------------------------------------
🤝 Connect with me:

#langchain #llm #rag #semistructuredrag #datasciencebasics

Related videos

Comments

Super video as usual! I recently read about this on the LangChain blog, so I'm happy to see it in action in your videos.

VenkatesanVenkat-fdhg

I tried the summarization step, but I am getting TypeError: BaseModel.__init__() takes 1 positional argument but 2 were given
----> 2 table_summaries = summarize_chain.batch(tables, {"max_concurrency": 5})
Not sure why!

astralawes
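In the line quoted above, `{"max_concurrency": 5}` is the config argument passed to `batch`. It caps how many table summaries are requested in parallel. The sketch below reproduces that cap using only the standard library. `summarize` is a hypothetical placeholder for the LLM summarization chain, and the sketch is not LangChain's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def batch(fn: Callable[[str], str], inputs: List[str], max_concurrency: int = 5) -> List[str]:
    # At most `max_concurrency` calls run at the same time; results keep the input order.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(fn, inputs))


def summarize(table: str) -> str:
    # Hypothetical placeholder for the LLM call made by the summarization chain.
    rows = table.count("\n") + 1
    return f"Table with {rows} row(s); header: {table.splitlines()[0]}"


tables = ["Year | Revenue\n2022 | 10\n2023 | 12", "Asset | Value\nCash | 5"]
print(batch(summarize, tables, max_concurrency=5))
```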

Hello mate, great video as always. Please help me out with a question: what if I am building a SaaS product and the raw tables and text cannot be stored in memory? What alternatives are there? Could I, for example, use Pinecone and store the raw text of paragraphs and tables in the metadata of each vector? What is the alternative?

andresmerchan
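A common pattern for this question has two parts. The vector database (Pinecone or another) keeps only each summary embedding plus an ID in its metadata. The raw text and tables go into a separate persistent key-value store, and you look them up by that ID at query time. LangChain's `MultiVectorRetriever` is built around such a docstore, so in principle the in-memory store can be swapped for a persistent one. The sketch below uses SQLite from the Python standard library. `SQLiteDocStore` is a hypothetical helper, not a LangChain class.

```python
import sqlite3
import uuid
from typing import List, Optional


class SQLiteDocStore:
    """Hypothetical persistent store mapping doc_id -> raw text or table."""

    def __init__(self, path: str = "docstore.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, kind TEXT, content TEXT)"
        )

    def put(self, content: str, kind: str) -> str:
        doc_id = str(uuid.uuid4())
        with self.conn:  # commits the insert on success
            self.conn.execute("INSERT INTO docs VALUES (?, ?, ?)", (doc_id, kind, content))
        return doc_id

    def get_many(self, ids: List[str]) -> List[Optional[str]]:
        # Returns None for unknown IDs so the caller can skip stale vector hits.
        results: List[Optional[str]] = []
        for doc_id in ids:
            row = self.conn.execute("SELECT content FROM docs WHERE id = ?", (doc_id,)).fetchone()
            results.append(row[0] if row else None)
        return results


store = SQLiteDocStore(":memory:")  # pass a file path such as "docstore.db" to persist
table_id = store.put("Year | Revenue\n2022 | 10", kind="table")
# The vector DB would hold the summary embedding plus metadata {"doc_id": table_id}.
print(store.get_many([table_id, "missing-id"]))
```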

Much-appreciated content! Could you please upload a video tutorial on ingesting these text and table embeddings into the Milvus vector store?

hiteshsingh

What I'm really wondering: aren't you using all the metadata? You are saving only the text, not the text element, in the memory store.

tom_

table_summaries = summarize_chain.batch(tables, {"max_concurrency": 5}): this line gives NotFoundError: 404 whenever I try to run it. Please help.

arunimamukherjee

How can we extract the text and the relevant images, so that the images are displayed in the response, or at least linked? Please help.

AdarshMamidpelliwar

12:14 It's possible the data returned from the vector store before passing to the model was lacking.

s-guytech

How can I add the table's metadata for better retrieval?

amarparajuli
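For the two metadata questions above, one approach is to carry each element's metadata alongside its text. For Unstructured elements, that includes fields such as the page number. You can then filter on that metadata, or fold part of it into the text that gets summarized and embedded. The sketch below shows both ideas in plain Python. `Chunk`, `filter_chunks` and `text_for_embedding` are hypothetical names. In practice, most vector stores, Chroma included, provide their own metadata filters that would replace the manual filter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


def filter_chunks(chunks: List[Chunk], **conditions: object) -> List[Chunk]:
    # Keep only chunks whose metadata matches every key=value condition.
    return [c for c in chunks if all(c.metadata.get(k) == v for k, v in conditions.items())]


def text_for_embedding(chunk: Chunk) -> str:
    # Prefix the text with a few metadata fields so they influence similarity search.
    fields = ("source", "page", "caption")
    header = ", ".join(f"{k}: {chunk.metadata[k]}" for k in fields if k in chunk.metadata)
    return f"[{header}]\n{chunk.text}" if header else chunk.text


chunks = [
    Chunk("Year | Revenue\n2023 | 12",
          {"type": "table", "source": "report.pdf", "page": 3, "caption": "Revenue by year"}),
    Chunk("Revenue grew 20% in 2023.", {"type": "text", "source": "report.pdf", "page": 2}),
]
tables = filter_chunks(chunks, type="table", source="report.pdf")
print(text_for_embedding(tables[0]))
```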

So what about when the PDF contains images and graphs?

shahnaz

Thank you so much for the video. Can you please explain the case where you have multiple PDF files with tables and the best way to do it?

salwamostafa
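With several PDFs, the same pipeline can run once per file. Tagging each element with its source file lets answers be traced back and filtered later. In the sketch below, `partition` is a hypothetical callable standing in for Unstructured's `partition_pdf`, assumed to return `(category, text)` pairs. Real Unstructured elements are objects, so an adapter would be needed.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

Element = Tuple[str, str]  # assumed shape: (category, text), e.g. ("Table", "...")


def ingest_pdfs(paths: Iterable[str],
                partition: Callable[[str], List[Element]]) -> Dict[str, List[dict]]:
    buckets: Dict[str, List[dict]] = {"tables": [], "texts": []}
    for path in paths:
        for category, text in partition(path):
            record = {"text": text, "source": Path(path).name}  # keep the file of origin
            buckets["tables" if category == "Table" else "texts"].append(record)
    return buckets


def fake_partition(path: str) -> List[Element]:
    # Hypothetical stand-in so the sketch runs without any PDFs.
    return [("Table", f"table from {path}"), ("NarrativeText", f"text from {path}")]


result = ingest_pdfs(["docs/a.pdf", "docs/b.pdf"], fake_partition)
print([r["source"] for r in result["tables"]])
```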

I tried the LangServe template for this and it works well. HOWEVER, LangChain in general doesn't seem to address the storage and search methods on the text side. This type of setup cannot work in production, because all the ingestion and data serving is done on the fly. In production, ingestion and storage need to be done beforehand, with persistence. The trick here is how to store the text/table partition_pdf data as regular text inside a non-vector database with a match-based search such as FTS5. That search returns relevant documents, which can then be used with the summarization and the vector DB to produce the result. The LangChain examples, not just this one, always load the entire context into memory, which basically means there is no full-text search: it's just a dump, with presumed data formats. Anyway, rant complete. It's a good example of the concept, but without more detail on storage and search for the text/table side, it's only about 10% of the work needed to get it functioning in real life.

stephenthumb
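The persistence-plus-full-text-search setup described in the comment above can be sketched with SQLite's FTS5 extension. FTS5 ships with most Python builds but is not guaranteed everywhere. The table and column names below are hypothetical. In a full pipeline, the returned `doc_id` values would link each element to its summary in the vector store.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")  # use a file path to persist between runs
conn.execute("CREATE VIRTUAL TABLE elements USING fts5(doc_id UNINDEXED, kind UNINDEXED, content)")

# Elements would normally come from partition_pdf at ingestion time, not at query time.
conn.executemany(
    "INSERT INTO elements (doc_id, kind, content) VALUES (?, ?, ?)",
    [
        ("t1", "table", "Year | Revenue | 2022 | 10 | 2023 | 12"),
        ("x1", "text", "The company was founded in 1999 and reports revenue annually."),
        ("x2", "text", "Headcount grew to 250 employees."),
    ],
)
conn.commit()


def search(query: str, limit: int = 5) -> List[Tuple[str, str, str]]:
    # FTS5 MATCH query; ORDER BY rank sorts by BM25 relevance, best match first.
    return conn.execute(
        "SELECT doc_id, kind, content FROM elements WHERE elements MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


print([row[0] for row in search("revenue")])
print([row[0] for row in search("employees")])
```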

Thanks! Are there any ready-to-use projects like this?

stanTrX

Thanks for the great video. Do you have any tutorial on extracting tabular information from a scanned document stored as a PDF? Please suggest something. Many thanks in advance.

absar

Hi, this is an excellent video.
I was experimenting with simple financial/accounting documents in PDF.
The response was not always accurate, with output like: 'The text does not provide information on the Market Value ....'.
Is it because of a wrong chunking strategy, or is there no way to interpret these kinds of documents? Thanks

miroslavstimac
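For the question above, one thing worth checking is whether each table stays in the same chunk as the heading and text that explain it. If a table is split away from its context, retrieval can miss it. Unstructured offers a title-based chunking strategy (`chunking_strategy="by_title"`). The sketch below is a simplified, hypothetical re-implementation of that idea, not Unstructured's actual algorithm. Elements are assumed to be `(category, text)` pairs.

```python
from typing import List, Tuple


def chunk_by_title(elements: List[Tuple[str, str]], max_characters: int = 4000) -> List[str]:
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for category, text in elements:
        # Start a new chunk at each section title, or when the size limit would be exceeded.
        if current and (category == "Title" or size + len(text) > max_characters):
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(text)
        size += len(text)  # approximate: ignores the newline separators
    if current:
        chunks.append("\n".join(current))
    return chunks


elements = [
    ("Title", "Balance sheet"),
    ("NarrativeText", "Assets grew."),
    ("Table", "Cash | 10"),
    ("Title", "Notes"),
    ("NarrativeText", "Market value is 5."),
]
print(chunk_by_title(elements))
```

With this grouping, the table under "Balance sheet" stays in the same chunk as its section heading and text.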

Thanks for sharing these videos; they are really helpful. I have one question, though: how can I install Poppler on a Windows system? I am facing some challenges there. I am getting the following error on Windows: "Unable to get page count. Is poppler installed and in PATH?"

anumoy
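The error quoted above is raised when the PDF-to-image step cannot find Poppler's command-line tools, such as `pdfinfo` and `pdftoppm`, on PATH. On Windows, the usual fix is to download a Poppler build, unzip it, and add its `bin` folder to PATH. The same kind of PATH check applies to Tesseract. The sketch below reports which of these tools Python can see. `find_tools` is a hypothetical helper, and the folder in the commented example is a placeholder for your own install location.

```python
import os
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str] = ("pdfinfo", "pdftoppm", "tesseract"),
               extra_dir: Optional[str] = None) -> Dict[str, Optional[str]]:
    if extra_dir:
        # Affects only the current Python process, not the system-wide PATH.
        os.environ["PATH"] = extra_dir + os.pathsep + os.environ.get("PATH", "")
    # shutil.which returns the full path of an executable found on PATH, or None.
    return {name: shutil.which(name) for name in names}


for tool, location in find_tools().items():
    print(f"{tool}: {location or 'NOT FOUND on PATH'}")

# Example with a placeholder folder (hypothetical install location):
# find_tools(extra_dir=r"C:\poppler\Library\bin")
```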

I'd like to explore handling different formats and building RAG on top of them...

VenkatesanVenkat-fdhg

Is there a way to work with an API other than the OpenAI API?

lalithkumarb-msxc

Thank you for your videos.
I build bots myself; most of the time I use Flowise and host it on Render.
It seems that it is not working anymore. Do you have any info about that?

Arnold-Oberleiter

Excellent video 👍 Does it also work on Windows with regard to Tesseract?

henkhbit

Multi-Vector Retriever for RAG on Tables + Texts Using LANGCHAIN & UNSTRUCTURED

LangChain Multi-Query Retriever for RAG

Add Chunking to MultiVector for Chatting With Your Data

RAG from scratch: Part 12 (Multi-Representation Indexing)

Multimodal RAG with GPT-4-Vision and LangChain | Retrieval with Images, Tables and Text

LangChain - Advanced RAG Techniques for better Retrieval Performance

LangChain Retrieval QA Over Multiple Files with ChromaDB

Realtime Multimodal RAG Usecase Part 3 | MultiVectorRetriever with Langchain | RAG Application #rag

Create Retrieval-Augmented Generation RAG application in Python From Scratch Ollama Llama LangChain

Multi-modal RAG With LANGCHAIN 🦜🔗 & GPT-4V

Better RAG with MultiIndexRetriever : Retrieve full documents

6-Building Advanced RAG Q&A Project With Multiple Data Sources With Langchain

Advanced RAG 01 - Self Querying Retrieval

Learn RAG From Scratch – Python AI Tutorial from a LangChain Engineer

Building Production-Ready RAG Applications: Jerry Liu

Hybrid Search RAG With Langchain And Pinecone Vector DB

What is Retrieval-Augmented Generation (RAG)?

How to build Multimodal Retrieval-Augmented Generation (RAG) with Gemini

End to end RAG LLM App Using Llamaindex and OpenAI- Indexing and Querying Multiple pdf's

Using Dataiku for Retrieval Augmented Generation (RAG)

Building Multi-Modal Search with Vector Databases

Advanced RAG 03 - Hybrid Search BM25 & Ensembles

😲 Building Advanced RAG systems #ai

Semi-structured RAG with LangChain and OpenAI GPT-4 RAG on tabular data , semi structured documents