Multi-Vector Retriever for RAG on Tables + Texts Using LANGCHAIN & UNSTRUCTURED

In this video, I will show you how to chat with a PDF that contains both text and tables. We will be using LangChain, OpenAI, ChromaDB, and Unstructured.

Happy Learning 😎

#langchain #llm #rag #semistructuredrag #datasciencebasics
Comments

Super video as usual. I recently read about this on the LangChain blog, and I'm happy to see it in action in your videos.

VenkatesanVenkat-fdhg

I tried using the summary step, but I am getting TypeError: BaseModel.__init__() takes 1 positional argument but 2 were given
----> 2 table_summaries = summarize_chain.batch(tables, {"max_concurrency": 5})
Not sure why!

astralawes

Hello mate, great video as always. Please help me out with a question: what if I am building a SaaS product and the raw tables and text cannot be stored in memory? What alternatives are there? Could I, for example, use Pinecone and store the raw text of paragraphs and tables in the vector's metadata?

andresmerchan
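
A minimal sketch of one possible answer to the question above, not the video's method: keep only a short ID in each vector's metadata and store the raw paragraph or table text in a persistent key-value table. The class name `SqliteDocStore`, the table name `docs`, and the choice of SQLite are all assumptions made for illustration; any durable store keyed by ID would play the same role.

```python
import sqlite3
import uuid


class SqliteDocStore:
    """Hypothetical persistent store for raw text/table content, keyed by ID."""

    def __init__(self, path=":memory:"):
        # Pass a file path (e.g. "docs.db") to persist across restarts.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs ("
            "doc_id TEXT PRIMARY KEY, raw TEXT NOT NULL)"
        )

    def add(self, raw):
        """Store raw content and return the ID to put in vector metadata."""
        doc_id = str(uuid.uuid4())
        self.conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, raw))
        self.conn.commit()
        return doc_id

    def get(self, doc_id):
        """Return the raw content for an ID, or None if it is unknown."""
        row = self.conn.execute(
            "SELECT raw FROM docs WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        return row[0] if row else None
```

At query time, the vector search would return summary hits carrying a `doc_id`, and `get(doc_id)` would fetch the full table or paragraph to pass to the model.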

Much appreciated content. Could you please upload a video tutorial on ingesting these text and table embeddings into a Milvus vector store?

hiteshsingh

What I'm really wondering: aren't you using all the metadata? You are saving only the text, not the text element, in the memory store.

tom_

The line table_summaries = summarize_chain.batch(tables, {"max_concurrency": 5}) gives NotFoundError: 404 whenever I try to run it. Please help.

arunimamukherjee

How can we extract the text along with the relevant image, so that the image is displayed in the response or at least a link to it is given? Please help.

AdarshMamidpelliwar

12:14 It's possible the data returned from the vector store before passing to the model was lacking.

s-guytech

How can I add metadata for the table to get better retrieval?

amarparajuli

So what about when the PDF contains images and graphs?

shahnaz

Thank you so much for the video. Can you please explain the case where you have multiple PDF files with tables, and the best way to handle it?

salwamostafa

I tried the LangServe template for this and it works well. HOWEVER, it seems LangChain in general doesn't address storage and search on the text side. This type of setup cannot work in production because all the ingestion and data serving is done on the fly. In production, ingestion and storage need to be done beforehand, with persistence. The trick here is how to store the text/table data from partition_pdf as regular text inside a non-vector database with a matching search such as FTS5 to get relevant documents, which can then be used with the summarization and the vector DB to produce the result. The LangChain examples, not just this one, always load the entire context into memory, which basically means there is no full-text search; it's just a dump with presumed data formats. Anyway, rant complete. It's a good example of the concept, but without more storage and search detail on the text/table side, it's only, say, 10% of the work needed to get this functioning in real life.

stephenthumb
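
A rough sketch of the storage idea described in the comment above: partitioned text and table elements are written once to an SQLite FTS5 table and searched by keyword later. It assumes the Python build's SQLite has FTS5 compiled in. The table name `elements`, its columns, and the helper names are hypothetical and do not come from the video or from LangChain.

```python
import sqlite3


def build_index(conn, elements):
    """Persist (doc_id, kind, body) rows in a full-text index.

    Only `body` is searchable; doc_id and kind are stored as plain columns.
    """
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS elements "
        "USING fts5(doc_id UNINDEXED, kind UNINDEXED, body)"
    )
    conn.executemany(
        "INSERT INTO elements (doc_id, kind, body) VALUES (?, ?, ?)",
        elements,
    )
    conn.commit()


def search(conn, query, limit=5):
    """Return the best-matching rows for a plain keyword query."""
    rows = conn.execute(
        "SELECT doc_id, kind, body FROM elements "
        "WHERE elements MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Ingestion would call `build_index` ahead of time against a database file, and serving would call `search` only. The matched `doc_id` values could then be joined with whatever the vector store returns.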

Thanks. Are there any ready-to-use projects like this?

stanTrX

Thanks for the great video. Do you have a tutorial on extracting tabular information from a scanned document stored as a PDF? Please suggest one. Many thanks in advance.

absar

Hi, this is an excellent video.
I was experimenting with simple financial/accounting documents in PDF.
The response was not always accurate, with output like: 'The text does not provide information on the Market Value ....'.
Is it because of a wrong chunking strategy, or is there no way to interpret these kinds of documents? Thanks

miroslavstimac

Thanks for sharing these videos. They are really helpful. I have one question though: how can I install Poppler on a Windows system? I am facing some challenges there and get the following error: "Unable to get page count. Is poppler installed and in PATH?"

anumoy
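
The error quoted above usually means the Poppler command-line tools cannot be found on PATH. A small check, offered as a sketch rather than a fix, can confirm that before rerunning the notebook. The helper name `missing_tools` is hypothetical; `pdfinfo` and `pdftoppm` are two of the executables that ship with Poppler.

```python
import shutil


def missing_tools(tools=("pdfinfo", "pdftoppm")):
    """Return the names of executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Poppler tools found on PATH.")
```

If the list is not empty, adding the folder that contains the Poppler executables to PATH and restarting the Python session is the usual next step. Calling `missing_tools(("tesseract",))` checks for Tesseract in the same way.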

I would like to explore handling different formats and constructing RAG on top of that.

VenkatesanVenkat-fdhg

Is there an alternative that works with an API other than the OpenAI API?

lalithkumarb-msxc

Thank you for your videos.
I build bots myself; most of the time I use Flowise and host it on Render.
It seems that this is no longer working. Do you have any info about that?

Arnold-Oberleiter

Excellent video 👍 Does it also work on Windows with regard to Tesseract?

henkhbit