Semi-structured RAG with LangChain and OpenAI GPT-4: RAG on tabular data and semi-structured documents

Many documents contain a mixture of content types, including text and tables.
Semi-structured data can be challenging for conventional RAG for at least two reasons:
• Text splitting may break up tables, corrupting the data in retrieval
• Embedding tables may pose challenges for semantic similarity search
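The first point can be seen in a toy example. The sketch below is not taken from the video: the sample document, the 60-character chunk size, and the `split_fixed` helper are all made up for illustration, and real text splitters are more sophisticated than this.

```python
# Toy illustration: fixed-size character splitting can cut a table apart.
# The document text and the chunk size are invented for this example.
doc = (
    "Quarterly results are summarized below.\n"
    "| Quarter | Revenue | Profit |\n"
    "| Q1 | 120 | 30 |\n"
    "| Q2 | 135 | 41 |\n"
    "The table shows steady growth."
)


def split_fixed(text, size):
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


chunks = split_fixed(doc, 60)
for n, chunk in enumerate(chunks):
    print(f"--- chunk {n} ---")
    print(chunk)

# Check whether the table's header row survived intact in any single chunk.
header = "| Quarter | Revenue | Profit |"
header_intact = any(header in chunk for chunk in chunks)
print("header intact:", header_intact)
```

With these values the header row is split across two chunks, so the data rows are stored without their complete column names, and a retriever that returns only one chunk would hand the model an incomplete table.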
This video shows how to perform RAG on documents with semi-structured data:
• We will use Unstructured to parse both text and tables from documents (PDFs).
• We will use the multi-vector retriever to store raw tables and text alongside table summaries, which are better suited for retrieval.
• We will use LCEL to implement the chains used.
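The multi-vector idea in the steps above can be sketched in plain Python: search runs over short summaries, but the raw table or text they point to is what gets returned. This is only a stand-in under stated assumptions, not LangChain's multi-vector retriever and not the code from the video. Word overlap replaces embedding similarity, the summaries are hand-written where the pipeline would ask an LLM, and every name here (`raw_elements`, `docstore`, `summary_index`, `retrieve`) is hypothetical.

```python
import re
import uuid

# Raw elements as a PDF parser might return them (made-up sample data).
raw_elements = [
    {"type": "text", "content": "The company expanded into two new regions this year."},
    {"type": "table", "content": "| Quarter | Revenue |\n| Q1 | 120 |\n| Q2 | 135 |"},
]

# Hand-written summaries; in the described pipeline an LLM would produce these.
summaries = [
    "Narrative about regional expansion of the company.",
    "Table of quarterly revenue figures for Q1 and Q2.",
]


def tokens(text):
    """Lowercased word set, a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


docstore = {}        # doc_id -> raw element (what the LLM will eventually see)
summary_index = []   # (doc_id, summary tokens) pairs (what gets searched)
for element, summary in zip(raw_elements, summaries):
    doc_id = str(uuid.uuid4())
    docstore[doc_id] = element
    summary_index.append((doc_id, tokens(summary)))


def retrieve(query, k=1):
    """Rank summaries by word overlap, then return the linked raw elements."""
    q = tokens(query)
    ranked = sorted(summary_index, key=lambda item: len(q & item[1]), reverse=True)
    return [docstore[doc_id] for doc_id, _ in ranked[:k]]


result = retrieve("What was the revenue in Q2?")
print(result[0]["content"])
```

Because the lookup returns the whole raw table rather than a chunk of it, no row is lost to splitting, while the summary gives the search step plain language to match against.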

If you like such content please subscribe to the channel here:
Comments
Author

What would be the cost of this implementation? Let's say I have a million PDFs: how much would it cost me, given that LangChain, Unstructured, and GPT-4 are all being used?

AmanShrivastava
Author

Is there a version of this using open-source Hugging Face models instead of OpenAI?

lalithkumarb-msxc
Author

It's throwing an error saying: cannot identify image file '/tmp/ tmp3xh5ml2w/

kavururajesh