Semi-structured RAG - LangChain using Mistral 7B, Qdrant FastEmbed on PDF text with tabular data

Many documents contain a mixture of content types, including text and tables.
Semi-structured data can be challenging for conventional RAG for at least two reasons:
• Text splitting may break up tables, corrupting the data in retrieval
• Embedding tables may pose challenges for semantic similarity search
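The first problem can be illustrated with a minimal sketch in plain Python. The toy document, the 60-character chunk size, and both splitter functions are assumptions made for illustration; they are not the splitters used in the video.

```python
# Toy document: a sentence, a small plain-text table, and a closing sentence.
DOC = (
    "Quarterly results improved across all regions.\n"
    "Region | Q1 | Q2\n"
    "North | 10 | 12\n"
    "South | 7 | 9\n"
    "Outlook remains stable.\n"
)
TABLE = "Region | Q1 | Q2\nNorth | 10 | 12\nSouth | 7 | 9"


def split_fixed(text: str, chunk_size: int) -> list[str]:
    """Naive splitter: cut every chunk_size characters, ignoring structure."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def split_by_element(text: str) -> list[str]:
    """Structure-aware stand-in: keep consecutive table rows (lines with '|') together."""
    chunks, table_rows = [], []
    for line in text.splitlines():
        if "|" in line:
            table_rows.append(line)
            continue
        if table_rows:
            chunks.append("\n".join(table_rows))
            table_rows = []
        chunks.append(line)
    if table_rows:
        chunks.append("\n".join(table_rows))
    return chunks


naive_chunks = split_fixed(DOC, 60)
element_chunks = split_by_element(DOC)

# Check whether any single chunk still contains the complete table.
table_intact_naive = any(TABLE in chunk for chunk in naive_chunks)
table_intact_element = any(TABLE in chunk for chunk in element_chunks)
print(table_intact_naive, table_intact_element)
```

With the fixed-size splitter the table straddles a chunk boundary, so no single retrieved chunk holds all of its rows. Splitting by element keeps the table in one piece.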
This video shows how to perform RAG on documents with semi-structured data:
• We will use Unstructured to parse both text and tables from documents (PDFs).
• We will use the multi-vector retriever to store raw tables and text alongside table summaries that are better suited for retrieval.
• We will use LCEL to implement the chains used.
We will use Mistral 7B Instruct as our LLM and Qdrant FastEmbed for our embeddings.
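The idea behind the multi-vector retriever can be sketched in plain Python: search over summaries, but return the raw table or text they point to. This is a toy under stated assumptions, not LangChain's actual API. The class `ToyMultiVectorRetriever`, the bag-of-words `embed` function, and the hand-written summaries are all hypothetical stand-ins; in the pipeline described above the summaries would come from the LLM and the vectors from FastEmbed stored in Qdrant.

```python
import math
import uuid
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMultiVectorRetriever:
    """Indexes summaries for search, but returns the raw elements they describe."""

    def __init__(self) -> None:
        self.index: list[tuple[Counter, str]] = []  # (summary vector, doc_id)
        self.docstore: dict[str, str] = {}          # doc_id -> raw table or text

    def add(self, raw: str, summary: str) -> str:
        doc_id = str(uuid.uuid4())
        self.docstore[doc_id] = raw
        self.index.append((embed(summary), doc_id))
        return doc_id

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(
            self.index, key=lambda item: cosine(query_vec, item[0]), reverse=True
        )
        return [self.docstore[doc_id] for _, doc_id in ranked[:k]]


RAW_TABLE = "Region | Q1 | Q2\nNorth | 10 | 12\nSouth | 7 | 9"
RAW_TEXT = "Outlook remains stable for the coming year."

retriever = ToyMultiVectorRetriever()
retriever.add(RAW_TABLE, "table of quarterly revenue by region for q1 and q2")
retriever.add(RAW_TEXT, "text about the outlook for the coming year")

# The query matches the table's summary, but the raw table is what comes back.
result = retriever.retrieve("revenue by region in q2")
print(result[0])
```

The design choice is that the summary is only a search key: the raw table, which embeds poorly but carries the exact numbers, is what gets passed to the LLM as context.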
Colab notebook:

If you like this kind of content, please subscribe to the channel here:
Comments
Author

Hi sir, can I do the same in Amazon SageMaker or in Amazon Bedrock?

techthunder
Author

Can you go into detail about what the extracted text and tables look like? Especially the tables after extraction and before the table summaries are made.

Thanks

sagarchadha
Author

Sir, is this done on paid Colab? How can I do this on free Colab with a CPU? Is it even possible?

rnronie
Author

Tables, text. Can we add image data here too?

devanshgupta