Build Your Own RAG Using Unstructured, Llama3 via Groq, Qdrant & LangChain

In this fifth video in the Unstructured playlist, I explain how to create your own Retrieval Augmented Generation (RAG) bot using the following tech stack.
- LangChain as framework
- UnstructuredIO for data prep
- Fastembed for embedding
- Qdrant Cloud as vectorstore
- Llama3 via GroqInc
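
To make the roles in this stack concrete, here is a minimal, dependency-free sketch of the retrieve-then-prompt flow. It is not the code from the video: the bag-of-words `embed` function stands in for Fastembed, the hypothetical `InMemoryVectorStore` class stands in for Qdrant Cloud, the hard-coded sample chunks stand in for Unstructured's output, and the final prompt is what would be sent to Llama3 via Groq (the LLM call itself is left out).

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as Qdrant."""

    def __init__(self):
        self.items = []  # list of (vector, chunk) pairs

    def add(self, chunks):
        for chunk in chunks:
            self.items.append((embed(chunk), chunk))

    def search(self, query, k=2):
        # Rank stored chunks by similarity to the query and keep the top k.
        query_vector = embed(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the prompt that would be passed to the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up sample chunks; in the real pipeline these come from the data-prep step.
chunks = [
    "Qdrant is a vector database that stores embeddings.",
    "Unstructured extracts text from PDF and Markdown files.",
    "Llama3 is accessed through Groq.",
]

store = InMemoryVectorStore()
store.add(chunks)

question = "Which tool extracts text from PDF files?"
top_chunks = store.search(question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real setup, each stand-in would be replaced by the corresponding library, with LangChain wiring the steps together.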

80% of enterprise data exists in difficult-to-use formats like HTML, PDF, CSV, PNG, PPTX, and more. Unstructured effortlessly extracts and transforms complex data for use with every major vector database and LLM framework.

Link ⛓️‍💥

Code 👨🏻‍💻

------------------------------------------------------------------------------------------
Timestamps ⏰
00:00 Introduction
02:33 Setup
04:58 Preprocess PDF
10:42 Preprocess Markdown (Readme)
14:08 Load the document into the VectorDB
17:27 Now the RAG part
22:24 Qdrant Cloud and LangSmith
25:19 Conclusion

------------------------------------------------------------------------------------------
🤝 Connect with me:

#unstructureddata #unstructuredio #rag #langchain #llm #datasciencebasics
Comments

It would be great to have a video about this kind of RAG over HTML: crawling, scraping, processing, and chunking online documentation from multiple sources.
That is one of the most valuable applications of RAG.

attilavass

Could you also make videos on RAG related to business context? Such as using SAP Vector Stores and a complete business scenario on how RAGs can be implemented in business applications? Thanks

retr_nik

Great tutorial, as always! Could you please show how to extract schematics and fault tree diagrams from native and scanned PDF documents?

That would be a great addition for graduating from Naive RAG to Advanced RAG.

Thank you for sharing!

aipy

Thanks a lot, your videos are always quite good.

ricla

Great work... waiting for the Chainlit implementation.

tamilil-

Any thoughts on doing something similar for the DBRX LLM from Databricks?

subedi

How can we build a multimodal RAG that takes images, text, and tables into consideration all in one? I don't want to use GPT-4V.

AgeNtX

Hi, could you convert complex PDF documents (with graphics and tables) into an easily readable text format, such as Markdown? The input file would be a PDF and the output file would be a text file (.txt).

ignaciopincheira

Please let me know how to change Chainlit's port, as 8000 is already used by my Portainer instance, which manages other Docker containers.

wcwong