Build Your Own Local PDF RAG Chatbot (Tutorial)

In this tutorial, we'll build a local RAG (Retrieval-Augmented Generation) pipeline that processes your PDF file(s) and lets you chat with them using Ollama and LangChain. We'll also create a Streamlit app for the UI.

✅ We'll start by loading a PDF file using the "UnstructuredPDFLoader"
✅ Then, we'll split the loaded PDF data into chunks using the "RecursiveCharacterTextSplitter"
✅ Next, we'll create embeddings of the chunks using "OllamaEmbeddings"
✅ We'll then use the "from_documents" method of "Chroma" to create a new vector database, passing in the chunks and the Ollama embeddings (see the sketch after this list)
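
A minimal sketch of those ingestion steps, assuming recent langchain-community packages and the "nomic-embed-text" embedding model; the file path and chunk sizes are placeholders, not necessarily the values used in the video:

from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# 1. Load the PDF into LangChain documents
loader = UnstructuredPDFLoader(file_path="data/my_document.pdf")  # placeholder path
data = loader.load()

# 2. Split the text into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)
chunks = splitter.split_documents(data)

# 3. Embed the chunks with Ollama and store them in a Chroma vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    collection_name="local-rag",
)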

The model will retrieve relevant context from the vector database, generate an answer based on the context and the question, and return the parsed output, as sketched below.
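
A sketch of that retrieval-and-answer chain, assuming a "llama3" chat model pulled into Ollama and the vector_db object from the ingestion sketch above; the prompt wording is illustrative only:

from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOllama(model="llama3")          # any local Ollama chat model works
retriever = vector_db.as_retriever()      # vector_db from the ingestion sketch

prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n"
    "{context}\n\nQuestion: {question}"
)

# Retrieve context, fill the prompt, call the model, and parse the text output
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("What is this document about?"))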

TIMESTAMPS:
============
00:00:00 - Introduction
00:00:38 - Reference to previous PDF RAG tutorial
00:01:08 - Project directory structure
00:03:00 - Import required libraries
00:05:09 - PDF content overview
00:06:07 - Text chunking and overlap technique
00:07:43 - Create vector embeddings and load to vector database
00:09:01 - Build a retriever
00:21:01 - Streamlit app overview
00:27:01 - Conclusion and outro
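
The Streamlit app covered at 00:21:01 wires the chain into a chat UI. A minimal sketch of that wiring, assuming the chain object from the sketch above; the real app adds model selection, PDF upload, and error handling:

import streamlit as st

st.title("Chat with your PDF")

question = st.chat_input("Ask something about the document")
if question:
    with st.chat_message("user"):
        st.write(question)
    with st.chat_message("assistant"):
        st.write(chain.invoke(question))  # chain built in the sketch above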

LINKS:
=====

Follow me on socials:

Join this channel to get access to perks:

#ollama #langchain #streamlit #vectordatabase #pdf #nlp #machinelearning #ai #llm #RAG #retrievalaugmentedgeneration
COMMENTS:
=========

Your content is amazing! ⭐️⭐️⭐️⭐️⭐️ Thank you for all the effort you put into it—I’m so grateful I found your channel. You’ve earned my sub, and I can’t wait to see more from you!

uchihaerenyeager

Saw this posted on Reddit today, hopped on my laptop right away. Very detailed, yet simply explained. Just picked up a new subscriber, thanks.

watchthemanual

Thank you so much! I had an assignment to learn how to create a RAG chatbot with multiple PDFs as the data source, and I came across your channel while researching. The previous tutorials you made were already helpful, but I saw you were going to make an updated video and I was super excited. This was great; subscribed to you for more content in the future too. 🚀

ivoryontrack

Thank you so much for your content. You are helping me a lot. Hugs from Brazil!

alexsandrotabosa

Great tutorial ❤. We're looking forward to you making tutorials on LangGraph for agentic workflows, with Chainlit as the frontend 🎉🎉

free_thinker

Yeah, newer package versions are always a source of problems for me, especially when I forget to run pip freeze > requirements.txt to pin specific versions.

dawmro

Thank you again for this updated tutorial! It is really helpful. I have a question: what Python version did you use for this updated code?

MarahTal

Getting an error: "Cannot hash argument 'models_info' (of type ollama._types.ListResponse) in 'extract_model_names'", running in a Windows WSL environment. Everything is installed, but when I open the web interface it gives me that model error.

doughimes

Thank you very much for your content and efforts. Assume the following scenario: let's say a document describes some criteria in specific paragraphs and a second document describes a project proposal. I want to check how well the project proposal addresses the criteria as set in the first document. Would something like that be a feasible use-case and what would it take to implement it?

MinoasPediadas

When working with Streamlit, it shows a module_info name error, and in the notebook cells it shows a "DLL load failed" error. How do I fix this?

surbhi.emergingtech

Hi Tony, I found your tutorial about how to chat with PDF files, which cuts down the time needed to process information. I have one question. I went through the whole process in the tutorial step by step and cloned your repository. When I run the streamlit_app.py file to deploy locally, the computer cannot see the Ollama models, even though I downloaded them to my computer. Can you explain this case? Thank you in advance for your response.

ШохрухАбдивоитов

Getting an error while running it, as below:
"DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed."

Please suggest a fix.

JumpingStar-tv

The error I am encountering is specifically related to the input of the chain.invoke() method in LangChain.
It expected JSON/dict input but seems to have received an empty string (''), and this mismatch triggers a ValidationError in Pydantic.

khurramumair

I am not able to get rid of this error:
"Error: failed to find libmagic. Check your installation"
Does anyone have any idea about it? I am using Ollama on a CPU-based laptop.

muhammadsawaiz

I don't know why, but I got so many errors on data = loader.load(). Can you please help me?

yashhurkadli

I am getting the DLL error in onnx. I reinstalled it and installed the x86 and x64 C++ redistributables, but so far nothing has helped. I am running Windows 11.

karansingh-ceyy

Has anyone here encountered an error when chatting with PDFs? I get a "NoneType object is not iterable" error, even though I've already listed all of my Ollama models and installed Ollama in my project.

armandf.s

Windows-based install.

"Getting an error while running it, as below: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed."

I'm also having this same issue. I additionally tried the steps given on your GitHub to rectify this; they didn't work.

I even tried rolling back onnx to both 1.16.1 and 1.16.0 (1.15.0 doesn't install).

I need help; this project seems very interesting and I want to implement it.

AkshayKumar-qcrz

Spent hours resolving multiple dependency errors, installing multiple pieces of software, and following the exact steps from your GitHub.
Still can't get it working. Using Windows with a GPU. Tried reinstalling onnxruntime-gpu.
Visual C++ Redistributable is already installed.
Please help.

Error:
ImportError: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed.

ammaransari

What version of Python are you using?

zandanshah