Build Your Own Local PDF RAG Chatbot (Tutorial)

In this tutorial, we'll build a local RAG (Retrieval-Augmented Generation) pipeline that processes your PDF file(s) and lets you chat with them using Ollama and LangChain. We'll also create a Streamlit app for the UI.

✅ We'll start by loading a PDF file using the "UnstructuredPDFLoader"
✅ Then, we'll split the loaded PDF data into chunks using the "RecursiveCharacterTextSplitter"
✅ Next, we'll create embeddings of the chunks using "OllamaEmbeddings"
✅ We'll then use the "from_documents" method of "Chroma" to create a new vector database, passing in the chunks and the Ollama embeddings (see the sketch after this list)
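
A minimal sketch of those ingestion steps, assuming recent langchain-community packages and the "nomic-embed-text" embedding model; the file path and chunk sizes are placeholders, not necessarily the values used in the video:

from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# 1. Load the PDF into LangChain documents
loader = UnstructuredPDFLoader(file_path="data/my_document.pdf")  # placeholder path
data = loader.load()

# 2. Split the text into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)
chunks = splitter.split_documents(data)

# 3. Embed the chunks with Ollama and store them in a Chroma vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    collection_name="local-rag",
)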

The model will retrieve relevant context from the vector database, generate an answer based on the context and the question, and return the parsed output, as sketched below.
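
A sketch of that retrieval-and-answer chain, assuming a "llama3" chat model pulled into Ollama and the vector_db object from the ingestion sketch above; the prompt wording is illustrative only:

from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOllama(model="llama3")          # any local Ollama chat model works
retriever = vector_db.as_retriever()      # vector_db from the ingestion sketch

prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n"
    "{context}\n\nQuestion: {question}"
)

# Retrieve context, fill the prompt, call the model, and parse the text output
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("What is this document about?"))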

TIMESTAMPS:
============
00:00:00 - Introduction
00:00:38 - Reference to previous PDF RAG tutorial
00:01:08 - Project directory structure
00:03:00 - Import required libraries
00:05:09 - PDF content overview
00:06:07 - Text chunking and overlap technique
00:07:43 - Create vector embeddings and load to vector database
00:09:01 - Build a retriever
00:21:01 - Streamlit app overview
00:27:01 - Conclusion and outro
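
The Streamlit app covered at 00:21:01 wires the chain into a chat UI. A minimal sketch of that wiring, assuming the chain object from the sketch above; the real app adds model selection, PDF upload, and error handling:

import streamlit as st

st.title("Chat with your PDF")

question = st.chat_input("Ask something about the document")
if question:
    with st.chat_message("user"):
        st.write(question)
    with st.chat_message("assistant"):
        st.write(chain.invoke(question))  # chain built in the sketch above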

LINKS:
=====

Follow me on socials:

Join this channel to get access to perks:

#ollama #langchain #streamlit #vectordatabase #pdf #nlp #machinelearning #ai #llm #RAG #retrievalaugmentedgeneration
COMMENTS:
=========

Your content is amazing! ⭐️⭐️⭐️⭐️⭐️ Thank you for all the effort you put into it—I’m so grateful I found your channel. You’ve earned my sub, and I can’t wait to see more from you!

uchihaerenyeager

Saw this posted on Reddit today, hopped on my laptop right away. Very detailed, yet simply explained. Just picked up a new subscriber, thanks.

watchthemanual

Thank you so much! I had an assignment to learn how to create a RAG chatbot with multiple PDFs as the data source, and I came across your channel while researching. The previous tutorials you made were already helpful, but I saw you were going to make an updated video and I was super excited. This was great; subscribed to you for more content in the future too. 🚀

ivoryontrack

Thank you so much for your content. You are helping me a lot. Hugs from Brazil!

alexsandrotabosa

Great tutorial ❤. We're looking forward to you making tutorials on LangGraph for agentic workflows, with Chainlit as the frontend 🎉🎉

free_thinker

Yeah, newer package versions are always a source of problems for me, especially when I forget to run pip freeze > requirements.txt to pin specific versions.

dawmro

Thank you again for this updated tutorial! It is really helpful. I have a question: what Python version did you use for this updated code?

MarahTal

Getting an error: "Cannot hash argument 'models_info' (of type ollama._types.ListResponse) in 'extract_model_names'", running in a Windows WSL environment. Everything is installed, but when I open the web interface it gives me that model error.

doughimes

Thank you very much for your content and efforts. Assume the following scenario: let's say a document describes some criteria in specific paragraphs and a second document describes a project proposal. I want to check how well the project proposal addresses the criteria as set in the first document. Would something like that be a feasible use-case and what would it take to implement it?

MinoasPediadas

When working with Streamlit, it shows a module_info name error, and in the notebook cells it shows a "DLL load failed" error. How do I fix this?

surbhi.emergingtech

Hi Tony, I found your tutorial about how to chat with PDF files, which cuts down the time needed to process information. I have one question. I went through the whole process in the tutorial step by step and cloned your repository. When I run the streamlit_app.py file to deploy locally, the computer cannot see the Ollama models, even though I downloaded them to my computer. Can you explain this case? Thank you in advance for your response.

ШохрухАбдивоитов

Getting an error while running it, as below:
"DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed."

Please suggest a fix.

JumpingStar-tv

The error I am encountering is specifically related to the input of the chain.invoke() method in LangChain.
It expected JSON/dict input but seems to have received an empty string (''), and this mismatch triggers a ValidationError in Pydantic.

khurramumair

I am not able to get rid of this error:
"Error: failed to find libmagic. Check your installation"
Does anyone have any idea about it? I am using Ollama on a CPU-based laptop.

muhammadsawaiz

I don't know why, but I got so many errors on data = loader.load(). Can you please help me?

yashhurkadli

I am getting the DLL error in onnx. I reinstalled it and installed the x86 and x64 C++ redistributables, but so far nothing has helped. I am running Windows 11.

karansingh-ceyy

Has anyone here encountered an error when chatting with PDFs? I get a "NoneType object is not iterable" error, even though I've already listed all of my Ollama models and installed Ollama in my project.

armandf.s

Windows-based install.

"Getting an error while running it, as below: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed."

I'm also having this same issue. I additionally tried the steps given on your GitHub to rectify this; they didn't work.

I even tried rolling back onnx to both 1.16.1 and 1.16.0 (1.15.0 doesn't install).

I need help; this project seems very interesting and I want to implement it.

AkshayKumar-qcrz

Spent hours resolving multiple dependency errors, installing multiple pieces of software, and following the exact steps from your GitHub.
Still can't get it working. Using Windows with a GPU. Tried reinstalling onnxruntime-gpu.
Visual C++ Redistributable is already installed.
Please help.

Error:
ImportError: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed.

ammaransari

What version of Python are you using?

zandanshah