Working with MULTIPLE PDF Files in LangChain: ChatGPT for your Data

Welcome to this tutorial on loading multiple PDF files in LangChain for information retrieval with OpenAI models like ChatGPT. You'll learn how to extract valuable information from your PDFs using LangChain and OpenAI Text Embeddings. We'll guide you step by step through converting PDF files into embeddings and setting up LangChain to communicate with them, so you can retrieve information efficiently. By the end, you'll have the skills to apply this language processing technology to your own data analysis.

#LangChain #InformationRetrieval #PDF #OpenAITextEmbeddings #DataAnalysis #LanguageProcessingTechnology #AI #MachineLearning #NaturalLanguageProcessing #NLP #Tutorial
Comments

My man! First, you're a monster. Obviously, I bought you a coffee. Anyway, there were 3 errors/bugs (excuse my language, this is the first time I've coded something in my life) that I think are useful in case somebody is struggling. 1) In the section Connect Google Drive, second segment of the code, I had to add the line import os between pdf_folder_path = f'{root_dir}/data/' and os.listdir(pdf_folder_path). In other words, the full code is pdf_folder_path = f'{root_dir}/data/' (first line), import os (second line), os.listdir(pdf_folder_path) (third line). 2) In the section 'Load Multiple PDF Files' I added these two lines of code: from langchain.document_loaders import UnstructuredPDFLoader and from langchain.indexes import VectorstoreIndexCreator. 3) In the Vector Store section, as the first line of code, I added: !pip install. And that's basically it! Cheers mate!

asprinama
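A minimal sketch of the first fix described in the comment above: `os` has to be imported before `os.listdir` is called. The helper name `list_pdf_paths` and the `root_dir = "."` value are illustrative assumptions, not taken from the notebook.

```python
import os  # must be imported before os.listdir is used


def list_pdf_paths(pdf_folder_path):
    """Return the full paths of all PDF files in a folder, sorted by name."""
    return sorted(
        os.path.join(pdf_folder_path, name)
        for name in os.listdir(pdf_folder_path)
        if name.lower().endswith(".pdf")
    )


if __name__ == "__main__":
    # Hypothetical location; in the notebook this is f'{root_dir}/data/'.
    root_dir = "."
    pdf_folder_path = f"{root_dir}/data/"
    if os.path.isdir(pdf_folder_path):
        print(list_pdf_paths(pdf_folder_path))
```

Filtering on the `.pdf` extension keeps stray files in the folder from being handed to a PDF loader later on.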

Please consider doing a similar video on how to chat more freely with Google Drive PDFs with memory. For example, having the script generate a glossary, an outline, or a lesson plan based on the database of PDFs.

sammiller

I found that I had to add this in order for it to work:

!pip install unstructured[local-inference]

Otherwise I got this error:
ImportError: Following dependencies are missing: pdfminer. Please install them using `pip install

Why is this?

ynboxlive
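On the "why" in the comment above: the error message itself says the pdfminer dependency is missing, so the likely explanation is that the base install of `unstructured` does not include the PDF parsing libraries and the extra pulls them in. A small sketch for checking this before running the loader; the function name `missing_modules` is an assumption made for illustration.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # 'pdfminer' is the module named in the ImportError above.
    print(missing_modules(["pdfminer"]))
```

An empty list means the module is importable; a non-empty list names what still has to be installed.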

Can you also include how to interact with tables and pictures in a PDF document?

arsalanriaz

I want to use the Alpaca or Vicuna model instead of ChatGPT, because ChatGPT has limits on the requests we send. Is it possible to use an open-source model instead of ChatGPT?

nitingoswami

This is excellent. Would love for you to delve deeper into this experimentation. How much did it cost you on OpenAI’s end? For embeddings etc.

LoneRanger.
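The thread gives no actual cost figures, but embedding cost can be roughly estimated from text length. The sketch below assumes the common rule of thumb of about 4 characters per token for English text and takes the price as a parameter, since the real price depends on the model and changes over time. Both function names are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count using a characters-per-token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def estimate_embedding_cost(texts, price_per_1k_tokens):
    """Estimate the cost of embedding all texts at a given price per 1,000 tokens."""
    total_tokens = sum(estimate_tokens(t) for t in texts)
    return total_tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    # Placeholder price; look up the current price for the model you use.
    sample_pages = ["x" * 2000] * 300
    print(estimate_embedding_cost(sample_pages, price_per_1k_tokens=0.0))
```

For exact counts, a real tokenizer for the chosen model would replace `estimate_tokens`.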

Hi, very good work. Thanks! Sorry, but the link to the Google Colab is invalid.

giovannigrassobbio

Can you choose which model to use? I don’t see a request completion with the model statement. Thank you for this video — I’m still learning by doing.

markanthonymarez

Does this method work with full books (~300 pages)?

elgodric
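On document length: retrieval setups like this one generally do not send a whole document to the model. The text is split into chunks, each chunk is embedded separately, and only the few chunks most relevant to a question go into the prompt, which is why long books can fit under a model's token limit. A minimal sketch of fixed-size chunking with overlap; the sizes are illustrative assumptions, not the notebook's settings.

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Split text into chunks of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks.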

Hi Prompt Engineering!
Quick question: I like the way you created an index from multiple PDF files and queried from the index. Have you attempted to persist the vectorstore for later use (e.g., query or update with additional documents)?

RonBarrett

One more question - do the documents need to be reloaded into a vector every single time? Or can we simply import the query and answer to another Python file?

Alex-Ibby
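On persistence, raised in the two questions above: the general pattern is to compute the vectors once, save them to disk, and load them on later runs instead of re-embedding. Vector stores commonly offer their own way to do this; the toy below only illustrates the pattern with a JSON file. `fake_embed` is a hypothetical stand-in for a real embedding call, and nothing here is LangChain's API.

```python
import json
import os


def fake_embed(text):
    """Hypothetical stand-in for a real embedding call: vowel counts."""
    return [text.lower().count(c) for c in "aeiou"]


def load_or_build_index(chunks, index_path, embed=fake_embed):
    """Load cached vectors from index_path, or compute and save them once."""
    if os.path.exists(index_path):
        with open(index_path, "r", encoding="utf-8") as f:
            return json.load(f)

    index = [{"text": c, "vector": embed(c)} for c in chunks]
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index
```

A second call with the same `index_path` reads the file and never calls `embed`, so another Python file can reuse the saved index without reloading the PDFs.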

Thanks for the video, it's very useful. Is it possible to integrate a voice assistant that receives a question as input and answers via voice, using the information present in the PDFs? It would be very useful. It could be done with Whisper or Bark. What do you think about it?

matteodeamicis

Can it answer questions that need information from multiple PDFs?

gsdeng

When I run the VectorstoreIndexCreator() cell, I get the following error:
ImportError: cannot import name 'open_filename' from 'pdfminer.utils'


I tried installing and importing the packages but that didn't work either, any solution to this?

cascaderz
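One plausible cause of the `open_filename` error above, which the thread does not confirm, is that the installed pdfminer package is a different or older distribution than the one the loader expects. As far as I know, `open_filename` is provided by the `pdfminer.six` distribution. A small diagnostic sketch that reports which of the two distributions are installed; the function names are assumptions made for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_pdfminer_packages():
    """Report versions of the two distributions that can provide a 'pdfminer' module."""
    return {name: installed_version(name) for name in ("pdfminer", "pdfminer.six")}


if __name__ == "__main__":
    print(report_pdfminer_packages())
```

If both show a version, or only the plain `pdfminer` one does, a conflict between them is worth ruling out first.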

Is it possible to retrieve which section of the PDF it is referring to (or even detect which portion of the PDF a chunk comes from)?

tapos

You are amazing! This is exactly what I was looking for. I might also need to connect with you in future for consultancy on something that I am trying to build.

PallaviChauhan

Thanks a lot, but when I run the VectorstoreIndexCreator() cell I get the following error: "ImportError: cannot import name 'open_filename' from 'pdfminer.utils'". Could you help me?

samser

Thanks for the excellent video. How can I get the page number and sources of the content? Any suggestions?

VenkatesanVenkat-fdhg

Thank you very much. Is there any way we can specify which document to search for the answers?

samdaniel
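Several questions above ask about sections, page numbers, sources, and restricting a query to one document. The usual idea is to store metadata alongside each chunk and return it with the match. The toy below sketches that idea with word-overlap scoring instead of real embeddings; the dictionary keys `text`, `source`, and `page` and both function names are assumptions, not the notebook's data model.

```python
def overlap_score(query, text):
    """Count the distinct words shared by the query and the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_with_sources(query, chunks, k=2, only_source=None):
    """Return the top-k (text, source, page) tuples, optionally from one file only."""
    candidates = [
        c for c in chunks
        if only_source is None or c["source"] == only_source
    ]
    ranked = sorted(
        candidates,
        key=lambda c: overlap_score(query, c["text"]),
        reverse=True,
    )
    return [(c["text"], c["source"], c["page"]) for c in ranked[:k]]
```

Because every result carries its `source` and `page`, an answer can cite where it came from, and matches may come from several PDFs at once unless `only_source` narrows the search.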

May I ask whether it works with PDFs of over 4000 tokens (the limit of the OpenAI API)? Thanks a lot for providing both guidelines and a Colab notebook for immediate use!

cheunghenrik