Chat with Multiple PDFs using Llama 2 and LangChain (Use Private LLM & Free Embeddings for QA)


Can you build a chatbot that can answer questions from multiple PDFs? Can you do it with a private LLM? In this video, we'll use the latest Llama 2 13B GPTQ model to chat with multiple PDFs. We'll use the LangChain library to create a chain that can retrieve relevant documents and answer questions from them.

You'll learn how to load a GPTQ model using AutoGPTQ, convert a directory with PDFs to a vector store and create a chain using LangChain that works with text chunks from the vector store.
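As a rough sketch of that flow (the directory name, chunk sizes, and the choice of Chroma as the vector store are illustrative assumptions, not necessarily the exact settings from the video):

from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Load every PDF in a directory ("pdfs/" is an assumed path).
docs = PyPDFDirectoryLoader("pdfs/").load()

# Split into chunks; the sizes here are illustrative, not the video's exact values.
splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)
chunks = splitter.split_documents(docs)

# Free, locally-run Instructor embeddings, stored in a persistent Chroma DB.
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large", model_kwargs={"device": "cuda"}
)
db = Chroma.from_documents(chunks, embeddings, persist_directory="db")

# Sanity check: retrieve the chunks most similar to a question.
print(db.similarity_search("What was the quarterly revenue?", k=2))

A persisted store can later be reopened with Chroma(persist_directory="db", embedding_function=embeddings) instead of re-embedding the PDFs.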

PDF files:

Join this channel to get access to the perks and support my work:

00:00 - Introduction
00:38 - Text Tutorial on MLExpert
01:11 - Earning Reports (PDF Files)
02:08 - Llama 2 GPTQ
03:59 - Google Colab Setup
06:00 - Prepare the Vector Database with Instructor Embeddings
08:45 - Create a Chain with Llama 2 13B GPTQ
14:36 - Chat with PDF Files
20:55 - Conclusion

#llm #langchain #chatbot #artificialintelligence #chatgpt #llama2 #gpt4 #promptengineering
Comments

Great video, I loved it! The most underrated video out there; it's been almost a month since I last found such a perfect video.

theyashsisodiya

Important: The filename of the model has been changed in the repository. Use:

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

DEVICE = "cuda:0"  # assumed device; use the one from your own setup

# The repo link was stripped from this comment; given the revision below,
# it is most likely TheBloke/Llama-2-13B-chat-GPTQ (verify before use).
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "model"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-128g-actorder_True",
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=DEVICE,
    quantize_config=None,
)
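For completeness, a hedged sketch of how the model loaded above can be wired into a LangChain retrieval chain; the generation settings are illustrative, and `db` is assumed to be a vector store like the Chroma example sketched in the description:

from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# Wrap the GPTQ model (from the snippet above) as a LangChain-compatible LLM.
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,  # illustrative generation budget
)
llm = HuggingFacePipeline(pipeline=text_pipeline)

# Answer questions using the top-k most similar chunks from the vector store.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
)
print(qa_chain("What was the revenue for the latest quarter?")["result"])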

venelin_valkov

Love your videos - so easy to follow along, and you give the facts without sugar-coating! Thank you!

sanjaybhatikar

I'm deliberately writing in Bulgarian: great work, your videos are very informative and useful, you build on top of the articles already available on the internet, and you have an excellent approach to explaining things.

MrWeaselo

Thanks. I ran your example. SEC filings are in HTML or text format, with the text being a kind of XML. The moment I switched from the PDF loader to the HTML loader from langchain-community, I landed in "dependency hell". Try removing the version numbers from the packages (which you have so kindly provided) and numerous dependency conflicts arise again. That makes LangChain very difficult and cumbersome to use in practice. If this continues, the sheer number of conflicting dependencies will make it a challenge to use in production. Thanks again.

sanjaybhatikar

@valkov, thank you for the great tutorial.
When I tried the "Llama 2 13B" step, I got this error. Please help me solve it:
kernel is not installed, reset disable_exllama to True. This may because you installed auto_gptq using a pre-build wheel on Windows, in which exllama_kernels are not compiled. To use exllama_kernels to further speedup inference, you can re-install auto_gptq from source.
kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
2. You are using pytorch without CUDA support.
3. CUDA and nvcc are not installed in your device.

Because of this, the model performs very slowly; it falls back to running on the CPU.

arutchelvana

What if I want to ask Llama 2 follow-up questions, i.e. have an interactive conversation rather than a one-time question?

jennilthiyam

My OS is Windows 10, my system has an 8th-generation Core i5, and I don't have a GPU. Can I use Llama 2 Chat 7B in my Colab and get it to work?

زينبسالمعزيز-دح

Is it possible to use this code (and ideally also the Google Colab GPU) to create a web page that contains a chatbot to which you can send questions and print the answers on the screen?

Lorenzo_T

what is the "-qqq" in pip install command for? Can anyone explain, I couldn't find an answer

ChandiniV-lf

Why do you think the choice of embeddings makes such a difference? I get that embeddings capture meaning and linguistic structure, so the quality of embeddings should make some difference. But assuming all LLMs are trained on a high volume of low-quality data from the internet, the difference should not be substantial. Would appreciate your insight! Cheers.

sanjaybhatikar

There is a conflict between torch 2.0.1 and 2.1.0 when installing the other dependencies; can someone help?

danmotoc

What is the best way to learn deep learning fundamentals via implementation (say, a trivial problem like building a movie recommendation system) using PyTorch as of Aug 26, 2023? Thanks in advance.

pantherg

Some of the tokenized texts from text_splitter can be longer than the sequence length of the selected embedding model (512 tokens), and the difference is silently dropped by the embedding model. This results in no embeddings for the dropped text, and hence missing information. You could add a token counter function to check for the issue; a sketch follows below.
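A minimal sketch of such a token counter, assuming the Instructor tokenizer (hkunlp/instructor-large) and LangChain Document chunks from the text splitter; swap in the tokenizer of whatever embedding model you actually use:

from transformers import AutoTokenizer

# Assumed: the embedding model is hkunlp/instructor-large with a 512-token limit.
tokenizer = AutoTokenizer.from_pretrained("hkunlp/instructor-large")
MAX_TOKENS = 512

def oversized_chunks(chunks, max_tokens=MAX_TOKENS):
    """Yield (index, token_count) for chunks the embedding model would truncate."""
    for i, chunk in enumerate(chunks):
        n_tokens = len(tokenizer.encode(chunk.page_content))
        if n_tokens > max_tokens:
            yield i, n_tokens

# Usage: run before embedding, where `chunks` comes from the text splitter.
for idx, n in oversized_chunks(chunks):
    print(f"chunk {idx}: {n} tokens, will be truncated to {MAX_TOKENS}")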

Lucifer-lcrq

Is it possible to deploy Llama 2 with a custom knowledge base to production?

courses

Hello my friend. Could you help me with a specialized model for the Persian language? I really need your help!

mohsenghafari

There is an issue with the dependencies and requirements; could you please look into it?

SoftwareDevelopmentiQuasar

Did they change the filename of the model in the repository again? If so, what should the correct code be now?

arinspace

How can I use Llama models to translate an entire document?

gammingtoch

Hi, please help me: how can I create a custom model from many PDFs in the Persian language? Thank you.

mohsenghafari