ChatGPT For Your DATA | Chat with Multiple Documents Using LangChain


In this video, I will show you how you can chat with any document. Suppose you have a folder containing different file formats: a PDF file, a text file, a README file, and others. I will show you how to take all of your data, split it into chunks, create embeddings with the OpenAI embeddings, store them in the Pinecone vector store, and finally chat with your own documents and get insights out of them, similar to ChatGPT but with your own data. Happy Learning.
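The splitting step in the workflow above can be illustrated with a minimal sketch in plain Python. This is a simplified stand-in for LangChain's text splitters, not the library's actual implementation; the `chunk_size` and `overlap` parameters here are illustrative names:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` so context carries over
    return chunks

doc = "A" * 2500
chunks = split_text(doc, chunk_size=1000, overlap=100)
print(len(chunks))      # 3
print(len(chunks[0]))   # 1000
```

Each chunk would then be embedded and upserted into the vector store; the overlap helps a sentence cut at a chunk boundary still appear whole in at least one chunk.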

👉🏼 Links:

🔗 Other videos you might find helpful:

💰🔗 Some links are affiliate links, meaning that when you use them, I may receive some benefit.

#openai #llm #datasciencebasics #chatwithdata #documents #chatgpt #nlp
Comments

Many Thanks for your great work!
It's very well explained and applies to real uses of AI.

gilbertomendes

So every time I need to chat with my own data, the query will have to be embedded? Doesn't that make it much more expensive?

Alimenteocerebro
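On the cost question above: yes, every query is embedded, but a query is tiny compared to the documents, which are embedded only once at indexing time. A back-of-envelope sketch, assuming a purely illustrative rate of $0.0001 per 1K tokens (check OpenAI's current pricing; the numbers here are assumptions for the arithmetic only):

```python
price_per_1k_tokens = 0.0001   # assumed illustrative rate, not current pricing
query_tokens = 20              # a typical short question
doc_tokens = 500_000           # one-off cost when the documents are first indexed

query_cost = query_tokens / 1000 * price_per_1k_tokens
doc_cost = doc_tokens / 1000 * price_per_1k_tokens
print(f"{query_cost:.6f}")  # 0.000002 dollars per query
print(f"{doc_cost:.2f}")    # 0.05 dollars, paid once at indexing time
```

So the recurring per-query embedding cost is orders of magnitude below the one-time document embedding cost; the chat-completion call usually dominates per-query spend.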

Thank you. How could I print which document (title) a result comes from and which page(s)? It is useful when there are multiple files of multiple pages in the source directory. Thank you for your time.

motopaediatheview
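On citing sources: LangChain's document loaders typically attach a `source` entry (and, for PDF loaders, a `page` entry) to each document's metadata, and retrieval chains can return the retrieved documents when constructed with `return_source_documents=True`. A minimal, library-free sketch of formatting those citations, with the retrieved documents mocked as plain dicts (real LangChain Documents carry a `.metadata` attribute instead):

```python
# Mocked retrieved documents; filenames and pages are hypothetical.
source_documents = [
    {"metadata": {"source": "report.pdf", "page": 3}},
    {"metadata": {"source": "notes.txt"}},
]

def format_citations(docs: list[dict]) -> list[str]:
    """Build 'filename, page N' strings from document metadata."""
    cites = []
    for d in docs:
        meta = d["metadata"]
        cite = meta.get("source", "unknown")
        if "page" in meta:
            cite += f", page {meta['page']}"
        cites.append(cite)
    return cites

print(format_citations(source_documents))  # ['report.pdf, page 3', 'notes.txt']
```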

I've identified the problem in your code. The issue lies in the creation of chat history. Your code expects a list of tuples, but in your Gradio app, you're creating a list of lists (nested lists), which is causing the code to malfunction.

Please try using the following code instead and replace it in your Gradio block. This updated code should resolve the issue and make it work correctly.

import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("Clear")

    def respond(user_message, chat_history):
        print(user_message)
        print(chat_history)
        if chat_history:
            # Convert Gradio's nested lists into the tuples the chain expects
            chat_history = [tuple(sublist) for sublist in chat_history]
            print(chat_history)

        # Get response from the QA chain (qa is the chain built earlier in the video)
        response = qa({"question": user_message, "chat_history": chat_history})
        # Append user message and response to chat history
        chat_history.append((user_message, response["answer"]))
        print(chat_history)
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot], queue=False)
    clear.click(lambda: None, None, chatbot, queue=False)

demo.launch(debug=True, share=True)

IamalwaysOK
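The core of the fix in the comment above is converting Gradio's list-of-lists chat history into the list of (human, ai) tuples that LangChain's conversational chain expects. In isolation, with hypothetical messages:

```python
# Gradio's Chatbot component passes history as nested lists:
gradio_history = [["Hi", "Hello! How can I help?"], ["What is X?", "X is ..."]]

# LangChain's conversational chain expects a list of (human, ai) tuples:
chat_history = [tuple(pair) for pair in gradio_history]
print(chat_history[0])  # ('Hi', 'Hello! How can I help?')
```

The two shapes look alike when printed, which is why the bug is easy to miss until the chain rejects the nested-list form.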

Great video. Thanks for your time and explanation.

peralser

I tried your tutorial, but got stuck on the Pinecone steps with this error: AttributeError: init is no longer a top-level attribute of the pinecone package. Do you have an updated notebook?

francoist
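The error in the comment above comes from pinecone-client v3, which removed the module-level `pinecone.init(...)` in favor of a `Pinecone` client class. A hedged sketch of the newer style, wrapped in a function so it only runs when you supply real credentials (the index name and key below are placeholders):

```python
def connect_index(api_key: str, index_name: str):
    """Connect to a Pinecone index using the pinecone-client v3+ style."""
    from pinecone import Pinecone  # v3+ client class replaces pinecone.init
    pc = Pinecone(api_key=api_key)
    return pc.Index(index_name)

# connect_index("YOUR_API_KEY", "langchain-demo")  # run with real credentials
```

Alternatively, pinning the older client version used in the video (e.g. `pip install "pinecone-client<3"`) keeps the original notebook code working.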

It's "chunks", not "choonks".
Just for fun, don't take it badly. The video is informative and perfect.

tattooGuri

As always, great tutorials! I would love to see this same topic but without using OpenAI.

fabsync

Hello sir,

What evaluation metrics should we use for our use case? Kindly let me know.

imranmunshi

Awesome tutorial.
Thank you for sharing!

chineduezeofor

My question would be: how would you accommodate new data that has to be introduced to this? Would we do the vectorization process all over again, or is there a better way to handle it, even for one document?

tusharbhatnagar
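On the question above: you don't need to re-embed everything. Vector stores support upserting only the new vectors (in LangChain, for example, via the vector store's `add_documents` method); existing entries are untouched. A library-free sketch of the idea, with the index mocked as a dict of id to vector and a toy stand-in for the embedding call:

```python
index = {"doc1": [0.1, 0.2], "doc2": [0.3, 0.4]}  # mocked existing index

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding API call; deterministic toy vector.
    return [len(text) / 10, text.count(" ") / 10]

def upsert(index: dict, doc_id: str, text: str) -> None:
    """Embed and store only the new document; existing vectors stay as they are."""
    index[doc_id] = embed(text)

upsert(index, "doc3", "new report text")
print(sorted(index))  # ['doc1', 'doc2', 'doc3']
```

The one-document case is therefore cheap: one embedding call and one upsert, with no recomputation of the rest of the index.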

Thank you for the good video. I am curious why you stored the vectors in Chroma first and then in Pinecone again? Thank you.

ramp

How can I utilize this ChatBot for my SQL documents?

muratalarcin

Please do this video again with Streamlit.

HeroReact

Amazing tutorial! Is there a way to add in the sources as well with the responses?

mayank

Were you able to figure out the error when entering the second query? I’m running into the same issue.

nitroeh

May I know which website you are using to execute this step by step? I learnt a lot from this tutorial.

siddhu

Many thanks for the great tutorial, but it seems slow. Is there any way to make it run faster? Thanks in advance.

hoduchoa

Why do we split the data into chunks of 1000 or 1500 characters and then retrieve the 4 most relevant chunks? Why not more than 1000 or 1500 characters per chunk, or more than 4 relevant chunks? Is there a limit on how many characters of data we can feed to ChatGPT, and how much is it? After using the code I checked my API usage in OpenAI and saw that I had used InstructGPT. What is InstructGPT?

mrmortezajafari
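On the chunk-size and top-k question above: the retrieved chunks, the question, the prompt template, and the answer must all fit in the model's context window (about 4,096 tokens for the original gpt-3.5-turbo). A rough estimate, using the common heuristic of about 4 characters per token for English text (both numbers are approximations, not exact limits):

```python
chunk_chars = 1500
top_k = 4
chars_per_token = 4          # rough heuristic for English text

context_tokens = top_k * chunk_chars // chars_per_token
print(context_tokens)        # 1500 tokens of retrieved context

model_window = 4096          # original gpt-3.5-turbo context window
remaining = model_window - context_tokens
print(remaining)             # 2596 tokens left for the prompt, question, and answer
```

Larger chunks or a higher top-k crowd out the room left for the answer, which is why values like 1000 to 1500 characters and k = 4 are a common middle ground.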

Getting some numpy error: "AttributeError: module 'numpy.linalg._umath_linalg' has no attribute '_ilp64'" in all your LangChain-related Colab notebooks.

snehitvaddi