Using ChatGPT with YOUR OWN Data. This is magical. (LangChain OpenAI API)



Follow me on social media for more tips & fun:

Disclaimer: This description may contain affiliate links. Cryptocurrencies are not investments and are subject to market volatility.
Comments

This is rare. TechLead is actually uploading a useful coding tutorial instead of his opinions.

davidl.e

This is the way to explain LangChain, in true TechLead style. You nailed it. Hopefully more of this stuff in the future. Thanks.

jayhu

This is exactly what I wanted to do with my own data, but I hadn't yet spent any time researching and figuring out a way to do it. I'm glad there is a publicly available way to do it.

TheRealTommyR

Hours and hours of ChatGPT courses... I learned more by watching 5 minutes of your video. Congratulations on the clarity and the practical approach 👍

e-matesecom

Awesome seeing TechLead do programming, the Maestro at work.

charleswhite

This one video alone saves so much time compared to watching hours of some of the playlists out there. It's better to start here and then go straight to the LangChain docs to work out other use cases. Excellent, TechLead.

jsnmad

By far one of the best ChatGPT video tutorials I've seen on YouTube. Great work

adasi

Glad to know I'm not the only one doing this.

As a student, I've been feeding ChatGPT all my previous coursework; it's able to answer essay prompts and other homework-related tasks in my writing style and/or in similar formats, as if I were the one writing it. I'm able to save a lot of time by doing this.

john.

So after digging into the code, I found that LangChain is actually doing the following things:
1. For all your data, it stores them in vector storage using embeddings.
2. When you query something, it first does a similarity search in the embeddings database and finds the files related to your question.
3. After finding the related files, it takes all the text of those files and sends it, together with a context message, as the first system message: "Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer. {your text data}".
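A minimal sketch of that load → embed → retrieve → "stuff" flow, using the LangChain APIs as they existed around mid-2023 (the file path and model name here are illustrative assumptions, not taken from the video):

```python
# Sketch of the flow described above; assumes OPENAI_API_KEY is set and a
# local file "data.txt" exists (hypothetical path), with chromadb installed.
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Load your data and store it in a vector store using embeddings.
docs = TextLoader("data.txt").load()
db = Chroma.from_documents(docs, OpenAIEmbeddings())

# 2 + 3. At query time, similar chunks are retrieved and "stuffed" into the
# prompt (with the default context message) before calling the chat model.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    chain_type="stuff",
    retriever=db.as_retriever(),
)
print(qa.run("What does my document say about X?"))
```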

This tells us a few things:
1. Why does it sometimes lack outside-world information? If the question you ask is not in your documents, or the model is not trained on data relevant to your question, it will return nothing valuable, as instructed.

2. Is there a limit on the size of your data? Yes. You can't use it with very large files, because it filters documents and sends all the related text to the API server. Recently, gpt-3.5-turbo-16k might be the right model to use, and it's best if the total size of the related docs is less than 16k tokens. That means the best practice is to group your data into different topics and try to ensure that, for any query, the total size of the documents returned by the similarity search does not exceed the model's token limit. I think 16k tokens is roughly the size of a 13-15 page paper.
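One hedged way to keep the stuffed context under the model's limit is to split documents into chunks and cap how many the retriever returns; the chunk size and `k` below are illustrative guesses, not tuned values:

```python
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Split documents into smaller chunks so one big file can't blow the token budget.
docs = TextLoader("data.txt").load()  # hypothetical file
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)
db = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Only the top-4 most similar chunks get stuffed into the prompt, which keeps
# the request comfortably inside a 16k-token context window.
retriever = db.as_retriever(search_kwargs={"k": 4})
```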


3. By removing or changing the system message, you might get better results for common-sense questions. I really don't like the default system message: asking gpt-3.5-turbo-16k "Who is George Washington?" in the playground with an empty system message gives better answers than the LangChain solution.
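A sketch of how the default prompt could be overridden when building the chain; the template wording is just an example, and `retriever` comes from the earlier sketch:

```python
# Replace LangChain's default "use the following pieces of context..." message
# with your own, e.g. one that allows falling back to general knowledge.
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

template = (
    "Use the context below if it is relevant; otherwise answer from your own knowledge.\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-16k"),
    chain_type="stuff",
    retriever=retriever,                      # retriever from the earlier sketch
    chain_type_kwargs={"prompt": prompt},     # hook for the "stuff" chain's prompt
)
```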

4. LangChain uses the unstructured library (it reported errors when I didn't install it), which means you can use not only txt files but also PDF files, Word files, etc. I haven't tested it, but it very likely supports querying multiple PDF files with code similar to the video's: put several PDFs in a folder, use a directory index creator, and ask questions about your papers, I think (haven't tested it).
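A sketch of that multi-PDF idea (untested, as the commenter says); `DirectoryLoader` hands files to the unstructured loaders by default, and the folder name and glob below are assumptions about your layout:

```python
# Index every PDF in a folder and query across all of them.
# Requires the `unstructured` package and its PDF-parsing extras.
from langchain.document_loaders import DirectoryLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.chat_models import ChatOpenAI

loader = DirectoryLoader("papers/", glob="**/*.pdf")   # hypothetical folder of papers
index = VectorstoreIndexCreator().from_loaders([loader])

print(index.query(
    "Summarize the main contribution of each paper.",
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-16k"),
))
```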

5. LangChain supports not only the ChatGPT models but also other models in the chat_models package. Google PaLM 2 chat is also supported as of Jul 10, 2023, so if you have a key you can use other models too. I don't think PaLM 2's common-sense knowledge is as good as ChatGPT's, but I think it is a better language-generation model than at least gpt-3.5-turbo-16k, so PaLM 2 may produce better results on your own data, while OpenAI's models are better at answering common-sense questions after you change the default system message. A few days ago OpenAI said general access to gpt-4 is starting, and people with a history of successful payments on the OpenAI API will get access immediately; access for new developers will be rolled out through the end of July.
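A hedged sketch of swapping in a different chat model; `ChatGooglePalm` is assumed here to be the PaLM 2 class exposed by the chat_models package around that date, and the key is a placeholder:

```python
# Same retrieval chain, different backing chat model (assumption: ChatGooglePalm
# is available in this LangChain version and accepts a Google API key).
from langchain.chat_models import ChatGooglePalm
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=ChatGooglePalm(google_api_key="YOUR_KEY"),  # placeholder key
    chain_type="stuff",
    retriever=retriever,                            # retriever from the earlier sketches
)
```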

Also, I think it's quite cool to be able to use your own data. If you want to create something like an AI assistant, you can always use code to collect the current time, user information, etc., and put those in a folder, so the assistant will be able to do much more than current ones.
Another very cool thing is Auto-GPT, which works great with gpt-4; gpt-3.5 is not smart enough and behaves much worse than gpt-4. If you ask Auto-GPT something, it can google it by itself and reply with real-time information. The Auto-GPT example of creating a recipe based on the next holiday is also a cool illustration. Hopefully access to gpt-4 comes soon.

RunningBugs

Maybe 8 months late, and LangChain has been updated since, but this is one of the best videos I've watched. Thank you.

hichamalaoui

First, great video.
Second, I just had to comment on the "one language" you mentioned that some programmers claim is all they ever wanted to know.
At last count I have coded in over 15 languages since I wrote my first line of code back in 1985.
We have not deployed anything using LangChain yet (we have only been using LlamaIndex), but for the same reason that I know so many languages, we will be using LangChain soon to see what it can do.
As for plugins, I will always be for building your own so you have full control and can do the things a plugin "left out," like the ability to use your own data (and keep it on your own servers).

We have found that if you are deploying a Help feature for your application you do not want to allow the code to get information from "the outside world."

larryczerwonka

Great video. I'm a junior data scientist in Belgium and it's actually helping me with one of my projects. You're totally right when you say that everyone should learn Python. I only learned C and C# during my studies, but now that I've learned Python I'm using it almost every day.

andygilet

I'm studying law at the moment and I'm seriously scared about how this will change the legal industry. Honestly, I could see it replacing 90% of lawyering.

MacroAnarchy

Great vid, especially the end with MS's case study of customer reviews for cars, for those who are actually struggling to find real-world applications for the new AI stuff. Thank you!

fenchelteefee

Wow, this was awesome. All this information in one place. Also, I appreciate your fast dialog and sticking to the important points. I subscribed and will recommend this site to others.

andrespineda

Semantra is a pretty cool tool to analyze your documents and be able to search them with natural language. It's probably more research-oriented since it links you to the different pages and snippets that match your query.

jcollins

Thanks TechLead, it's nice to see this type of video!

seize

Eye opener! I am a tech student, and was researching whether we could make a custom GPT of our own. This was on point! Thanks @techlead!

riyaski

You made my day. I've been struggling with fine-tuning a GPT-3 model, with mediocre success and an enormous data collection and preparation effort. It never even got close to the results achieved with LangChain within 1 minute of coding and 9 minutes of data preparation.

danield.

Loved this. I am a sales guy with zero coding experience. I listen to content like yours to glean some nuggets so I can better understand the impacts and have meaningful conversations with my customers. Truly helpful content.

sr