Q: How do I put 1000 PDFs into my LLM?

A subscriber asks: how to put 1000s of PDFs into a Large Language Model (LLM) and perform Q&A? How to integrate huge amounts of corporate and private data into an open-source LLM?
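The usual answer is Retrieval-Augmented Generation (RAG): chunk the PDFs, embed each chunk, retrieve the chunks closest to the question, and stuff them into the prompt. A minimal sketch of that flow, using a toy bag-of-words "embedding" as a stand-in for a real embedding model (the model choice and all names here are assumptions, not the video's exact stack):

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real pipeline would call an
    # embedding model (e.g. a sentence-transformer) instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Return the k chunks most similar to the query.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Pretend these are chunks extracted from the 1000 PDFs.
chunks = [
    "Our refund policy allows returns within 30 days.",
    "The quarterly revenue grew by 12 percent.",
    "Employees may work remotely two days per week.",
]
question = "What is the refund policy?"
top = retrieve(question, chunks, k=1)
# The retrieved chunk is then pasted into the LLM prompt as context.
prompt = f"Answer using this context:\n{top[0]}\n\nQuestion: {question}"
```

A production setup would swap the toy `embed` for a real model and store the vectors in a vector database, but the retrieve-then-prompt shape stays the same.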

#aieducation
#chatgpt
Comments

Wow! Ask a question and get a whole great video explanation! Could not ask for more. I was not even sure my question was going to get answered, because it was asked some time after the comments on that video seemed to have stopped. I was thinking of asking the creators on their site, but I am sure the best explanation I could get is the above video. I now finally have a much, much better understanding of what is going on with this type of set-up. I was really in the dark and a little frustrated trying to understand just how this was working and how it related to fine-tuning and other such things I am learning about, but your explanation and graphics made it very clear and easy to understand for someone like myself who is not an expert in this area. I can use this type of set-up much more intelligently now that I know how it is working. Thank you very much! It probably would have taken a lot of research and reading before I ever came close to figuring this out on my own. I have been learning a lot from your videos. They are very informative and are all both presented very well and explained very well. Looking forward to future videos.

matthewmcc

I always worry about prior knowledge before getting into a new topic as big as creating custom LLMs, however, this channel makes it feel like the best place to start. Thank you and I hope this channel grows exponentially.

zakariaabderrahmanesadelao

Great info. Thanks for breaking it down in a way that was easy to understand.

lance

I've been trying to learn about this, but high-quality LLM content like this video is hard to come by - ty!

kurtiswithakayy

thank you for this video - enlightening!

mowsiek

It's like you're reading my mind on what I need to learn next.

svb

A few months ago I tried some open-source vector databases with LLaMA 1, and I couldn't for the life of me get the model to reliably pull data from the vector database. What could have caused this? Do the newer models retrieve information better? Or the newer vector databases? Or was I just not prompting the models right?

Nick_With_A_Stick

Thanks for explaining the RAG approach in such a simple manner!

Dattatray.S.Shinde

What about math formulas written in LaTeX inside the PDFs, or PDFs that are scanned images and have to go through OCR first?
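Both cases come down to triaging each page before indexing it: pages with no text layer need OCR (e.g. Tesseract), while LaTeX markup can usually be kept as-is, since LLMs read raw LaTeX fine. A hypothetical triage helper (the function name, return labels, and the crude LaTeX check are all illustrative assumptions, not from the video):

```python
def pdf_page_strategy(extracted_text: str) -> str:
    # Decide how to handle one PDF page before chunking/embedding it.
    if not extracted_text.strip():
        # No text layer at all -> likely a scanned image; run OCR
        # (e.g. Tesseract) before indexing.
        return "ocr"
    if "\\frac" in extracted_text or "\\begin{" in extracted_text:
        # Crude check for LaTeX markup: keep the math intact as text.
        return "keep-latex"
    return "plain-text"
```

In practice you would run this per page, since one PDF can mix scanned pages with born-digital ones.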

darkmatter

Nice video! What have you found to be the best file type for encoding, since you don't prefer PDFs? Is it JSON?

andreyseas

Hi dear friend,
Thank you for your efforts.
How can I use this tutorial with PDFs in another language (for example Persian)? What would the issues be?
I have made many attempts and tested different models, but the results when asking questions about the PDFs are not good or accurate!

Thank you for the explanation

mohsenghafari

Hey,
Can I somehow improve GPT-3.5's ability to write code? If yes, what would be the cheapest method?

ahsaniftikhar

Would it ever be worthwhile to label the sentences first to identify multi-word ‘spans’ with special meaning or to capture entities that have specific numerical values attached to them?

wdonno

Is there a point where you think LoRA should be used and not only embeddings/RAG as explained in this video?
The background: say you have 1 million documents, 10k of which are relevant to your search query, but you can't fit all of them into the prompt. Would fine-tuning your model make sense then?

And also: Can you recommend a channel that will implement this with an example?

Elektronc

Hi code_your_own_AI, what if my task is to summarize the 1000 PDFs rather than search them? There seem to be many search capabilities out there, but I just want to know what all the documents are about, which is also a common task, yet I haven't found many summarization methods that do it easily.

There is no good benchmark for evaluating summarization; it almost all boils down to human evaluation, which is not scalable. One simple way that gets 80% of the job done is sentence embedding followed by hierarchical clustering, but the PDFs may contain multiple topics and so need to land in different clusters. We could use an LLM to summarize each PDF, or extract its main topics, but then we need a way to benchmark it and make sure the summary is good. Would love to hear your comments, or if you can point me to the right sources! Thanks!
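The embed-then-cluster idea the commenter describes can be sketched in a few lines. This uses a toy bag-of-words vector in place of a real sentence-embedding model, and a greedy single-pass grouping in place of proper hierarchical clustering (e.g. SciPy's agglomerative linkage); both substitutions are assumptions for the sake of a self-contained example:

```python
import math
import re
from collections import Counter

def embed(sentence):
    # Toy bag-of-words stand-in for a sentence-embedding model.
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(sentences, threshold=0.3):
    # Greedy grouping: a sentence joins the first cluster whose seed
    # it is close enough to, otherwise it starts a new topic cluster.
    clusters = []
    for s in sentences:
        v = embed(s)
        for c in clusters:
            if cosine(v, embed(c[0])) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

# Pretend these are sentences pulled from different PDFs.
docs = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stock prices rose sharply today.",
    "Stock prices fell today.",
]
topics = cluster(docs)  # two topic buckets: cats vs. stocks
```

Each resulting cluster can then be summarized separately (by an LLM or by picking the sentence closest to the cluster centroid), which handles PDFs that span multiple topics.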

Azariven

I want to design an ATS system to filter resumes based on a job description.
Suppose there are 10,000 candidate resumes and I want the top 50 or 100 that best suit the job description.
Input: 10,000 resumes in PDF format.
Output: the top 50/100 resumes best suited to the job description.
How can I achieve this using an LLM and Streamlit for the UI?

ashoksamrat

It boggles me how everyone hasn't done the RAM bypass yet. Maybe I should sell it as a function/button and call it the creativity tax.

brettmiddleton

0:21 I don't need a computer from them, I'm a humble person 😇, just let me download the GPT-4 model, I'll run it myself 🤣 I'll not let anyone copy it from me, I promise 😇

gileneusz