Generative AI / LLM - Document Retrieval and Question Answering

With Large Language Models (LLMs), we can integrate domain-specific data to answer questions. This is especially useful for data unavailable to the model during its initial training, like a company's internal documentation or knowledge base.

This article shows how to implement this architecture using an LLM and a Vector Database, and how grounding answers in retrieved documents significantly decreases the hallucinations commonly associated with LLMs.
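As a rough sketch of the overall flow, assuming a 2023-era LangChain release with Chroma standing in for the Vector Database (the inline document is a placeholder for your real data):

```python
# Minimal RAG sketch: chunk documents, embed them into a vector store,
# then let the LLM answer questions grounded only in retrieved chunks.
from langchain.docstore.document import Document
from langchain.embeddings import VertexAIEmbeddings
from langchain.llms import VertexAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Stand-in for your real documents (e.g. an internal knowledge base).
raw_documents = [Document(page_content="Our onboarding takes two weeks...")]

# 1. Split documents into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(raw_documents)

# 2. Embed the chunks and index them in a vector database.
vectorstore = Chroma.from_documents(chunks, VertexAIEmbeddings())

# 3. Answer questions; retrieved chunks are injected into the prompt.
qa = RetrievalQA.from_chain_type(
    llm=VertexAI(),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("How long does onboarding take?"))
```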

Check out my Generative AI Series:

Generative AI — Getting Started with PaLM 2

Generative AI — The Evolution of Machine Learning Engineering

Generative AI — Best Practices for LLM Prompt Engineering

Generative AI — Document Retrieval and Question Answering with LLMs

Generative AI — Mastering the Language Model Parameters for Better Outputs

Generative AI — Understand and Mitigate Hallucinations in LLMs

If you enjoyed this video, please subscribe to the channel ❤️

🎉 Subscribe for Article and Video Updates!

You can find me here:

If you or your company is looking for advice on the cloud or ML, check out the company I work for.
We offer consulting, workshops, and training at zero cost. Think of it as an extension of your team at no additional expense.

#vertexai #googlecloud #machinelearning #mlengineer #doit

▬ My current recording equipment ▬▬▬▬▬▬▬▬

Support my channel by buying through these links on Amazon

▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
00:27 Architecture
01:16 Languages
01:45 Indexing vs. Tuning
03:33 LangChain
04:19 Data
04:40 Get data
05:10 LangChain chunking
06:02 Embedding and Vector Database
11:06 LLM answers questions
13:00 Bye
Comments

Excellent walk-through, I'll have to give it a try. Thank you very much.

kenchang

I'm a junior engineer intern for a startup called Radical AI and I am doing exactly this process right now lol. You do a better job of explaining everything than my seniors.

christopheryoungbeck

Great video. What modifications should be done to run queries on a public index (not under a VPC)?

jcypwwo
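For readers with the same question, newer releases of the Vertex AI SDK can create index endpoints without VPC peering. A hedged sketch (the project and endpoint names are placeholders; check the current SDK docs in case parameter names have changed):

```python
# Sketch: creating a Vector Search / Matching Engine endpoint that is
# reachable publicly instead of through VPC peering.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="my-public-endpoint",
    public_endpoint_enabled=True,  # no VPC network required
)
```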

Hey Sasha, thanks a lot for making this! It would be great to learn if there's an easy way to embed documents in Firebase. It would be extremely useful to have a workflow where embeddings are generated for each document when it's changed (e.g. a user updates content in an app) so that the query is always matched against real-time data sources.

I was also wondering if there's a way to do a semantic search query combined with regular filtering on metadata (e.g. product prices, size, etc). Would love to see a follow up tutorial on this in the future :)

itsdavidalonso
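One way to approach the first part of this question, sketched under the assumption that some trigger (for example a Cloud Function listening to Firestore writes) calls a re-embedding helper whenever a document changes; `embed_and_upsert` and the `doc_id` metadata key are hypothetical names:

```python
# Sketch: re-embed a single document whenever it changes. Whatever trigger
# you use (e.g. a Cloud Function on Firestore writes) would call this.
from langchain.embeddings import VertexAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

embeddings = VertexAIEmbeddings()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
vectorstore = Chroma(embedding_function=embeddings)  # or your existing store

def embed_and_upsert(doc_id: str, text: str) -> None:
    """Chunk the updated document, tag it with its id, and add it to the index."""
    chunks = splitter.create_documents([text], metadatas=[{"doc_id": doc_id}])
    # If the store supports deletion, remove this doc_id's stale vectors first.
    vectorstore.add_documents(chunks)
```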

Great video!! Does the LLM get trained here? This is a major doubt for me. Or is it just used as an engine for answering based on the embeddings and similarity?

shivayavashilaxmipriyas

Hi! Thanks for making this walkthrough, it was super helpful as a beginner. I was able to follow all the steps you detailed, however, when I try running the final product it produces the same context every time - regardless of the question I prompt. Do you have any idea why that might be? Thanks in advance!

campbellslund
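Identical context for every question usually means the retrieval step, not the LLM, is at fault. A small diagnostic sketch (assuming the `vectorstore` from the walkthrough) that queries the retriever in isolation:

```python
# Diagnostic sketch: query the retriever directly, bypassing the LLM,
# to check whether different questions return different chunks.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
for question in ["What is chunking?", "How do I deploy the index?"]:
    docs = retriever.get_relevant_documents(question)
    print(question, "->", [d.page_content[:60] for d in docs])
```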

Hi! Great video!
Is there a way we can limit the model to respond only with the data that we gave it? Thank you!

vasilmutafchiev
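There is no hard guarantee, but a restrictive prompt goes a long way. A minimal sketch, assuming the `vectorstore` from the walkthrough and a 2023-era LangChain release:

```python
# Sketch: instruct the model to answer only from the retrieved context.
# This is a strong mitigation, not a hard guarantee.
from langchain.chains import RetrievalQA
from langchain.llms import VertexAI
from langchain.prompts import PromptTemplate

template = """Answer the question using ONLY the context below.
If the answer is not in the context, reply exactly: "I don't know."

Context: {context}

Question: {question}
Answer:"""

qa = RetrievalQA.from_chain_type(
    llm=VertexAI(),
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": PromptTemplate.from_template(template)},
)
```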

Hey,

Thanks for the great content! I had 2 questions:

With this setup, what needs to happen if you want to add new data to the vector store?
First we chunk the new document, create new embeddings, and upload them to the GCS bucket. Is that all, or does something need to happen with the Matching Engine index?

Other question: do you know if the LangChain JavaScript library has any limitations in this use case?

TarsiJMaria
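On the first question, a hedged sketch of the add-new-data path, assuming the `splitter` and `vectorstore` objects from the walkthrough (the file name and `doc_id` are placeholders; with a raw Matching Engine index you would additionally run an index update with the new vectors):

```python
# Sketch: adding a new document to an existing index. The vector store
# wrapper embeds the chunks and upserts them in one call.
new_text = open("new_document.txt").read()  # placeholder source
new_chunks = splitter.create_documents([new_text],
                                       metadatas=[{"doc_id": "new-doc"}])
vectorstore.add_documents(new_chunks)  # embed + upsert
```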

At 7:38, for the embedding, is the data sent off the computer? It seems like it is, if you are using retries. If so, is there any way to completely contain this process so that no data leaves the machine? This would be relevant to at least the embedding, the vector DB, and the LLM prompts. Thank you

d_b_
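Yes, the hosted Vertex AI embedding and LLM calls do send text off the machine. A minimal sketch of a fully local alternative, assuming locally runnable substitutes (a `sentence-transformers` embedding model and a FAISS index; `chunks` as in the walkthrough):

```python
# Sketch: a fully local pipeline so no text leaves the machine.
from langchain.embeddings import HuggingFaceEmbeddings  # runs on-device
from langchain.vectorstores import FAISS                # local, in-process index

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)
# For generation, a self-hosted model (e.g. served via llama.cpp) would
# replace the remote Vertex AI LLM call.
```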

Just noticed this channel. Great content, with code walkthroughs. Appreciate your effort!!

I've got a question @ml-engineer:
Is it possible to question-answer separate documents with only one index for the bucket? When retrieving from vector search, I want to specify which document/datapoint_id to query from.

Currently, when I add data points for multiple documents to the same index, the retrieval response for a query is matched globally across all the documents instead of only the required one.

P.S.: I am using the MatchingEngine utility maintained by Google.

dutpxio
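Metadata filtering is the usual answer here. A sketch using Chroma's `filter` argument, assuming each chunk was indexed with a hypothetical `doc_id` metadata field; Vertex AI Vector Search expresses the same idea with per-datapoint restricts supplied at query time:

```python
# Sketch: restrict retrieval to a single document via metadata filtering.
docs = vectorstore.similarity_search(
    "my question",
    k=4,
    filter={"doc_id": "document-42"},  # only chunks tagged with this id
)
```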

Hey sir, I want to know: if I have company documents locally, how can we use and load them? And one more thing: does it provide answers exactly as mentioned in the PDF or documents, or does it perform some type of text generation on the output?

Tech_Inside.
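On the loading question, a sketch using LangChain's document loaders for a local folder of PDFs (the path is a placeholder). As for the second question: the LLM generates a new answer from the retrieved passages rather than quoting the document verbatim, although the prompt can ask it to stay close to the source text.

```python
# Sketch: loading a local folder of company PDFs with LangChain loaders.
from langchain.document_loaders import DirectoryLoader, PyPDFLoader

loader = DirectoryLoader("/path/to/company/docs", glob="**/*.pdf",
                         loader_cls=PyPDFLoader)
documents = loader.load()  # then chunk and embed as in the walkthrough
```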

At 3:38 you say "You need a Google project", but I'm not sure what that is exactly. Do I need a GCP account and then create a VM for that?

russelljazzbeck

Hey Sascha,

I've been playing around a lot more, but I've run into accuracy issues that I wanted to solve using MMR (max marginal relevance) search.
It looks like the Vertex AI vector store (in LangChain) doesn't support this, at least not in the Node.js version, but if I'm not mistaken it's the same in Python.

Do you know what the best approach would be?
As a workaround I'm overriding the default similarity search and filtering the results before passing them as context.

TarsiJMaria
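A sketch of that workaround: over-fetch with plain similarity search, then drop near-duplicates before building the context. It assumes a vector store that implements `similarity_search_with_score`; the prefix-based fingerprint is a crude stand-in for real MMR, which also balances relevance against diversity.

```python
# Sketch: over-fetch, then filter near-duplicates before building context.
def fetch_filtered(vectorstore, query, k=4, fetch_k=20):
    candidates = vectorstore.similarity_search_with_score(query, k=fetch_k)
    seen, picked = set(), []
    for doc, _score in candidates:            # results come best-first
        key = doc.page_content[:200]          # crude near-duplicate fingerprint
        if key not in seen:
            seen.add(key)
            picked.append(doc)
        if len(picked) == k:
            break
    return picked
```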

I have content of nearly 100 pages. Each page has nearly 4,000 characters. What chunk size should I choose, and what retrieval method can I use for optimised answers?

rinkugangishetty
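As a rough sizing sketch: 100 pages at ~4,000 characters is ~400k characters, so 1,000-character chunks with 200 overlap yield on the order of 500 chunks. The numbers below are a starting point to tune against your own questions, not a recommendation:

```python
# Sizing sketch: ~100 pages x ~4,000 chars = ~400k characters.
# 1,000-char chunks with 200 overlap give on the order of 500 chunks.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)  # documents from your loader
print(f"{len(chunks)} chunks")
```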

Hello! thank you so much for the video. I have a problem at the last code cell:
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "DNS resolution failed for :10000: unparseable host:port"
debug_error_string = "UNKNOWN:DNS resolution failed for :10000: unparseable host:port

cinhlor

Thanks for the video. How do we evaluate the model?

saisandeepkantareddy

Can this be deployed to a Vertex AI endpoint as a custom container?

wongkenny

Nice, is the data in XML format?

HailayKidu

What if I want to batch-process different sites? What would be your approach?

louis-philippekyer
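One possible approach, sketched with LangChain's `WebBaseLoader` (the URLs are placeholders): load each site in a loop, keep the loader's `source` metadata for later filtering, then chunk and embed everything in one pass as in the walkthrough.

```python
# Sketch: batch-load several sites, then index them in one pass.
from langchain.document_loaders import WebBaseLoader

urls = ["https://example.com/docs", "https://example.com/faq"]  # placeholders
documents = []
for url in urls:
    documents.extend(WebBaseLoader(url).load())  # sets "source" metadata per page
# then chunk and embed all of `documents` as in the walkthrough
```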

Thank you for the explanation, really liked it!!

I was wondering: if we use DPR (Dense Passage Retriever) on our own data and want to evaluate its performance with precision, recall, and F1 score, and we have a small reference dataset that can serve as ground truth, can we do that? I am also confused because, since DPR is trained only on wiki data as far as I know, does it make sense to measure the efficiency of DPR retrieval when I follow this RAG approach?

bivasbisht
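Yes, a small ground-truth set is enough for a first-pass retrieval evaluation, and measuring on your own data is exactly how you find out whether a retriever trained on wiki data transfers to your domain. A sketch, assuming each indexed chunk carries a hypothetical `doc_id` metadata field and `ground_truth` maps questions to the ids of chunks that should be retrieved (this also addresses the earlier question about evaluating the model):

```python
# Sketch: first-pass retrieval evaluation against a small ground-truth set.
def retrieval_scores(retriever, ground_truth, k=4):
    """ground_truth: {question: set of doc_ids that should be retrieved}."""
    precisions, recalls = [], []
    for question, relevant_ids in ground_truth.items():
        docs = retriever.get_relevant_documents(question)[:k]
        retrieved_ids = {d.metadata["doc_id"] for d in docs}
        hits = len(retrieved_ids & relevant_ids)
        precisions.append(hits / max(len(retrieved_ids), 1))
        recalls.append(hits / max(len(relevant_ids), 1))
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```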