How to evaluate an LLM-powered RAG application automatically.

Source code of this example:

I teach a live, interactive program that'll help you build production-ready machine learning systems from the ground up. Check it out here:

To keep up with the content I create:
Comments

Hey Santiago! Just wanted to drop a comment to say that you're absolutely killing it as an instructor. Your way of breaking down the code and the whole process into simple, understandable language is pure gold, making it accessible for newcomers like me. Wishing you all the success and hoping you keep blessing the community with your valuable content!
Aside from the teaching side, have you tried to create a micro-SaaS based on these technologies? It seems you're halfway there, and it could be a great opportunity to expand your business.

aleksandarboshevski

THANK YOU
I greatly appreciate the release of the new videos. The clarity of the explanations and the logical sequence of the content are exceptional.

TooyAshy-

This is my first time watching your videos. It is great. Thank you.

mohammedsuliman

Thanks! Exactly what I was looking for. I've been cracking my head over how on earth to test a RAG system. How is the business going to give me 1,000+ questions to test, and how can a human verify every response? Top content.

AmbrishYadav

Oh man, the way you explained these complex topics is mind-blowing. I just wanted to say thank you for making videos like these.

dikshantgupta

FYI, keep an eye on the mic volume levels! It sounds like the audio was clipping.

TheScott

Hello Santiago, your explanation was thorough and I understood it really well. Now I have a question: is there any tool other than Giskard (one that is open source and does not require an OpenAI API key) to evaluate my LLM or RAG model?
Thank you in advance 😊

hhdofno

Love the video. Great breakdown. I'd like to see more detail on the evaluation results (e.g., is a score of 0.73 good? WTH...!?), how tweaking the pipeline changes the eval results, and a comparison of, say, Ragas versus Giskard.

peterhjvaneijk

We appreciate your work a lot, my man.

TPH

Glad to see you brought in pytest at the end; it's like a surprise dessert 🍰 after a great meal.

liuyan
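The pytest pattern mentioned above can be sketched in a few lines. Everything in this sketch is an illustrative placeholder, not Giskard's or the video's actual code: a real suite would load the generated testset, call the real RAG pipeline, and use an LLM judge instead of the crude keyword-overlap check used here.

```python
# Sketch: running an LLM-generated testset through pytest-style checks.
# TESTSET, rag_answer, and is_correct are illustrative stand-ins.

# Hypothetical testset: (question, reference answer) pairs, e.g. exported
# from a test-case generator such as Giskard's.
TESTSET = [
    ("What database does the demo use?",
     "The demo uses Pinecone as its vector store."),
    ("Which framework orchestrates the chain?",
     "LangChain orchestrates the chain."),
]

def rag_answer(question: str) -> str:
    """Stand-in for the real RAG pipeline."""
    canned = {
        "What database does the demo use?":
            "It uses Pinecone as the vector store.",
        "Which framework orchestrates the chain?":
            "LangChain orchestrates the chain.",
    }
    return canned[question]

def is_correct(answer: str, reference: str, threshold: float = 0.5) -> bool:
    """Crude proxy for an LLM judge: fraction of reference words present."""
    ref_words = {w.strip(".,").lower() for w in reference.split()}
    ans_words = {w.strip(".,").lower() for w in answer.split()}
    return len(ref_words & ans_words) / len(ref_words) >= threshold

# With pytest installed, the same loop becomes one test per question:
#   @pytest.mark.parametrize("question,reference", TESTSET)
#   def test_rag(question, reference):
#       assert is_correct(rag_answer(question), reference)
results = {q: is_correct(rag_answer(q), ref) for q, ref in TESTSET}
```

The payoff of the pytest framing is that each generated question becomes its own named, independently failing test case, so a regression in the pipeline points straight at the questions it broke.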

Superb video. Great content from start to finish. Thank you.

tee_iam

Damn, you explained each step really well! Love it!

maxnietzsche

Great stuff!

What are your preferred open-source alternatives to all the tools used in this tutorial?

horyekhunley

Super important topic you covered here man!

alextiger

Great stuff, Santiago! You've used Giskard to create the test cases, but these test cases are themselves created using an LLM. In a real application, would we have to manually vet the test cases to ensure they are 100% accurate?

CliveFernandesNZ
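On the vetting question above: one cheap, automatic first pass is to check whether each generated reference answer is actually grounded in the source chunk it was generated from, and send only the suspicious ones to a human. The following is a minimal sketch of that idea under stated assumptions; the case schema, the word-overlap score, and the 0.6 threshold are all illustrative, not part of Giskard.

```python
# Sketch: a cheap groundedness filter to run over LLM-generated test
# cases before manual review. Cases whose reference answer shares too
# little vocabulary with their source chunk get flagged as possibly
# hallucinated. Field names and thresholds are illustrative.

def grounding_score(answer: str, source_chunk: str) -> float:
    """Fraction of answer words that also occur in the source chunk."""
    ans = {w.strip(".,;:").lower() for w in answer.split()}
    src = {w.strip(".,;:").lower() for w in source_chunk.split()}
    return len(ans & src) / len(ans) if ans else 0.0

def vet_testset(cases: list[dict], min_score: float = 0.6):
    """Split generated cases into auto-accepted and flagged-for-review."""
    accepted, flagged = [], []
    for case in cases:
        score = grounding_score(case["reference_answer"], case["source_chunk"])
        (accepted if score >= min_score else flagged).append(case)
    return accepted, flagged

cases = [
    {"question": "What store is used?",
     "reference_answer": "The demo stores embeddings in Pinecone.",
     "source_chunk": "The demo stores all document embeddings in Pinecone."},
    {"question": "What is the context window?",
     "reference_answer": "The model supports a 128k context window.",
     "source_chunk": "The demo stores all document embeddings in Pinecone."},
]
accepted, flagged = vet_testset(cases)
```

A filter like this does not make the testset 100% accurate, but it shrinks the pile a human has to read, which is usually the practical compromise.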

Just awesome instruction, Santiago. I am a beginner, but you make learning digestible and clear! Sorry if this is an ignorant question, but is it possible to substitute FAISS, Postgres, MongoDB, Chroma DB, or another free, open-source option for Pinecone to save money, and if so, which would you recommend for ease of implementation with LangChain?

theacesystem
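On the question of substituting for Pinecone: vector stores are largely interchangeable because they share a tiny contract (add embedded texts, return the nearest texts for a query embedding), which is why LangChain can expose FAISS, Chroma, and Pinecone behind one interface. To show why the swap is mechanical, here is a toy in-memory store implementing that contract; the class and its 3-dimensional "embeddings" are illustrative stand-ins, not LangChain code.

```python
import math

# Sketch: the minimal add/search contract that vector stores share.
# This toy in-memory version ranks stored texts by cosine similarity;
# the hand-written 3-d embeddings stand in for a real embedding model.

class InMemoryVectorStore:
    def __init__(self):
        self._items: list[tuple[list[float], str]] = []

    def add(self, embedding: list[float], text: str) -> None:
        self._items.append((embedding, text))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0
        ranked = sorted(self._items, key=lambda it: cosine(query, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

store = InMemoryVectorStore()
store.add([1.0, 0.0, 0.0], "Pinecone is a managed vector database.")
store.add([0.0, 1.0, 0.0], "Chroma runs locally and is open source.")
top = store.search([0.1, 0.9, 0.0], k=1)
```

Because retrieval only ever goes through `add` and `search`-shaped calls, trading a hosted store for a local one (e.g., Chroma or FAISS) is typically a one-line change in the pipeline's setup rather than a rewrite.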

Amazing! Can you also explain how to do the same type of evaluation on Vision Language Models that use images?

MohammadEskandari-doxy

How can I use Hugging Face LLMs to generate the testset?

francescofisica

So the one thing you learn training ML models is that you don't evaluate your model on training data, and you have to be careful of data leakage. Here, you're providing Giskard your embedded documentation, meaning Giskard is likely using its own RAG system to generate test cases, which you then use to evaluate your own RAG system. Can you please explain how this isn't nonsense? Do you evaluate the accuracy of the Giskard test cases beyond the superficial "looks good to me" method that you claim to be replacing? What metrics do you evaluate Giskard's test cases against, since its answers are also subjective? You're just entrusting that subjective evaluation to another LLM.

maxisqt

Do we need a paid subscription to the OpenAI APIs to be able to use Giskard?

dhrroovv