Langchain: QA Evaluation

In this video, I go over a quick example of how you can generate examples for QA evaluation and then evaluate model outputs against the examples.
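
A rough sketch of that workflow is below, written against the classic langchain API (QAGenerateChain / QAEvalChain). Module paths, input keys, and output keys have shifted between releases, and docs / qa_chain stand in for your own loaded documents and QA chain, so treat it as an outline rather than the exact notebook code.

from langchain.chat_models import ChatOpenAI
from langchain.evaluation.qa import QAGenerateChain, QAEvalChain

llm = ChatOpenAI(temperature=0)

# 1. Generate question/answer examples from your documents
#    (docs is assumed to be a list of loaded Document objects).
gen_chain = QAGenerateChain.from_llm(llm)
examples = gen_chain.apply_and_parse([{"doc": d.page_content} for d in docs[:5]])
# Newer releases nest each pair under a "qa_pairs" key; flatten if needed.

# 2. Run your QA chain over the generated questions to get predictions
#    (qa_chain is assumed to be e.g. a RetrievalQA chain with output key "result").
predictions = qa_chain.apply(examples)

# 3. Grade each prediction against the generated answer with an LLM judge.
eval_chain = QAEvalChain.from_llm(llm)
graded_outputs = eval_chain.evaluate(examples, predictions)

for i, example in enumerate(examples):
    print("Question: " + example["query"])
    print("Real Answer: " + example["answer"])
    print("Predicted Answer: " + predictions[i]["result"])
    print("Predicted Grade: " + str(graded_outputs[i]))  # grade key is 'text' or 'results' depending on version
    print()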

Relevant Links:

Other links:
Comments

Hello, I have this error raised in the last cell but don't know why:
KeyError Traceback (most recent call last)
in <cell line: 1>()
4 print("Real Answer: "+example['answer'])
5 print("Predicted Answer: "+predictions[i]['text'])
----> 6 print("Predicted Grade: "+graded_outputs[i]['text'])
7 print()

KeyError: 'text'

sarazayan
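
A likely cause of the KeyError above: the key that QAEvalChain uses for its grade changed between LangChain releases (older versions returned it under 'text', newer ones under 'results'), so graded_outputs[i]['text'] breaks on newer installs. A version-tolerant workaround, assuming graded_outputs is the list returned by eval_chain.evaluate(...):

# See which keys the eval chain actually returned on your install.
print(graded_outputs[0].keys())

# Fall back between the old 'text' key and the newer 'results' key.
for i, example in enumerate(examples):
    grade = graded_outputs[i].get("results") or graded_outputs[i].get("text")
    print("Predicted Grade: " + str(grade))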

So there’s a fan on your ceiling?? Which country are you in?

antarlinamukherjee

Just wondering, what is the process if I want to give it my own examples to compare against? Thanks

jennifermosqueracabra
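
On the question above about supplying your own examples: you can skip the generation step and pass hand-written question/answer pairs straight to the eval chain. A minimal sketch, assuming qa_chain is an existing RetrievalQA-style chain whose output key is "result":

from langchain.chat_models import ChatOpenAI
from langchain.evaluation.qa import QAEvalChain

# Hand-written ground-truth examples instead of LLM-generated ones.
examples = [
    {"query": "What year was the company founded?", "answer": "1998"},
    {"query": "Who is the current CEO?", "answer": "Jane Doe"},
]

# Run the QA chain over the same questions...
predictions = qa_chain.apply(examples)

# ...and grade its answers against the hand-written ones.
eval_chain = QAEvalChain.from_llm(ChatOpenAI(temperature=0))
graded_outputs = eval_chain.evaluate(examples, predictions)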

Hi Mark, how can I implement evaluation without a ground truth? For example, I have a requirement, a question, and an answer, and I want to evaluate the answer based on the requirement and the question. For some reason, the documentation for evaluation without ground truth is no longer available.

raymondowhondah
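
On evaluating without a ground truth: newer langchain releases expose reference-free "criteria" evaluators that grade an answer against the question alone, with no reference answer required. The exact module path and criteria names depend on your installed version, so the sketch below is an assumption about that API rather than code from the video:

from langchain.chat_models import ChatOpenAI
from langchain.evaluation import load_evaluator

# Grade an answer for relevance to the question, with no reference answer.
evaluator = load_evaluator(
    "criteria",
    criteria="relevance",
    llm=ChatOpenAI(temperature=0),
)

result = evaluator.evaluate_strings(
    input="What does the warranty cover?",  # the question / requirement
    prediction="The warranty covers parts and labor for two years.",
)
print(result)  # typically contains 'reasoning', 'value' (Y/N) and 'score' (1/0)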

Thanks! Can you share the link to the code?

AlonAvramson

Why are the examples you generate considered the ground truth? Aren't they just generated with an LLM as well? Why would they be more correct than the outputs of chain.apply? It seems like a human should probably generate the ground truth examples for proper testing, no?

Ryan-yjsd

Hi Mark!
I have been trying this code out locally but don't understand why the error below is raised on the chain.run() line:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Can you please help me out with this? :)

MohammadHaseeb-oc
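
On the JSONDecodeError above: "Expecting value: line 1 column 1 (char 0)" is what Python's json module raises when it is asked to parse an empty (or otherwise non-JSON) string, so the text the chain tried to parse most likely came back empty or malformed from the model. A tiny illustration:

import json

# json.loads raises exactly this error when handed an empty / non-JSON string,
# which in this context usually means the raw LLM response was empty or malformed.
try:
    json.loads("")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)

A good first step is to print the raw model output (for example by re-running the chain with verbose=True) to see exactly what failed to parse.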