Langchain: QA Evaluation

In this video, I go over a quick example of how you can generate examples for QA evaluation and then evaluate model outputs against the examples.
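
A rough sketch of that workflow is below, written against the classic langchain API (QAGenerateChain / QAEvalChain). Module paths, input keys, and output keys have shifted between releases, and docs / qa_chain stand in for your own loaded documents and QA chain, so treat it as an outline rather than the exact notebook code.

from langchain.chat_models import ChatOpenAI
from langchain.evaluation.qa import QAGenerateChain, QAEvalChain

llm = ChatOpenAI(temperature=0)

# 1. Generate question/answer examples from your documents
#    (docs is assumed to be a list of loaded Document objects).
gen_chain = QAGenerateChain.from_llm(llm)
examples = gen_chain.apply_and_parse([{"doc": d.page_content} for d in docs[:5]])
# Newer releases nest each pair under a "qa_pairs" key; flatten if needed.

# 2. Run your QA chain over the generated questions to get predictions
#    (qa_chain is assumed to be e.g. a RetrievalQA chain with output key "result").
predictions = qa_chain.apply(examples)

# 3. Grade each prediction against the generated answer with an LLM judge.
eval_chain = QAEvalChain.from_llm(llm)
graded_outputs = eval_chain.evaluate(examples, predictions)

for i, example in enumerate(examples):
    print("Question: " + example["query"])
    print("Real Answer: " + example["answer"])
    print("Predicted Answer: " + predictions[i]["result"])
    print("Predicted Grade: " + str(graded_outputs[i]))  # grade key is 'text' or 'results' depending on version
    print()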

Relevant Links:

Other links:
Comments

Hello, I have this error raised in the last cell but don't know why:
KeyError Traceback (most recent call last)
in <cell line: 1>()
4 print("Real Answer: "+example['answer'])
5 print("Predicted Answer: "+predictions[i]['text'])
----> 6 print("Predicted Grade: "+graded_outputs[i]['text'])
7 print()

KeyError: 'text'

sarazayan
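
A likely cause of the KeyError above: the key that QAEvalChain uses for its grade changed between LangChain releases (older versions returned it under 'text', newer ones under 'results'), so graded_outputs[i]['text'] breaks on newer installs. A version-tolerant workaround, assuming graded_outputs is the list returned by eval_chain.evaluate(...):

# See which keys the eval chain actually returned on your install.
print(graded_outputs[0].keys())

# Fall back between the old 'text' key and the newer 'results' key.
for i, example in enumerate(examples):
    grade = graded_outputs[i].get("results") or graded_outputs[i].get("text")
    print("Predicted Grade: " + str(grade))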

So there’s a fan on your ceiling?? Which country are you in?

antarlinamukherjee

Just wondering, what is the process if I want to give it my own examples to compare against? Thanks

jennifermosqueracabra
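
On the question above about supplying your own examples: you can skip the generation step and pass hand-written question/answer pairs straight to the eval chain. A minimal sketch, assuming qa_chain is an existing RetrievalQA-style chain whose output key is "result":

from langchain.chat_models import ChatOpenAI
from langchain.evaluation.qa import QAEvalChain

# Hand-written ground-truth examples instead of LLM-generated ones.
examples = [
    {"query": "What year was the company founded?", "answer": "1998"},
    {"query": "Who is the current CEO?", "answer": "Jane Doe"},
]

# Run the QA chain over the same questions...
predictions = qa_chain.apply(examples)

# ...and grade its answers against the hand-written ones.
eval_chain = QAEvalChain.from_llm(ChatOpenAI(temperature=0))
graded_outputs = eval_chain.evaluate(examples, predictions)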

Hi Mark, how can I implement evaluation without a ground truth? For example, I have a requirement, a question, and an answer, and I want to evaluate the answer based on the requirement and the question. For some reason, the documentation for evaluation without ground truth is no longer available.

raymondowhondah
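
On evaluating without a ground truth: newer langchain releases expose reference-free "criteria" evaluators that grade an answer against the question alone, with no reference answer required. The exact module path and criteria names depend on your installed version, so the sketch below is an assumption about that API rather than code from the video:

from langchain.chat_models import ChatOpenAI
from langchain.evaluation import load_evaluator

# Grade an answer for relevance to the question, with no reference answer.
evaluator = load_evaluator(
    "criteria",
    criteria="relevance",
    llm=ChatOpenAI(temperature=0),
)

result = evaluator.evaluate_strings(
    input="What does the warranty cover?",  # the question / requirement
    prediction="The warranty covers parts and labor for two years.",
)
print(result)  # typically contains 'reasoning', 'value' (Y/N) and 'score' (1/0)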

Thanks! Can you share the link to the code?

AlonAvramson

Why are the examples you generate considered the ground truth? Aren't they just generated with an LLM as well? Why would they be more correct than the outputs of chain.apply? It seems like a human should probably generate the ground truth examples for proper testing, no?

Ryan-yjsd

Hi Mark!
I have been trying this code out locally but don't understand why the error below is raised on the chain.run() line:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Can you please help me out with this? :)

MohammadHaseeb-oc
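
On the JSONDecodeError above: "Expecting value: line 1 column 1 (char 0)" is what Python's json module raises when it is asked to parse an empty (or otherwise non-JSON) string, so the text the chain tried to parse most likely came back empty or malformed from the model. A tiny illustration:

import json

# json.loads raises exactly this error when handed an empty / non-JSON string,
# which in this context usually means the raw LLM response was empty or malformed.
try:
    json.loads("")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)

A good first step is to print the raw model output (for example by re-running the chain with verbose=True) to see exactly what failed to parse.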