Evaluate LLMs with Language Model Evaluation Harness

In this tutorial, I delve into the intricacies of evaluating large language models (LLMs) using the versatile Language Model Evaluation Harness. Explore how to rigorously test LLMs across diverse datasets and benchmarks, including HellaSwag, TruthfulQA, Winogrande, and more. This video features the Llama 3 model by Meta AI and demonstrates step by step how to conduct evaluations directly in a Colab notebook, offering practical insights into AI model assessment.
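As a rough sketch of what such a run looks like in a Colab cell: the model id, task list, and 4-bit quantization flag below are illustrative choices, and a gated model like Llama 3 also requires an approved Hugging Face access token.

# Install the harness plus 4-bit quantization support (Colab cell).
!pip install -q lm-eval accelerate bitsandbytes

# Run three of the benchmarks covered in the video against Llama 3.
# load_in_4bit=True helps keep the 8B model inside Colab's GPU memory.
!lm_eval --model hf \
    --model_args pretrained=meta-llama/Meta-Llama-3-8B,load_in_4bit=True \
    --tasks hellaswag,truthfulqa_mc2,winogrande \
    --device cuda:0 \
    --batch_size 8 \
    --output_path results

The harness prints a per-task table of metrics (accuracy, normalized accuracy, and so on) and writes JSON results to the output path.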

Don't forget to like, comment, and subscribe for more insights into the world of AI!

Join this channel to get access to perks:

To further support the channel, you can contribute via the following methods:

Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
#openai #llm #ai
Comments

What are the default LLM generation settings when we run lm-eval-harness, and how can we modify them? Also, what are the deciding factors for the generation settings on a specific benchmark?

alishafique
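Addressing the question above, a hedged sketch: each task's YAML config in the harness defines its default generation settings, and the CLI exposes --gen_kwargs to override them for generative benchmarks; the task name and values below are illustrative.

# Override generation settings for a generative task (values are placeholders).
# Per-task defaults come from the task's YAML config; loglikelihood-style tasks
# such as HellaSwag or Winogrande do not generate text and ignore these settings.
!lm_eval --model hf \
    --model_args pretrained=meta-llama/Meta-Llama-3-8B \
    --tasks gsm8k \
    --gen_kwargs temperature=0,do_sample=False \
    --batch_size 8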

Damn, I don't know why I can't use my access token. I created a read key and pasted it into notebook_login(), but it didn't show anything at all, so I can't tell whether my key is valid or not.

nguyennguyen-wqcb
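If the notebook_login() widget fails to render, one hedged workaround is to log in programmatically instead; the token string below is a placeholder for a read-scoped token from the Hugging Face settings page.

# Programmatic alternative to the notebook_login() widget (token is a placeholder).
from huggingface_hub import login, whoami

login(token="hf_xxxxxxxxxxxxxxxx")  # raises an error if the token is invalid
print(whoami()["name"])             # prints your account name once the login worked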

PackageNotFoundError: No package metadata was found for bitsandbytes. I am getting this error even though bitsandbytes is installed and my CUDA version is 12.1. Please help me with this.

krishnapriya
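A hedged troubleshooting sketch for the error above: metadata lookups commonly fail when bitsandbytes was installed into a different environment than the one the notebook imports from, or when the runtime was not restarted after installation.

# Reinstall into the active Colab runtime, then restart the runtime and re-run.
!pip install -U bitsandbytes

# Confirm the package metadata is now visible to importlib, which is what
# transformers checks before raising PackageNotFoundError.
import importlib.metadata as md
print(md.version("bitsandbytes"))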

If I want to evaluate the LLM using a custom dataset, is that possible with the Git repo that you have provided here?

vishnubhatlaprasanth
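The repo shown in the video is the standard lm-evaluation-harness, which supports custom datasets through user-defined task YAML files. The sketch below is assumption-heavy: the task name, dataset id, and column names are placeholders, and the exact schema should be checked against the repo's new-task guide.

# Write a minimal custom task config (all names and columns are placeholders).
yaml_task = """
task: my_custom_task
dataset_path: your-username/your-dataset   # Hugging Face Hub id or local loader
test_split: test
output_type: generate_until
doc_to_text: "{{question}}"
doc_to_target: "{{answer}}"
metric_list:
  - metric: exact_match
"""
with open("my_custom_task.yaml", "w") as f:
    f.write(yaml_task)

# Point the harness at the directory containing the new task file.
!lm_eval --model hf \
    --model_args pretrained=meta-llama/Meta-Llama-3-8B \
    --tasks my_custom_task \
    --include_path .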

I want to evaluate my custom model on MMLU. How can I do this?

sinchanabhat
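A hedged sketch for the question above: point the hf backend at a local checkpoint (or a Hub id) and select the mmlu task group; the path, few-shot count, and batch size are placeholders.

# Evaluate a local fine-tuned checkpoint on the MMLU task group
# (5-shot is the common convention; the model path is a placeholder).
!lm_eval --model hf \
    --model_args pretrained=/content/my-finetuned-model \
    --tasks mmlu \
    --num_fewshot 5 \
    --batch_size 4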

Can we do evaluation on an Azure OpenAI model?

farahamirah
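An untested sketch for the Azure question: the harness ships OpenAI-compatible API backends (e.g. local-chat-completions) that accept a base_url, but Azure's api-version and api-key handling may need extra configuration, so treat the endpoint, deployment name, and environment variable below as assumptions.

# Point the OpenAI-compatible chat backend at an Azure-style endpoint (all values
# are placeholders; only generation-based tasks work with this kind of backend).
import os
os.environ["OPENAI_API_KEY"] = "your-azure-api-key"

!lm_eval --model local-chat-completions \
    --model_args model=your-deployment-name,base_url=https://your-resource.openai.azure.com/openai/deployments/your-deployment-name/chat/completions \
    --tasks gsm8k \
    --batch_size 1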

What about LangSmith? It does the same thing, right?

saumyajaiswal

I need the RAG chatbot part 2 video, please release it; my exam is coming up.

araara