Regression Testing | LangSmith Evaluations - Part 15
Evaluations can accelerate LLM app development, but it can be challenging to get started. We've kicked off a new video series focused on evaluations in LangSmith.
With the rapid pace of AI, developers are often faced with a paradox of choice: how to choose the right prompt, and how to trade off LLM quality against cost. Evaluations can accelerate development by providing a structured process for making these decisions. But we've heard that it is challenging to get started, so we are launching a series of short videos explaining how to perform evaluations using LangSmith.
This video focuses on Regression Testing, which lets a user highlight particular examples in an eval set that show improvement or regression across a set of experiments.
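A minimal sketch of the workflow the video covers: run two experiments over the same LangSmith dataset so that the regression-testing view can highlight which examples improved or regressed between them. The dataset name, prompts, models, and the toy exact-match evaluator below are illustrative assumptions, not the exact setup used in the video.

```python
# Hypothetical sketch: two experiments on one eval set, compared in the
# LangSmith regression-testing view. Dataset name and models are assumptions.
from langsmith.evaluation import evaluate
from openai import OpenAI

oai = OpenAI()

def make_target(model: str):
    """Return a target function that answers questions with the given model."""
    def target(inputs: dict) -> dict:
        resp = oai.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": inputs["question"]}],
        )
        return {"answer": resp.choices[0].message.content}
    return target

def correct(run, example) -> dict:
    """Toy evaluator: exact match against the reference answer."""
    score = run.outputs["answer"].strip() == example.outputs["answer"].strip()
    return {"key": "correct", "score": int(score)}

# Each call creates one experiment on the dataset; the regression view in the
# LangSmith UI then flags per-example improvements and regressions between them.
for model in ["gpt-3.5-turbo", "gpt-4o"]:
    evaluate(
        make_target(model),
        data="qa-eval-set",              # assumed existing dataset name
        evaluators=[correct],
        experiment_prefix=f"qa-{model}",
    )
```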
Regression Testing | LangSmith Evaluations - Part 15
Why Evals Matter | LangSmith Evaluations - Part 1
Repetitions | LangSmith Evaluation - Part 23
LangSmith in 10 Minutes
Online Evaluation (Guardrails) | LangSmith Evaluations - Part 21
LangSmith For Beginners | Must know LLM Evaluation Platform 🔥
Agent Response | LangSmith Evaluation - Part 24
Pairwise Evaluation | LangSmith Evaluations - Part 17
Dataset Splits | LangSmith Evaluation - Part 22
How to evaluate upgrading your app to GPT-4o | LangSmith Evaluations - Part 18
Using LangSmith in a non-LangChain codebase
LLM Benchmarks for Evaluation
LangFuzz: Redteaming for Language Models
LangSmith: In-Depth Platform Overview
LangSmith in Depth: Part 11.
LLMs & AI Benchmarks! - GenAI Eval Deep Dive
Instrumenting & Evaluating LLMs
Making LLMs (Large Language Models) More Predictable: Expert Insights from Microsoft & LangChain
Retrieval Augmented Generation with LangChain: ChatGPT for Your Data (PT 2)
LlamaIndex Workshop: Evaluation-Driven Development (EDD)
MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics (EMNLP 2020)
AI Engineering 201: The Rest of the Owl
Chatbot Testing Made Easy: Step-by-Step Guide for Beginners | software testing | AxelBuzz Testing
Mastering HuggingFace Model Evaluation: In-Detail Walkthrough of Measurement, Metric & Comparato...