Prompt Optimization Using Datasets and Experiments


Arize Datasets and Experiments is a new workflow for testing and iterating on LLM prompts that can help AI engineers get reliable, performant, accurate LLM outputs. Instead of manually reviewing test cases every time you update a prompt, you can let Arize do that work and scale up to thousands of test cases in a CI/CD pipeline. With Datasets and Experiments, you can curate a dataset of the key data points you’re trying to test, run your LLM task against them, evaluate the output with code, LLMs, or user-generated annotations, and get aggregate scores across many test runs. This lets you test as you build and verify experiments before you deploy to customers, which makes it worth a look if you’re searching for a prompt optimizer.
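The loop described above (curate a dataset, run the task, evaluate, aggregate) can be sketched in plain Python. This is a minimal illustration, not the Arize SDK: `Example`, `run_experiment`, `fake_llm_task`, `exact_match`, and `non_empty` are all hypothetical names, and the "LLM" is a canned lookup so the sketch runs without any API key.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Example:
    """One curated test case: an input and the output we expect."""
    input: str
    expected: str


def fake_llm_task(prompt_input: str) -> str:
    # Hypothetical stand-in for a real LLM call; one answer is deliberately wrong.
    canned = {"2+2": "4", "capital of France": "Paris", "3*3": "6"}
    return canned.get(prompt_input, "")


def exact_match(output: str, expected: str) -> float:
    # Code-based evaluator: 1.0 when the output equals the expected answer.
    return 1.0 if output.strip() == expected.strip() else 0.0


def non_empty(output: str, expected: str) -> float:
    # Code-based evaluator: 1.0 when the task returned any text at all.
    return 1.0 if output.strip() else 0.0


def run_experiment(
    dataset: List[Example],
    task: Callable[[str], str],
    evaluators: Dict[str, Callable[[str, str], float]],
) -> Dict[str, float]:
    """Run the task on every example and return the mean score per evaluator."""
    rows = []
    for example in dataset:
        output = task(example.input)
        rows.append(
            {name: fn(output, example.expected) for name, fn in evaluators.items()}
        )
    return {name: mean(row[name] for row in rows) for name in evaluators}


dataset = [
    Example("2+2", "4"),
    Example("capital of France", "Paris"),
    Example("3*3", "9"),
]
scores = run_experiment(
    dataset,
    fake_llm_task,
    {"exact_match": exact_match, "non_empty": non_empty},
)
print(scores)
```

In a CI/CD job, a script like this could fail the build when an aggregate score drops below a threshold you choose, for example by asserting on `scores["exact_match"]` after each prompt change.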

🔗 Handy Links 🔗