Test Driven PROMPT Engineering: Using Promptfoo to COMPARE Prompts, LLMs, and Providers.

Wouldn't it be great if you KNEW your prompt was CHEAP, FAST, and ACCURATE? Relying on trial and error isn't enough when you're using prompts in production tools and applications. What you need is a methodical approach to prompt evaluation and testing. 'Test Driven PROMPT Engineering' is your key to unlocking this potential. This video showcases how Promptfoo can be a game-changer in comparing and optimizing your prompts, LLMs, and providers. Gain insights into cost-effective LLM choices and learn prompt testing essentials to ensure your prompts are as efficient, cheap, and accurate as they can be. This tutorial is straightforward and packed with value, designed for prompt engineers, full stack engineers, and product builders who want to make informed, confident, data-driven decisions in prompt engineering.
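For orientation, a minimal `promptfooconfig.yaml` comparing two prompt variants across two OpenAI models might look like the sketch below. The prompt text, variable names, and model IDs are illustrative, not taken from the video:

```yaml
# promptfooconfig.yaml — minimal sketch; prompts, vars, and model IDs are illustrative
prompts:
  - "Summarize the following in one sentence: {{text}}"
  - "You are a concise editor. TL;DR: {{text}}"

providers:
  - openai:gpt-4-1106-preview   # GPT-4 Turbo
  - openai:gpt-3.5-turbo

tests:
  - vars:
      text: "Promptfoo runs every prompt against every provider and scores the results."
```

Running `npx promptfoo@latest eval` and then `npx promptfoo@latest view` evaluates every prompt-by-provider combination and opens a side-by-side results matrix, including token counts and latency.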

What might surprise you is how simple prompt testing can be (shout out to the Promptfoo developers). Promptfoo will enhance your prompt engineering skills and AI Agents with simple yet customizable LLM testing and evaluation. Promptfoo even has support for testing the new OpenAI Assistants API! It doesn't matter if you're using AutoGen, the Assistants API, Ollama, ChatDev, Aider, custom agents, multi-agent systems, or really any other prompt engineering tool. At the end of the day, every tool is driven by prompts, and that means LLM testing and evaluations will help you gain confidence, cut costs, and optimize the results from your prompts.

This tutorial provides a hands-on approach to understanding the intricacies of prompt comparison and optimization. You'll learn to track token usage and time to completion, and how to compare different prompts to choose THE WINNER. Promptfoo enables you to effectively compare and select LLMs, with a focus on achieving the best balance between speed, accuracy, and cost. We discuss key strategies for testing prompts in various scenarios, highlighting the importance of prompt evaluation and testing by looking at real LLM test cases using GPT-4 Turbo and GPT-3.5 Turbo.
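To make "choosing the winner" concrete, Promptfoo lets you attach assertions to test cases so that speed, accuracy, and cost become pass/fail signals. A sketch, where the question, expected value, and thresholds are made up for illustration:

```yaml
# Test cases with assertions — expected values and thresholds are illustrative
tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: contains      # output must mention the expected answer
        value: "Paris"
      - type: latency       # fail if the response takes longer than 2 seconds
        threshold: 2000
      - type: cost          # fail if the inference cost exceeds $0.002
        threshold: 0.002
```

With assertions like these, a cheaper or faster model that still passes every test case can be promoted with confidence instead of gut feel.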

Let me know if you're interested in more prompt testing tutorials, frameworks, and methodologies.

📺 Quick Start LLM Testing

🔗 LINKS:

📖 CHAPTERS:
00:00 Are your prompts even good?
00:45 For real apps, prompt FEEL is not enough.
01:14 Cheaper, Faster, Accurate Prompts with PROMPTFOO
01:35 Quick Start LLM Testing
03:20 Immediate Results with GPT-4-Turbo vs GPT-3.5-Turbo
04:00 Clean and Reusable testing structures
06:00 Asserts and Test Cases
07:45 Learn these 3 components and you're good to go
09:35 LLM Evaluation and Testing 2nd Run
10:00 GPT-4 is breaking the bank and our timeline
12:33 Promptfoo has a lot more to offer for llm testing
13:12 Promptfoo has OpenAI, Anthropic, and Ollama providers, with Gemini coming soon
13:30 Three reasons you should test your prompts
14:42 Test Driven Prompts

💬 Hashtags
#gpt #promptengineering #aiagents
Comments

Thank you! Really good video and I also love that you do not just show their docs, but take the time to create and show actual examples.

timkoehler

This video is ridiculously insightful. You've explained everything in such a lucid manner, and your video is so well structured - after explaining something it's like you were addressing the next question that popped in my head. Fantastic job!

HerroEverynyan

I don't even need to know what this video is about to know that I need to watch it.

gigglesmclovin

Perfect, was looking for exactly this. Thanks for sharing!

s_streichsbier

Thank you!! Great video and content, definitely gonna try and implement this in my processes. Keep up the good work!

vazquezsebastian

love the insights related to gpt3 vs 4 comparison, and the message around how testing saves time!

judymou

You said that in the case of recognizing NLQ, gpt-3.5-turbo is 10x faster and 4x cheaper than gpt-4. It is actually 40 times cheaper, as gpt-3.5 is 10 times cheaper per 1K tokens

macoson

Wait. Can I only test prompts against barebones models? How would I test an agent? Something that can be executed and returns a response?

cutmasta-kun

Thanks for the lesson Andy, though after the ttydb project I was sure this would showcase how to automate prompt optimization on top of promptfoo. Like ttydb, I believe we can make an agenticFoo that constantly and consistently improves prompts throughout any project.
What do you think?
Meanwhile all the best ❤

fire

Hi @IndydevDan, I have been enjoying your AutoGen tutorials and experimenting. But recently I came to know about LangChain. Call me a novice, but what's the difference between these two?

SynonAnon-viql