Building and Testing Reliable Agents

This talk was given as a workshop at the AI Engineer World's Fair on June 24, 2024. LLM-powered agents hold tremendous promise for autonomously performing tasks, but reliability is often a barrier to deployment in production. Here, we'll show how to design and build reliable agents using LangGraph. We'll cover ways to test agents using LangSmith, examining both the agent's final response and its tool-use trajectory. We'll compare a custom LangGraph agent with a ReAct agent for RAG to showcase the reliability benefits of building custom agents with LangGraph.
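
The custom-agent pattern the talk builds can be summarized as an explicit graph whose grading step routes to web search only when retrieval looks weak. Below is a minimal sketch of that shape, assuming a recent `langgraph` release; every helper function here is a hypothetical stub standing in for a real retriever, search tool, and LLM.

```python
from typing import List, TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    question: str
    documents: List[str]
    generation: str


# --- hypothetical stubs for a real retriever, search tool, and LLM ---
def retrieve_docs(question: str) -> List[str]:
    return [f"indexed passage mentioning {question}"]

def search_web(question: str) -> List[str]:
    return [f"web result for {question}"]

def docs_are_relevant(question: str, docs: List[str]) -> bool:
    return any(question.lower() in d.lower() for d in docs)

def write_answer(question: str, docs: List[str]) -> str:
    return f"Answer to {question!r} grounded in {len(docs)} documents."


# --- graph nodes: each reads the state and returns a partial update ---
def retrieve(state: AgentState) -> dict:
    return {"documents": retrieve_docs(state["question"])}

def web_search(state: AgentState) -> dict:
    return {"documents": state["documents"] + search_web(state["question"])}

def generate(state: AgentState) -> dict:
    return {"generation": write_answer(state["question"], state["documents"])}

def route(state: AgentState) -> str:
    # Control flow lives in the graph, not the LLM: grade the retrieved
    # documents and fall back to web search only on a failing grade.
    if docs_are_relevant(state["question"], state["documents"]):
        return "generate"
    return "web_search"


builder = StateGraph(AgentState)
builder.add_node("retrieve", retrieve)
builder.add_node("web_search", web_search)
builder.add_node("generate", generate)
builder.set_entry_point("retrieve")
builder.add_conditional_edges(
    "retrieve", route, {"generate": "generate", "web_search": "web_search"}
)
builder.add_edge("web_search", "generate")
builder.add_edge("generate", END)

app = builder.compile()
print(app.invoke({"question": "LangGraph"}))
```

Because the fallback decision is ordinary code, the agent's trajectory is constrained by construction; the ReAct baseline instead asks the LLM to pick the next step on every turn, which is where reliability tends to slip.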

Slides:

Colab:

Notebook:

LangGraph:
Comments

You are always doing great work, huge fan of your videos ❤. Keep it up!

darkmatter

Hi Lance, just wanted to drop a thank-you from me and my team for always being on top of the RAG game. This is a complex field with fast-evolving concepts, and LangGraph seems to be the tool we have been looking for.
What is your take on GraphRAG: is it production-ready, and will it eventually replace or complement current RAG systems?

awakenwithoutcoffee

Great video Lance. Thanks for sharing it!

Looking at the evaluation results, it seems that the custom agent always performs a web search before generating the answer. Does that mean the grader agent always scores the output of the RAG step as 0? That would be interesting, because the ReAct agent sometimes skips the web search (meaning a score of 1).

sergiozavota
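
The talk evaluates the tool-use trajectory as well as the final answer, which is the kind of check behind the scores discussed above. Here is a minimal sketch of a trajectory grader written as a plain function; the `run`/`example` field names mirror the shape a LangSmith custom evaluator receives, but they are assumptions rather than the exact schema.

```python
def trajectory_evaluator(run: dict, example: dict) -> dict:
    # Expected vs. actual tool sequences; field names are illustrative.
    expected = example["outputs"]["expected_tools"]  # e.g. ["retrieve", "generate"]
    actual = run["outputs"]["tool_calls"]            # tools the agent actually used
    # Score 1 only for an exact, in-order match, so an extra web_search
    # step (as observed above) scores 0 on the trajectory metric.
    return {"key": "trajectory_match", "score": int(actual == expected)}

print(trajectory_evaluator(
    {"outputs": {"tool_calls": ["retrieve", "web_search", "generate"]}},
    {"outputs": {"expected_tools": ["retrieve", "generate"]}},
))  # {'key': 'trajectory_match', 'score': 0}
```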

Trying to learn more about these types of processes. Am I correct in understanding that the agent loop also needs to make more LLM calls (and is thus more expensive), since it makes an extra call to decide which step to take next? Whereas with the mixed method, you only make that extra call when grading.

adrenaline
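
A back-of-envelope illustration of the call counts in question, under the assumption that a ReAct loop spends one LLM call choosing each action plus one writing the final answer, while the custom graph's routing is plain code and only the grading and generation steps hit the LLM. The numbers are illustrative, not measurements.

```python
def react_llm_calls(actions_taken: int) -> int:
    # One call to choose each action, plus one to write the final answer.
    return actions_taken + 1

def custom_graph_llm_calls() -> int:
    # One grading call plus one generation call; routing itself is free.
    return 1 + 1

# A task that needs three tool invocations: 4 calls vs. a constant 2.
print(react_llm_calls(3), custom_graph_llm_calls())
```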

I keep playing with fancy agent packages that claim to be one tool for everything, but when I try them on actual simple work, I find they can't even tell me the current time correctly without several rounds of trial and error. Reliability and consistency are very important if we want to implement such tools in a real business. There is no tolerance for errors. Thanks!

stanTrX

What version of LangChain is being used in this video?

mospher

It would be very helpful if LangGraph had built-in code-interpreter support: the LLM is prompted to generate code instead of calling predefined functions (tools), and the framework executes the code and returns the results back to the LLM.
Both the OpenAI Assistants API and AutoGen have this.

pphodaie
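
Pending framework support, the requested behavior can be approximated with a small custom node. A heavily simplified sketch follows: `run_generated_code` is a hypothetical helper, not a LangGraph API, and `exec` on untrusted model output is unsafe without a real sandbox (container, VM, or similar).

```python
import contextlib
import io

def run_generated_code(code: str) -> str:
    """Execute model-generated Python and return captured stdout (or the error)."""
    buffer = io.StringIO()
    try:
        # WARNING: never run untrusted code like this in production.
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return f"Execution error: {exc!r}"
    return buffer.getvalue()

print(run_generated_code("print(sum(range(10)))"))  # 45
```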

Tool docstrings become part of the prompt. How do you manage these docstrings?

xuantungnguyen
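
One common approach is to treat the docstring as the single source of truth: with `langchain_core`'s `@tool` decorator, the docstring is lifted into the tool description that gets serialized into the prompt, so it is reviewed and versioned with the code rather than managed separately. A small sketch (`word_count` is a made-up example tool):

```python
from langchain_core.tools import tool

@tool
def word_count(text: str) -> int:
    """Count the whitespace-separated words in `text`."""
    return len(text.split())

# The docstring becomes the description the model sees for this tool.
print(word_count.name, "->", word_count.description)
print(word_count.invoke({"text": "reliable agents need tests"}))  # 4
```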

Lance,
I really enjoy your videos. One thing I have noticed in all demos, not just yours, is that compound requests/questions are not used.

In the examples below, a decomposition of the multiple sentences must take place to elicit the desired outcome; a chain-of-thought or reasoning process is necessary to address the compound request. I do not see how using LangGraph would be suitable for that initial step.


For example:
I am looking for information on garlic. I want to understand the health benefits as well as studies that have been conducted. Provide the list of resources used in your research.

Generate a report on Katherine Johnson and Johns Hopkins. Review the report and address shortfalls. Compare her background to that of John F. Kennedy.

Are there known latency issues with Milvus? If there are, what are the workarounds?

mrchongnoi
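
One way to handle compound requests is to make decomposition an explicit first node, before retrieval or generation runs. A minimal sketch in which a naive sentence splitter stands in for an LLM-driven decomposition call; `decompose` and `answer` are hypothetical helpers.

```python
def decompose(request: str) -> list[str]:
    # Placeholder: in practice, prompt an LLM to return one sub-question
    # per line; here we split naively on sentence boundaries.
    return [s.strip() for s in request.split(".") if s.strip()]

def answer(sub_question: str) -> str:
    # Stand-in for running a RAG sub-graph on each sub-question.
    return f"[answer to: {sub_question}]"

request = ("I am looking for information on garlic. I want to understand the "
           "health benefits as well as studies that have been conducted. "
           "Provide the list of resources used in your research.")
for sub in decompose(request):
    print(answer(sub))
```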

What is the conference that you gave this presentation at?

codekiln

Which software did you use to record the video? Thanks.

datauv-asia