Building and Testing Reliable Agents

This talk was given as a workshop at the AI Engineer World's Fair on June 24, 2024. LLM-powered agents hold tremendous promise for autonomously performing tasks, but reliability is often a barrier to deployment in production. Here, we'll show how to design and build reliable agents using LangGraph. We'll cover ways to test agents using LangSmith, examining both the agent's final response and its tool-use trajectory. We'll compare a custom LangGraph agent with a ReAct agent for RAG to showcase the reliability benefits of building custom agents with LangGraph.
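
The custom-agent pattern the talk builds can be summarized as an explicit graph whose grading step routes to web search only when retrieval looks weak. Below is a minimal sketch of that shape, assuming a recent `langgraph` release; every helper function here is a hypothetical stub standing in for a real retriever, search tool, and LLM.

```python
from typing import List, TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    question: str
    documents: List[str]
    generation: str


# --- hypothetical stubs for a real retriever, search tool, and LLM ---
def retrieve_docs(question: str) -> List[str]:
    return [f"indexed passage mentioning {question}"]

def search_web(question: str) -> List[str]:
    return [f"web result for {question}"]

def docs_are_relevant(question: str, docs: List[str]) -> bool:
    return any(question.lower() in d.lower() for d in docs)

def write_answer(question: str, docs: List[str]) -> str:
    return f"Answer to {question!r} grounded in {len(docs)} documents."


# --- graph nodes: each reads the state and returns a partial update ---
def retrieve(state: AgentState) -> dict:
    return {"documents": retrieve_docs(state["question"])}

def web_search(state: AgentState) -> dict:
    return {"documents": state["documents"] + search_web(state["question"])}

def generate(state: AgentState) -> dict:
    return {"generation": write_answer(state["question"], state["documents"])}

def route(state: AgentState) -> str:
    # Control flow lives in the graph, not the LLM: grade the retrieved
    # documents and fall back to web search only on a failing grade.
    if docs_are_relevant(state["question"], state["documents"]):
        return "generate"
    return "web_search"


builder = StateGraph(AgentState)
builder.add_node("retrieve", retrieve)
builder.add_node("web_search", web_search)
builder.add_node("generate", generate)
builder.set_entry_point("retrieve")
builder.add_conditional_edges(
    "retrieve", route, {"generate": "generate", "web_search": "web_search"}
)
builder.add_edge("web_search", "generate")
builder.add_edge("generate", END)

app = builder.compile()
print(app.invoke({"question": "LangGraph"}))
```

Because the fallback decision is ordinary code, the agent's trajectory is constrained by construction; the ReAct baseline instead asks the LLM to pick the next step on every turn, which is where reliability tends to slip.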

Slides:

Colab:

Notebook:

LangGraph:
Comments

You are always doing great work, huge fan of your videos ❤. Keep it up!

darkmatter

Hi Lance, just wanted to drop a thank-you from me and my team for always being on top of the RAG game. This is a complex field with fast-evolving concepts, and LangGraph seems to be the tool we have been looking for.
What is your take on GraphRAG: is it production-ready, and will it eventually replace or complement current RAG systems?

awakenwithoutcoffee

Great video Lance. Thanks for sharing it!

Looking at the evaluation results, it seems that the custom agent always performs a web search before generating the answer. Does that mean the grader agent always scores the output of the RAG step as 0? That would be interesting, because the ReAct agent sometimes skips the web search (meaning a score of 1).

sergiozavota
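
The talk evaluates the tool-use trajectory as well as the final answer, which is the kind of check behind the scores discussed above. Here is a minimal sketch of a trajectory grader written as a plain function; the `run`/`example` field names mirror the shape a LangSmith custom evaluator receives, but they are assumptions rather than the exact schema.

```python
def trajectory_evaluator(run: dict, example: dict) -> dict:
    # Expected vs. actual tool sequences; field names are illustrative.
    expected = example["outputs"]["expected_tools"]  # e.g. ["retrieve", "generate"]
    actual = run["outputs"]["tool_calls"]            # tools the agent actually used
    # Score 1 only for an exact, in-order match, so an extra web_search
    # step (as observed above) scores 0 on the trajectory metric.
    return {"key": "trajectory_match", "score": int(actual == expected)}

print(trajectory_evaluator(
    {"outputs": {"tool_calls": ["retrieve", "web_search", "generate"]}},
    {"outputs": {"expected_tools": ["retrieve", "generate"]}},
))  # {'key': 'trajectory_match', 'score': 0}
```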

Trying to learn more about these types of processes. Am I correct in understanding that the agent loop also needs to make more LLM calls (and is thus more expensive), since it makes an extra call to decide which step to take next? Whereas with the mixed method, you only make that extra call when grading.

adrenaline
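
A back-of-envelope illustration of the call counts in question, under the assumption that a ReAct loop spends one LLM call choosing each action plus one writing the final answer, while the custom graph's routing is plain code and only the grading and generation steps hit the LLM. The numbers are illustrative, not measurements.

```python
def react_llm_calls(actions_taken: int) -> int:
    # One call to choose each action, plus one to write the final answer.
    return actions_taken + 1

def custom_graph_llm_calls() -> int:
    # One grading call plus one generation call; routing itself is free.
    return 1 + 1

# A task that needs three tool invocations: 4 calls vs. a constant 2.
print(react_llm_calls(3), custom_graph_llm_calls())
```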

I keep playing with fancy agent packages that claim to be one tool for everything, but when I try them on actual simple work, I find they can't even tell me the current time correctly without several rounds of trial and error. Reliability and consistency are very important if we want to implement such tools in a real business. There is no tolerance for errors. Thanks!

stanTrX

What version of LangChain is being used in this video?

mospher

It would be very helpful if LangGraph had built-in code-interpreter support: the LLM is prompted to generate code instead of calling predefined functions (tools), and the framework executes the code and returns the results back to the LLM.
Both the OpenAI Assistants API and AutoGen have this.

pphodaie
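
Pending framework support, the requested behavior can be approximated with a small custom node. A heavily simplified sketch follows: `run_generated_code` is a hypothetical helper, not a LangGraph API, and `exec` on untrusted model output is unsafe without a real sandbox (container, VM, or similar).

```python
import contextlib
import io

def run_generated_code(code: str) -> str:
    """Execute model-generated Python and return captured stdout (or the error)."""
    buffer = io.StringIO()
    try:
        # WARNING: never run untrusted code like this in production.
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return f"Execution error: {exc!r}"
    return buffer.getvalue()

print(run_generated_code("print(sum(range(10)))"))  # 45
```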

Tool docstrings become part of the prompt. How do you manage these docstrings?

xuantungnguyen
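
One common approach is to treat the docstring as the single source of truth: with `langchain_core`'s `@tool` decorator, the docstring is lifted into the tool description that gets serialized into the prompt, so it is reviewed and versioned with the code rather than managed separately. A small sketch (`word_count` is a made-up example tool):

```python
from langchain_core.tools import tool

@tool
def word_count(text: str) -> int:
    """Count the whitespace-separated words in `text`."""
    return len(text.split())

# The docstring becomes the description the model sees for this tool.
print(word_count.name, "->", word_count.description)
print(word_count.invoke({"text": "reliable agents need tests"}))  # 4
```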

Lance,
I really enjoy your videos. One thing I have noticed in all demos, not just yours, is that compound requests/questions are not used.

In the examples below, a decomposition of the multiple sentences must take place to elicit the desired outcome; a chain-of-thought or reasoning process is necessary to address the compound request. I do not see how using LangGraph would be suitable for that initial step.


For example:
I am looking for information on garlic. I want to understand the health benefits as well as studies that have been conducted. Provide the list of resources used in your research.

Generate a report on Katherine Johnson and Johns Hopkins. Review the report and address shortfalls. Compare her background to that of John F. Kennedy.

Are there known latency issues with Milvus? If there are, what are the workarounds?

mrchongnoi
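
One way to handle compound requests is to make decomposition an explicit first node, before retrieval or generation runs. A minimal sketch in which a naive sentence splitter stands in for an LLM-driven decomposition call; `decompose` and `answer` are hypothetical helpers.

```python
def decompose(request: str) -> list[str]:
    # Placeholder: in practice, prompt an LLM to return one sub-question
    # per line; here we split naively on sentence boundaries.
    return [s.strip() for s in request.split(".") if s.strip()]

def answer(sub_question: str) -> str:
    # Stand-in for running a RAG sub-graph on each sub-question.
    return f"[answer to: {sub_question}]"

request = ("I am looking for information on garlic. I want to understand the "
           "health benefits as well as studies that have been conducted. "
           "Provide the list of resources used in your research.")
for sub in decompose(request):
    print(answer(sub))
```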

What is the conference that you gave this presentation at?

codekiln

Which software did you use to record the video? Thanks.

datauv-asia