Create Multimodal Multi-Agent Apps with Autogen Studio | LLM Text to Speech Tutorial


In this video, we look at the newly released Autogen Studio, a user interface for building AI agents that collaborate to solve complex tasks. The idea is to have specialized agents, each assigned its own skills, converse with one another while working on a task. For example, one agent might be responsible for planning and reasoning, while another handles execution. Research from the Autogen team, part of Microsoft Research, suggests that large language models perform much better when there is a feedback loop. We already see similar concepts in LangChain's agents and, to an extent, in ChatGPT's Code Interpreter.
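To make the planner/executor feedback loop described above concrete, here is a toy sketch in plain Python. It does not use Autogen at all: `Planner`, `Executor`, and `run_conversation` are hypothetical names, and the planner is a fixed list of candidate code snippets standing in for an LLM. The point is only the shape of the loop: the executor runs the proposed code and reports success or the error, and the planner uses that feedback to decide its next move.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    content: str


class Planner:
    """Hypothetical planning agent; a fixed candidate list stands in for an LLM."""

    def __init__(self, candidate_steps):
        self.candidates = list(candidate_steps)

    def reply(self, history):
        last = history[-1]
        # Feedback says the last step worked: end the conversation.
        if last.sender == "executor" and last.content.startswith("OK"):
            return Message("planner", "TERMINATE")
        # Out of ideas: also end, rather than loop forever.
        if not self.candidates:
            return Message("planner", "TERMINATE")
        # Otherwise propose the next step (an LLM would read the error and write a fix).
        return Message("planner", self.candidates.pop(0))


class Executor:
    """Hypothetical execution agent: runs the proposed code and reports back.

    exec() is used only to keep the sketch short; never run untrusted code this way.
    """

    def reply(self, history):
        code = history[-1].content
        scope = {}
        try:
            exec(code, {}, scope)
            return Message("executor", f"OK result={scope.get('result')!r}")
        except Exception as exc:
            return Message("executor", f"ERROR {type(exc).__name__}: {exc}")


def run_conversation(planner, executor, task, max_turns=10):
    """Alternate planner and executor turns until TERMINATE or the turn cap."""
    history = [Message("user", task)]
    for _ in range(max_turns):
        plan = planner.reply(history)
        history.append(plan)
        if plan.content == "TERMINATE":
            break
        history.append(executor.reply(history))
    return history


if __name__ == "__main__":
    planner = Planner(["result = totl(range(5))", "result = sum(range(5))"])
    for msg in run_conversation(planner, Executor(), "Sum the numbers 0..4"):
        print(f"{msg.sender}: {msg.content}")
```

Here the first proposal fails with a `NameError`, the executor reports it, and the planner moves on to its next candidate; the turn cap keeps a confused pair of agents from running indefinitely.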

In this video, we look at one of their examples, which plots stock prices for a specific year. We also show how to get the agents to summarize research papers in audio format using multiple tools.

📚 Resources

👤 About Me - Ugo Osuji:


Comments

The end of a task is often accompanied by the agents congratulating one another in an endless loop of thank-yous and praise. This can cost a lot of money over time, so make sure you nip this behavior in the bud when writing the SYSTEM MESSAGE for each agent! Otherwise you could be paying for GPT-4 to thank itself 50 times in a row! System messages also cost money on their own, so read over them a few times and decide whether every part is really needed!

mickelodiansurname
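One way to enforce the advice above in code, sketched under assumptions: stop the conversation when an agent ends its reply with TERMINATE (a convention used in Autogen's examples), after a fixed number of turns, or once the last few messages are nothing but pleasantries. The function names and the pleasantry regex are hypothetical. To our understanding, Autogen agents accept a message-dict predicate through an `is_termination_msg` parameter and cap automatic replies through `max_consecutive_auto_reply`, but check both against the Autogen version you use.

```python
import re

# Hypothetical pleasantry patterns; tune them for your own agents.
PLEASANTRY = re.compile(
    r"\b(thank(s| you)|you're welcome|great (job|work)|glad to help)\b",
    re.IGNORECASE,
)


def is_termination_msg(message: dict) -> bool:
    """True when the agent signals completion by ending with TERMINATE."""
    content = (message.get("content") or "").strip()
    return content.endswith("TERMINATE")


def is_pleasantry(message: dict) -> bool:
    """Short messages that only thank or praise add cost without adding work."""
    content = message.get("content") or ""
    return len(content) < 200 and bool(PLEASANTRY.search(content))


def should_stop(history, max_consecutive_pleasantries=2, max_turns=20) -> bool:
    """Hypothetical guard combining a turn cap, TERMINATE, and a pleasantry streak."""
    if len(history) >= max_turns:
        return True
    if history and is_termination_msg(history[-1]):
        return True
    # Count how many of the most recent messages are pure pleasantries.
    streak = 0
    for msg in reversed(history):
        if not is_pleasantry(msg):
            break
        streak += 1
    return streak >= max_consecutive_pleasantries
```

Capping the pleasantry streak at two still lets one agent acknowledge the other, but cuts off the mutual-congratulation loop before it runs up a bill.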

Amazing tutorial, please make more. I learnt a lot here.

rorydaines

Thanks for the video! Can you do a video showing us how to create these agent skills?

johnbarros

How do I set the API base in the environment? It doesn't work when I set up the API key and base in the agents.

AngusLou
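A possible approach to the question above, offered as a sketch rather than a confirmed fix: keep the key and endpoint in environment variables and build the config list from them. The variable names `OPENAI_API_KEY` and `OPENAI_BASE_URL` match what the OpenAI Python SDK v1 reads; the `base_url` key assumes pyautogen 0.2 or later (older 0.1 releases used `api_base`). `config_list_from_env` is a hypothetical helper, not part of Autogen.

```python
import os


def config_list_from_env(model="gpt-4"):
    """Hypothetical helper: build an Autogen-style config list from env vars.

    Assumes the pyautogen 0.2+ key name `base_url` (0.1 used `api_base`).
    """
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    entry = {"model": model, "api_key": api_key}
    # Only add a custom endpoint when one is configured; otherwise the
    # client falls back to its default API base.
    base_url = os.environ.get("OPENAI_BASE_URL")
    if base_url:
        entry["base_url"] = base_url
    return [entry]
```

The resulting list would then be passed as `llm_config={"config_list": config_list_from_env()}` when creating an agent, so the key and base live in one place instead of being typed into each agent.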
