Why I'm Staying Away from Crew AI: My Honest Opinion

Crew AI is not suitable for production use cases. I’ll be going through why I believe this is the case and what you should do instead when building your own apps.

Chapters
Introduction: 00:00
Intro to multi-hop questions: 01:31
Schematic of the agent workflow: 06:42
Crew AI Python Code: 10:29
Testing the Crew AI Workflow: 24:48
What’s next for Multi-Agent frameworks: 47:50
Comments

This is one of the best videos I have ever seen related to AI. Let me list some points:
1- Do not ever expect to have acceptable costs as long as you are depending on ClosedAI.
2- I am with you that such frameworks are not production-ready.
3- In my opinion, such a framework can be useful if there is an easy way to modify the hidden prompts.
4- Such a framework can be useful if there is a manager agent (the only one that needs a strong LLM) while the other agents run on small, open-source LLMs; breaking the task into small, "easy" subtasks should make this workable with small models (a sketch of this setup follows the comment).
5- Giving each agent its own custom tools should help (the agent that runs the main search should be different from the agent that reads the contents of each search result). Specialization leads to creativity, and it lets us instruct just one agent to cite the URLs of the sources it built its answer from.
6- I think no agent flow can be truly autonomous as long as it lacks a self-reflection, i.e. self-improvement, mechanism.
7- When trials show that the result from one agent might be too long, I think it would be better to add an agent just to summarize that result and replace the original with the summary in the workflow history.
Anyway, thanks for the good content. 🌹🌹🌹
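A minimal sketch of point 4, assuming a recent CrewAI release that accepts model-name strings for `llm` and `manager_llm`; the model names, agent roles, and task are illustrative, and the exact constructor arguments may differ across CrewAI versions:

```python
# Hedged sketch: hierarchical crew where only the manager uses a strong model.
# Model strings are assumptions; adjust to your installed CrewAI version if it
# expects LLM objects instead of strings.
from crewai import Agent, Task, Crew, Process

SMALL_MODEL = "gpt-4o-mini"   # could equally be a local open-source model
STRONG_MODEL = "gpt-4o"       # only the manager pays for the big model

searcher = Agent(
    role="Search agent",
    goal="Run web searches and return raw results with their URLs",
    backstory="You only search; you never interpret results.",
    llm=SMALL_MODEL,
    allow_delegation=False,
)

reader = Agent(
    role="Reading agent",
    goal="Read each search result and extract the facts relevant to the question",
    backstory="You only read and extract; keep the source URL next to each fact.",
    llm=SMALL_MODEL,
    allow_delegation=False,
)

answer_task = Task(
    description="Answer the user's multi-hop question, citing the URL of every source used.",
    expected_output="A short answer followed by a list of source URLs.",
    agent=reader,
)

crew = Crew(
    agents=[searcher, reader],
    tasks=[answer_task],
    process=Process.hierarchical,   # a manager LLM decomposes and delegates the work
    manager_llm=STRONG_MODEL,       # the only place the strong model is used
)

if __name__ == "__main__":
    print(crew.kickoff())
```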

HassanAllaham

I started using CrewAI one month ago and I ran into all the same problems John details in the video. Not having control over the flow of the software is hard to handle when you are used to designing and implementing complex solutions and algorithms. I think João (the founder of CrewAI) knows it, and that's why the "Draft Gmail New Emails" example introduced LangGraph to get a little more control over the flow, but it doesn't solve the inefficient, unneeded token utilization. It is important to know that these problems are not exclusive to CrewAI; AutoGen suffers from the same disease. The idea of developing with the future of LLMs in mind is something I didn't have on my radar; it makes total sense to me. Great video, keep up the work John!

elcaribbeannomad

Hi. This was incredibly helpful and useful. I've been watching a lot of YouTube videos on multi-agent workflows recently, and you are by far the best. You explain things very clearly and make learning these concepts much easier than all the others. Thanks man. Please keep these kinds of videos coming.

roccov

🎯 Key Takeaways for quick navigation:

00:30 *🧩 Explanation of Multi-hop Questions*
- Multi-hop questions are designed to be challenging by requiring preceding knowledge.
- Questions are structured as linear or parallel decompositions.
- Understanding the structure of multi-hop questions is essential for building effective agent workflows.
06:39 *🗂️ Overview of Agent Workflow in Crew AI*
- The agent workflow in Crew AI consists of a planning agent, search agent, integration agent, and reporting agent.
- Each agent has a distinct role in the workflow, from breaking down questions to organizing information and delivering responses.
- Feedback loops between agents help refine the investigation process and ensure accuracy in responses.
10:25 *🤖 Setting up Tasks and Descriptions in Crew AI*
- Tasks in Crew AI are assigned to specific agents and include detailed descriptions, expected outputs, tools required, and contextual information.
- Different agents have different responsibilities, such as conducting searches, organizing information, or delivering final responses.
- Providing clear and concise descriptions for tasks helps guide the behavior of each agent in the workflow (a task-wiring sketch follows this summary).
23:29 *🤖 Types of workflows in Crew AI*
- Explaining sequential and hierarchical workflow structures in Crew AI.
- Highlighting the role of the manager LLM in hierarchical workflow.
- Differentiating between sequential and hierarchical operations.
24:47 *🔄 Testing multi-agent workflow in Crew AI*
- Setting up tasks in Crew AI for multi-agent workflow testing.
- Tracking OpenAI API usage costs before running the workflow.
- Comparing the speed of agent workflow in Crew AI to Autogen in a production scenario.
41:21 *💰 Cost analysis of complexity in question answering*
- Analyzing the cost of answering a two to three-hop question in Crew AI.
- Expressing concerns about the high cost of search operations in Crew AI.
- Discussing the potential challenges of using Crew AI in production due to cost implications.
48:08 *🤖 Issues with Crew AI*
- Crew AI lacks interpretability compared to AutoGen.
- Inconsistent output from multi-agent workflows.
- High cost of running workflows limits practical use cases.
50:03 *🛠️ Pros and Cons of Crew AI*
- Easy setup for multi-agent workflows in Crew AI.
- Suitable for experimentation and prototyping, not for production.
- High cost and inconsistency are critical barriers to adopting Crew AI.
50:30 *🔮 Future of Multi-Agent Frameworks*
- Multi-agent frameworks like Crew AI and AutoGen are limited by current model capabilities.
- As language models improve, the need for multi-agent frameworks may diminish.
- Custom workflows may be more beneficial for specific production applications.

Made with HARPA AI
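To make the 10:25 and 23:29 takeaways concrete, here is a hedged sketch of how a CrewAI task is typically declared with a description, expected output, tools, and context from an earlier task; the agents, tool, and task texts are illustrative, and the exact constructor arguments may vary between CrewAI versions:

```python
# Hedged sketch of CrewAI task wiring (argument names per recent CrewAI docs;
# verify against your installed version). The search tool is illustrative and
# needs its own API key at runtime.
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool  # assumes the crewai-tools package

search_tool = SerperDevTool()

search_agent = Agent(
    role="Search agent",
    goal="Find sources that answer each sub-question",
    backstory="You search the web and return results with URLs.",
    tools=[search_tool],
)

reporting_agent = Agent(
    role="Reporting agent",
    goal="Deliver a final, cited answer to the user",
    backstory="You write the final response from the gathered material.",
)

search_task = Task(
    description="Search the web for each sub-question of the user's query.",
    expected_output="A bullet list of findings, each with its source URL.",
    tools=[search_tool],
    agent=search_agent,
)

report_task = Task(
    description="Combine the findings into a final answer with citations.",
    expected_output="A short paragraph followed by a list of source URLs.",
    agent=reporting_agent,
    context=[search_task],          # receives the output of search_task
)

crew = Crew(
    agents=[search_agent, reporting_agent],
    tasks=[search_task, report_task],
    process=Process.sequential,     # contrast with Process.hierarchical (23:29)
)
```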

HarpaAI

This has been my experience with the frameworks. Going custom is where I'm likely going to end up.

TheFocusedCoder

I just found you and subscribed. You came to the same conclusion I have: after digging into these frameworks, I decided to build my own framework from the ground up for my use cases. The other part is that I use the Julia language.

Whiskeyo

I think this "agent swarms" idea is not the way to go, at all. It makes no sense. You don't need 15 agents to answer simple questions. I think the solution is to reduce the use of LLMs for everything that can be solved with designated function calls. The way to go is to build functions that address the desired use cases and use LLMs to summarize the results. If you combine knowledge databases, search engines, function calls, and summarization, this can be done at a fraction of the cost. I can see a scenario where a single, well-instructed agent instance runs a sequence of logical functions. Even a few agents doing it would be OK. In this case you'd use a lot fewer tokens.
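A minimal sketch of that pattern, assuming the official `openai` Python client; `search_web` and `lookup_knowledge_base` are hypothetical stand-ins for your own integrations, and only the final summarization step touches the LLM:

```python
# Hedged sketch: deterministic functions do the work, the LLM only summarizes.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def search_web(query: str) -> list[str]:
    """Call your search engine of choice and return result snippets."""
    raise NotImplementedError  # e.g. Serper, Brave, SearxNG, ...

def lookup_knowledge_base(query: str) -> list[str]:
    """Query an internal database or vector store for known facts."""
    raise NotImplementedError

def answer(question: str) -> str:
    # 1) Deterministic steps: no LLM tokens spent here.
    evidence = lookup_knowledge_base(question) + search_web(question)

    # 2) One LLM call at the end, purely for summarization.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any cheap model works for summarization
        messages=[
            {"role": "system", "content": "Summarize the evidence to answer the question. Cite sources."},
            {"role": "user", "content": f"Question: {question}\n\nEvidence:\n" + "\n".join(evidence)},
        ],
    )
    return response.choices[0].message.content
```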

krisvq

Phi-3 through Ollama on a Pi 5 using CrewAI, everything local and running acceptably.
You also need to add prompt logic to reduce unnecessary searches.
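For anyone wanting to reproduce a fully local setup like this, here is a rough sketch; it assumes a recent CrewAI version whose `LLM` wrapper accepts LiteLLM-style `ollama/...` model strings, and an Ollama server already running with Phi-3 pulled (adjust to your installed versions):

```python
# Hedged sketch of a fully local CrewAI agent backed by Phi-3 via Ollama.
# Assumes `ollama pull phi3` has been run and Ollama is on its default port;
# the LLM(...) signature may differ in older CrewAI releases.
from crewai import Agent, Task, Crew, LLM

local_llm = LLM(
    model="ollama/phi3",
    base_url="http://localhost:11434",
)

researcher = Agent(
    role="Researcher",
    goal="Answer questions using only local resources",
    backstory="You run entirely on a local model; keep answers short.",
    llm=local_llm,
)

task = Task(
    description="Explain in two sentences what a multi-hop question is.",
    expected_output="Two plain-English sentences.",
    agent=researcher,
)

print(Crew(agents=[researcher], tasks=[task]).kickoff())
```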

NoCodeFilmmaker

Hey Data Centric, great video breakdown you have here. When these GPT models first came out from OpenAI, I thought I should be doing everything with AI in all of the workflows I wanted to get done. But in fact I was wrong. Really, you need to know the ins and outs of your workflow, and you need a way to force the system to produce reproducible outputs. That comes from narrowing the problem space or providing the system with solutions to mimic. I'm thinking about making a video, or a short blog post of sorts, to demonstrate this. I don't believe AGI will be the solution to all our problems; narrow solutions to our individual problems are the way.

The best we could get with the next model from OpenAI is faster, cheaper inference plus a better understanding of prompts, so that we may simply declare what we want the AI to do, like we would with a person, and pair this up heavily with automation of the parts of the system we know and understand well. Think of AI as a small bridge, not necessarily a car to take you somewhere.

ARCAEDX

I think the high cost in your example is mainly due to the use of GPT-4 Turbo and a long chain of agents for very simple questions that don't need multiple agents.
You can use an open LLM if cost is an issue; if speed is your main concern, I don't know.
Either way, we can learn from the open core of CrewAI and build our own.

________

Unfortunately it's just not possible to RELIABLY integrate these into complex business processes. You need to be able to make SMALL TWEAKS to the process without having to play The Price Is Right with prompts. Cool concept, but FAR from ready. We're not even CLOSE, and the fact that people think we are is overly optimistic. AI right now is good for ONE SINGLE ACTION, and a lot of the time it's not even good at THAT, since there is no way to LOCK IN a successful process.

tigreytigrey

Very good analysis! Could you please do a similar video on LangGraph?

RafiDude

Seems to me the main issues highlighted are: 1. value over direct prompts, 2. prompt-engineering issues (specifically trying to get it to provide references in this case), 3. cost, 4. latency.

1. You're never going to get fresh results from an LLM alone: web search is essential for this.
2. To me, the prompt-engineering issues don't look much different from what you'd normally have. I wonder what prompt would result in references being included? (A rough attempt is sketched after this list.) DSPy tries to automate this problem, by the way.
3. Cost is on a downward trend. I'd love to know, for example, how Claude Haiku performs, or Llama 3 70B.
4. Latency: for sure it is a batch / offline / task-switch scenario, totally agree. For now. Try it with Groq though; the LPU is 10x faster.
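A hedged illustration of the kind of prompt point 2 is wondering about; the wording is an assumption, not something tested in the video:

```python
# Hypothetical prompt aimed at forcing cited sources; untested, shown only to
# illustrate the idea raised in point 2 above.
CITATION_PROMPT = """You are a research assistant.
Answer the question using ONLY the search results provided below.
After every factual claim, append the URL of the result it came from in
square brackets, e.g. [https://example.com/page].
Finish with a "Sources:" section listing every URL you used, one per line.
If the results do not contain the answer, say so instead of guessing.

Question: {question}

Search results:
{search_results}
"""
```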

JulianHarris

Great walkthrough. Really like the level of detail, and the analysis and recommendations at the end. Awesome video. Keep up the great work.

Bana

I feel the reasoning example is more of an LLM test than the kind of automation where you would use a multi-agent framework.

carinebruyndoncx

I have a similar simple setup while I play and learn. Start small, get it working. There are a TON of things that aren't common knowledge or are undocumented in CrewAI, but it does work. And while each LLM gives different results (even the same LLM will give different results for the exact same prompt), I have had success with open-source local LLMs.

The power of it comes when you start getting into more complex configuration. That is where you now need to be a software dev and a data science grad. I am neither of those, so I find it an extremely difficult hurdle to get over and I get stuck a lot.

tonyppe

Just the type of channel I’ve been looking for

strength

Thanks, that was a great explanation and tutorial. I could literally write 10,000 words in response, but that would not be practical. I've studied all the trends in AI since 1977, but I've also spent 25 years studying cognitive neuroscience and related subjects, so I have a different take on the way AI should be implemented. My own personal research involves the creation of a brain-inspired cognitive architecture, and I'm considering the possibility of putting a very small language model at the core. The system would be designed to learn from experience instead of being force-fed the internet. Anyway, my experience with CrewAI has been anything but good so far. Have you considered using something like Ludwig for fine-tuning and LoRAX for serving on your local system? If you could get that to work, you could save 95% of your expensive ChatGPT calls. You could use ChatGPT to create a custom training dataset for backpropagation or PEFT of a smaller open-source model. I'm looking forward to your future content. Thanks again.
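A rough sketch of that last suggestion, using a strong hosted model to generate a small instruction-tuning dataset for a local model; the seed prompts, model name, and JSONL schema are assumptions, and the output format would need to match whatever fine-tuning tool (e.g. Ludwig) consumes it:

```python
# Hedged sketch: generate instruction/response pairs with a hosted model and
# write them as JSONL for later PEFT/LoRA fine-tuning of a small local model.
# Seed prompts, model name, and schema are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

SEED_PROMPTS = [
    "Explain what a multi-hop question is and give one example.",
    "Summarize the trade-offs of multi-agent LLM frameworks in three bullet points.",
]

with open("finetune_data.jsonl", "w", encoding="utf-8") as f:
    for prompt in SEED_PROMPTS:
        completion = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Answer step by step, then give a one-line final answer."},
                {"role": "user", "content": prompt},
            ],
        )
        record = {
            "instruction": prompt,
            "response": completion.choices[0].message.content,
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```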

andrewowens

Great video. I had success with some hobby projects using LangGraph. I experimented with CrewAI, but felt exactly like you mentioned regarding the loss of control.

madhudson

What are your thoughts on Agency Swarm by VRSEN? I haven't used the framework yet, but the author claims you can customize all prompts, including the framework prompts. In his videos he also claims AutoGen and CrewAI are not good for production, whereas Agency Swarm is. I would love to hear your evaluation of this framework.

benh