Introducing The New Champion of Function Calling!

In this video I go through the new open Tool Use / Function Calling model, which comes from Groq and Glaive and is based on the Llama-3 models.

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻Github:

⏱️Time Stamps:
00:00 Groq Tool Use X (Twitter) Post
00:13 Groq Tool Use Blog
00:52 Berkeley Function Calling Leaderboard
02:36 Glaive AI
03:58 Code Time
14:27 Groq Hugging Face
Comments

There is no rest when you're in this industry. There's always some part of the tech stack being developed, always some new feature. Thanks for covering the best bits!

thetagang

Wow! How exciting! Man, you're my hero, Sam. You are literally 8 steps ahead of the curve.

SirajFlorida

I wish they would release a mixture-of-agents option for people to use natively through their API. I have my own setup I can use, but I see a lot of people using LLMs who don't have the ability to do that.
Function calling has great utility, but any model can do this. If you give it the tool list with definitions and the schema to use, and include in your messages array a few back-and-forth user/assistant examples that show the assistant using the tools in various scenarios, most decent models will do really well with them. In places where you're 100% sure it should be using at least one tool, you simply pair this with a function that re-asks the same question recursively until you parse the response you know you're looking for (see the sketch after this comment).

mitchellmigala
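A minimal sketch of the pattern this comment describes, assuming Groq's OpenAI-compatible Python SDK; the model name, the get_weather tool, and the few-shot messages are illustrative, not taken from the video:

```python
# Minimal sketch: few-shot tool-use examples in the messages array, plus
# a recursive re-ask when the model fails to emit a tool call.
# Model name, tool schema, and examples are illustrative.
import json
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Few-shot exchange showing the assistant actually using the tool.
messages = [
    {"role": "system", "content": "Use the provided tools when they apply."},
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "get_weather",
                     "arguments": json.dumps({"city": "Paris"})},
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
    {"role": "assistant", "content": "It's currently 18 °C in Paris."},
]

def ask_until_tool_call(question: str, max_tries: int = 3):
    """Re-ask the same question until the model emits a tool call."""
    for _ in range(max_tries):
        resp = client.chat.completions.create(
            model="llama3-groq-70b-8192-tool-use-preview",  # illustrative
            messages=messages + [{"role": "user", "content": question}],
            tools=tools,
        )
        calls = resp.choices[0].message.tool_calls
        if calls:  # got the structured response we were looking for
            return calls
    raise RuntimeError("model never produced a tool call")
```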

From my limited testing, it's significantly more prone to hallucinations than the GPT-family models I've been using: it hallucinates argument values, creates argument values out of thin air, and even invents new functions. For my use case, even gpt-3.5-turbo and the vanilla Llama 3 they're hosting do better on my custom evals than this new one, which is honestly kind of disappointing. I'm starting to feel those benchmarks are not as good a source of evaluation as they'd have us believe. (One such check is sketched below.)

jcksn
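A sketch of one check a custom eval like this can run: flag tool calls whose function name or argument names are not in the declared schema. The schema and the example call below are hypothetical:

```python
# Sketch of one eval check: flag tool calls that invent functions or
# argument names not present in the declared schema.
# The declared schema and the example call are hypothetical.
import json

DECLARED = {
    "get_weather": {"city"},  # function name -> allowed argument names
}

def audit_tool_call(call: dict) -> list[str]:
    """Return a list of hallucination findings for one tool call."""
    findings = []
    name = call["function"]["name"]
    if name not in DECLARED:
        return [f"invented function: {name}"]
    args = json.loads(call["function"]["arguments"])
    for key in args:
        if key not in DECLARED[name]:
            findings.append(f"invented argument '{key}' for {name}")
    return findings

# An argument value created out of thin air:
call = {"function": {"name": "get_weather",
                     "arguments": json.dumps({"city": "Oslo", "units": "K"})}}
print(audit_tool_call(call))  # ["invented argument 'units' for get_weather"]
```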

I don't think they'll release the dataset, as Groq wants to keep it as a competitive advantage to grow their developer base. Anyway, you mentioned query rewriting, so let me share something from actual production experience: it's too bold to ship software with function calling but without query rewriting. Recently, in a project where we needed function calling and tried many models, we faced unpredictability. Instead of fine-tuning those models, we fine-tuned GPT-2 specifically for query rewriting, using synthetic data tailored to our case. And voila! Once we implemented that, all the nuances and unpredictability were gone. Query rewriting, whether done with a strong model or with our approach, lets you use many function-calling language models effectively without fine-tuning the entire model. As in your last example, with or without the keyword "search", query rewriting is definitely one of the best steps to have in the pipeline (sketched after this comment).

unclecode
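A minimal sketch of that pipeline step, here using a strong general model for the rewrite instead of the commenter's fine-tuned GPT-2; the prompt and model name are illustrative:

```python
# Minimal sketch of a query-rewriting step placed in front of the
# function-calling model. A strong general model does the rewrite here
# instead of a fine-tuned GPT-2; prompt and model name are illustrative.
from groq import Groq

client = Groq()

REWRITE_PROMPT = (
    "Rewrite the user's message as one explicit, self-contained request, "
    "naming the intended action and its parameters. Output only the rewrite."
)

def rewrite_query(user_message: str) -> str:
    resp = client.chat.completions.create(
        model="llama3-70b-8192",  # illustrative rewriter model
        messages=[
            {"role": "system", "content": REWRITE_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content

# "when do the olympics start" might become something like
# "Search the web for the start date of the next Olympic Games",
# which is then passed to the tool-use model.
```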

Thanks for the video, an interesting model. Am I right in thinking that this model is good at extracting data from text to build properly formatted inputs for tool calls, but weaker at deciding whether to call a tool at all? As you showed with your "(search) when do the olympics start" example, I was a bit surprised that a 70B model couldn't get that one. I see they also mention this in their blog post: a hybrid/routing approach. It would be interesting to see the benchmarks/performance if the models were allowed such a "reasoning layer" on top (a sketch of the idea follows this comment).

ringpolitiet
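A sketch of what such a routing layer could look like: a cheap first pass decides whether the query needs a tool at all, then routes to either the tool-use model or a plain chat model. Prompts and model names are illustrative:

```python
# Sketch of a hybrid/routing layer: a cheap first pass classifies
# whether the query needs a tool, then routes to either the tool-use
# model or a plain chat model. Prompts and model names are illustrative.
from groq import Groq

client = Groq()

ROUTER_PROMPT = (
    "Answer strictly 'yes' or 'no': does this query require calling an "
    "external tool (search, weather, calculator, ...)?"
)

def needs_tool(query: str) -> bool:
    resp = client.chat.completions.create(
        model="llama3-8b-8192",  # small, cheap router model (illustrative)
        messages=[
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": query},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def pick_model(query: str) -> str:
    return ("llama3-groq-70b-8192-tool-use-preview" if needs_tool(query)
            else "llama3-70b-8192")

print(pick_model("when do the olympics start"))  # should pick tool use
```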

In my local testing, Llama 3 8B already seems pretty good at function calling (I couldn't find cases where it fails).

It would be interesting to see in which function-calling cases these high-performing FC models succeed while Meta's Llama 3 fails.

tpadilha

We can still fine-tune it further, right?
Would that make a difference?

sanchaythalnerkar

I think phidata does the best open-source function calling.

teddyfulk

I really don't understand why we need this. Can't you just send a prompt to the LLM: "calculate this formula and return the result in JSON format:
[ {
"formula": "",
"result": ""
} ]"
Why do we complicate things with a lot of extra text that is 100% guaranteed to have a typo somewhere, one you'll spend hours finding, to achieve what exactly? (The plain-prompt approach is sketched after this comment.)

hqcart
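The plain-prompt approach this comment describes does work for simple cases; below is a sketch of it, including the parse-and-retry loop you usually end up writing anyway, which is much of what a tool-calling API formalizes. The model name and the formula are illustrative:

```python
# Sketch of the plain-prompt approach: ask for JSON, then parse, with
# the validation/retry loop you usually end up writing anyway -- which
# is roughly what a tool-calling API formalizes. Model name and the
# formula are illustrative.
import json
from groq import Groq

client = Groq()

PROMPT = """Calculate this formula and return the result as JSON only:
[ {"formula": "", "result": ""} ]
Formula: 12 * (3 + 4)"""

def ask_for_json(prompt: str, max_tries: int = 3):
    for _ in range(max_tries):
        resp = client.chat.completions.create(
            model="llama3-70b-8192",  # illustrative
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        text = resp.choices[0].message.content
        try:
            return json.loads(text)  # fails on chatty or malformed output
        except json.JSONDecodeError:
            continue  # re-ask; this fragility is what function calling avoids
    raise ValueError("no parseable JSON after retries")

print(ask_for_json(PROMPT))
```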

This model is trash, I'm sorry, but whoever did the benchmarking needs to be fired. It fails every 3-4 calls, quite regularly. It's OK for super, super simple function calls, and it's no better than the base Llama 3 model. Thumbs down on this model from me.

davidrobertson