Quick Introduction to Fast Tool Calling with Groq and Llama3

In this video we'll look at tool calling with Groq's fine-tuned Llama 3 model specialized for function calling. We'll run some simple experiments profiling Llama 3 models, comparing the 70B and 8B variants for latency on a simple prompt. At the end we'll also take a quick look at the super cool demo by Groq that combines fast audio transcription with Whisper Large v3 and Groq's optimized Llama 3.1 model (recently released by Meta) for a glimpse into the future of low-latency LLMs!
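The tool-calling round trip covered in the video can be sketched roughly as below. This is a minimal sketch, not the video's exact code: the `calculate` helper and the prompt are hypothetical stand-ins, and the preview tool-use model name is an assumption about what Groq exposes.

```python
import json

# Hypothetical local tool: evaluate a simple arithmetic expression.
def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression and return the result as JSON."""
    # eval() is for demo purposes only; never use it on untrusted input.
    result = eval(expression, {"__builtins__": {}}, {})
    return json.dumps({"result": result})

# OpenAI-style tool schema, which Groq's chat completions API accepts.
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The expression to evaluate",
                    },
                },
                "required": ["expression"],
            },
        },
    }
]

def answer_with_tools(client, model="llama3-groq-70b-8192-tool-use-preview"):
    """Round trip: the model emits a tool call, we run it locally,
    append the result, and ask the model to compose a final answer."""
    messages = [{"role": "user", "content": "What is 25 * 4 + 10?"}]
    response = client.chat.completions.create(
        model=model, messages=messages, tools=tools, tool_choice="auto"
    )
    msg = response.choices[0].message
    if msg.tool_calls:
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "name": call.function.name,
                "content": calculate(args["expression"]),
            })
        response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

To try it, `pip install groq`, set `GROQ_API_KEY`, and pass `Groq()` as the client.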

📚 Chapters:

00:00 - Introduction and Overview
00:29 - Setting Up the Environment and Project
01:05 - Defining the Function and Setting Up the Model
02:29 - Handling Messages and Tool Calls
03:52 - Running and Compiling Responses
04:47 - Testing Our Setup with a Simple Calculation
05:26 - Experimenting with Multiple Models
07:28 - Profiling Latency for Model Comparison
09:00 - Reviewing the Profiler Function
10:04 - Running and Comparing Multiple Tests
12:55 - Single Test Profiling and Adjustment
15:59 - Exploring Additional Tools and Fast Inference Demo
16:51 - Final Thoughts and Speed Evaluation
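The latency profiling steps in the chapters above can be approximated with a small timing helper. This is a sketch under my own assumptions: the run count, warm-up, and reported stats are choices for illustration, not necessarily what the video's profiler does.

```python
import time
import statistics

def profile_latency(fn, n_runs=5, warmup=1):
    """Time a zero-argument callable over several runs and
    report simple latency stats in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the stats
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(timings),
        "min": min(timings),
        "max": max(timings),
        "runs": n_runs,
    }

# To compare models, wrap each API call in a closure, e.g.:
# profile_latency(lambda: client.chat.completions.create(
#     model="llama3-70b-8192", messages=msgs))
```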

🔗 Links

Support the Channel!
Comments

Nice video. Like that you walk through in real time... including showing how much co-pilot currently sucks.

IdPreferNot

What size of Llama 3 are you using? At what point does it stop being useful? I really struggle to decide which size to use.

Great video!

AlexandreAugustin