How good is llama 3.2 REALLY? Ollama SLM & LLM Prompt Ranking (Qwen, Phi, Gemini Flash)

🚨 Llama 3.2 Is Here... but how good is it REALLY? How good is any small language model? 🚨

🔗 Resources:

🔥 Small Language Models (SLMs) are heating up
In this video, we dive deep into Meta's Llama 3.2 3B and 1B parameter models and evaluate whether small language models are ready to rival the big players in the LLM arena. Using Ollama and Marimo, we compare the performance of Llama 3.2 against models like GPT-4o-mini, Sonnet, Qwen, Phi, and Gemini Flash. Are SLMs like Llama 3.2 finally good enough for your projects? Let's find out!

🔍 Hands-On Comparisons Beat Benchmarks Any Day!
We run multiple prompts across multiple models, showcasing real-world tests that go beyond synthetic benchmarks. From code generation to natural language processing, see how Llama 3.2 stacks up. Discover the surprising capabilities of small language models and how they might just be the game-changer you've been waiting for.
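
For a sense of what that fan-out looks like in practice, here is a minimal sketch using the ollama Python client (pip install ollama). The model tags, the prompt, and the dict-style response access are assumptions, not the exact harness from the video:

# Minimal sketch: run one prompt across several locally pulled Ollama models
# and collect the raw outputs for side-by-side comparison.
# Assumes the listed tags exist locally, e.g. via `ollama pull llama3.2:3b`.
import ollama

MODELS = ["llama3.2:1b", "llama3.2:3b", "qwen2.5:7b", "phi3.5:latest"]  # assumed tags
PROMPT = "Write a SQL query that returns the top 5 customers by total order value."

results = {}
for model in MODELS:
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    results[model] = response["message"]["content"]

for model, output in results.items():
    print(f"\n=== {model} ===\n{output}")

The same loop generalizes to a list of prompts: score or rank each model's output per prompt and tally the results, which is the kind of workflow the notebook in the video walks through.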

🛠 Tools to Empower Your AI Journey
We'll also explore how tools like Ollama and Marimo make it easier than ever to experiment with small language models on your local device. Whether you're into prompt testing, benchmarks, or prompt ranking, these tools are essential for maximizing your AI projects and understanding what small language models can do for you.
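
As a rough illustration of the notebook side (not the notebook used in the video), a marimo app can render per-model outputs as an interactive table. The placeholder strings below stand in for real Ollama responses, and mo.ui.table is assumed for display:

# Rough illustration: a marimo notebook that shows per-model outputs
# in a sortable table. Run with: marimo edit compare.py (filename is arbitrary).
import marimo

app = marimo.App()


@app.cell
def _():
    import marimo as mo

    # Placeholder outputs standing in for real model responses
    # (in practice these would come from the Ollama loop sketched above).
    results = {
        "llama3.2:1b": "-- output from the 1B model --",
        "llama3.2:3b": "-- output from the 3B model --",
        "qwen2.5:7b": "-- output from Qwen --",
    }
    rows = [{"model": m, "output": out} for m, out in results.items()]
    mo.ui.table(rows)  # the last expression in a cell is what marimo displays
    return


if __name__ == "__main__":
    app.run()

Swapping the placeholders for live Ollama calls turns this into a simple local prompt-comparison grid.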

Join us as we uncover whether SLMs like Llama 3.2 are truly ready to take on the giants of the LLM world. If you've been curious about the latest in prompt testing, benchmarks, and prompt ranking, this is the video for you!

📖 Chapters
00:00 Small Language Models are getting better
00:40 How good is llama 3.2 REALLY?
01:17 Multiple Prompts on Multiple Models
08:32 Phi, Llama, Qwen, Sonnet, Gemini Flash model voting
13:53 Hands-on comparisons beat benchmarks any day
18:38 SLMs are good, not great, but they're getting there

#promptengineering #softwareengineer #aiengineering
Comments

Thanks for including generation of SQL queries among the tested tasks. The ability of models to interface with databases is crucial.

johnkintree

Thank you for continuing to post great content

techfren

Thanks for continuing this series - it's been super helpful

shockwavemasta

@IndyDevDan - you da man, dan. experienced engineers can appreciate your methodology and the value of your content and the tools you create. inexperienced engineers can learn the value of a methodical, structured approach to software development, which includes analyzing, comparing, and building tools to maximize your productivity. great videos. keep 'em coming.

billydoughty

THANK YOU! I really appreciate your honest testing and taking us along with you on this journey!

zkiyyeller

Would be cool to test image understanding. Basic OCR to start with, then counting objects and doing reasoning over the images. LLM providers often tell us what their models can't do, or can't do well; using that info as a signal of improvement would be very useful IMHO. Better still, you can use code to check exactly how correct each model is, which is harder when dealing with text, where you need a human judge or an LLM as a judge (which then needs to be aligned with a human anyway). Also thanks for the video, I check in every Monday. Keep on keeping on. 👍

ariramkilowan

Wow, nice. What I'm missing are technical metrics for comparison, like response time and the memory used to run the model...

peciHilux

I wish that you put the model parameter sizes in the video description. Makes it easier to really give weight to your comparisons when you're comparing a 1B model to a 7B model

Jason-judf

A 4-way gold medal among 7 contestants means you need harder questions at the top end to separate them out.

pubfixture

Great video, thank you! Creative, using a custom notebook for benchmarking/comparisons. 💯✨️

enthusiast

Interesting project. Since I am a lazy person, I will use another LLM to score the output each time rather than doing it manually.

zakkyang

Lots of subs to be had in the SLM area, so many edge cases. Try 70b_q4 compared to 8b models.

aerotheory

Great comparison, thanks for making this! I'm off to compare qwen2.5:latest with qwen2.5-coder:latest.

amitkot

What quantization sizes were you using for the models?
Love your channel! Keep it coming!!!

billybob

Hands down the best local model I have seen for function/tool calling.

DanielBowne

I found it hard to understand how you benched the models. Was this mostly down to personal opinion? Maybe you could explain your tests before discussing the results.

Your test tooling looks really nice!

davidpower

I know you don't do much model training on this channel, but have you considered training some of the local models on your good test results and then seeing how the refined models perform?

CheekoVids

Thanks for the video! Could you make a tutorial in which a local installation of Llama can learn from the chats you have with the AI? I mean, you just talk and somehow it stores this information internally and doesn't lose it when you close the computer.

samsaraAI

I'm curious, you're using a 5k context with the default Ollama model, right?

NLPprompter

What about testing with legal context? I found different models give different responses and sometimes they absolutely hallucinate.

ibrahims