Phi 4 on Ollama - is it REALLY better than Claude 3.5?

PHI 4 is a new 14-billion-parameter model from Microsoft that is claimed to beat GPT-4o, Llama 3.3 and Claude 3.5 in math, and to be comparable on other benchmarks. What makes this model so good, however (for its size), is its chain of thought.

In this video we get started with PHI 4 by getting it up and running locally using Ollama and testing it with OpenWebUI. We look at its math capabilities, its coding capabilities, its role-playing capabilities and the chain of thought it generates for puzzles such as sudoku or tic-tac-toe, comparing it with DeepSeek R1, Claude 3.5 Sonnet, GPT-4o-mini and Llama 3.3 on various tasks. We also look at how Microsoft is priming itself for agents but isn't quite there yet due to the lack of tool calls.
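
If you want to try the same prompts outside of OpenWebUI, below is a minimal sketch of querying a locally running Ollama server from Python over its HTTP API. It assumes the default port 11434 and that the model was pulled with the `phi4` tag; adjust both if your setup differs.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default Ollama chat endpoint


def ask_phi4(prompt: str) -> str:
    """Send a single user message to the locally running phi4 model."""
    payload = {
        "model": "phi4",  # assumes `ollama pull phi4` has already been run
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a token stream
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()["message"]["content"]


if __name__ == "__main__":
    print(ask_phi4("What is 254*752-3+(2-7)+5? Show your working."))
```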

00:00 - Introduction
00:55 - Installing PHI-4 with Ollama
02:12 - Using OpenWebUI
03:06 - Math Problems
06:48 - Sudoku
11:38 - Tic-Tac-Toe
15:05 - Function Calling and Tools
17:20 - Role Playing
18:55 - Creating Stories
21:09 - Coding
24:37 - Final Thoughts
Comments

please note at 13:46, mini got the answer right, as did llama3.3 at 14:11. i did point out earlier in the video that position 2 is a correct answer. in the flow of the video i was focused on position 3, even though i had pointed out position 2 was also correct. you've all figured this out for yourselves. i'm not editing this video as it doesn't change the narrative. apologies for missing this in the flow of the video

chrishayuk

Saved me time and heartache figuring this out the hard way. I can ask for nothing more. Thanks!

Junon

No tool calling? Qwen will they be able to?

patruff

Nemotron got it right on the first try when responding to '254*752-3+(2-7)+5':

To evaluate this expression, we need to follow the order of operations, often remembered by the acronym PEMDAS (Parentheses, Exponents, Multiplication and Division from left to right, Addition and Subtraction from left to right). Here's how to break it down:

Given expression: `254 * 752 - 3 + (2 - 7) + 5`

1. **Parentheses first**: Solve the expression inside the parentheses.
   - `(2 - 7) = -5`

2. **Update the expression**: Replace the solved parentheses with its result.
   - `254 * 752 - 3 - 5 + 5`

3. **Multiplication next** (since there are no exponents):
   - `254 * 752 = 191008`

4. **Update the expression again**:
   - `191008 - 3 - 5 + 5`

5. **Finally, handle Addition and Subtraction from left to right**:
   - `191008 - 3 = 191005`
   - `191005 - 5 = 191000`
   - `191000 + 5 = 191005`

So, the final result of the expression `254 * 752 - 3 + (2 - 7) + 5` is **191005**.
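
For what it's worth, the breakdown above matches what plain Python gives for the same expression; a quick sanity check, nothing model-specific:

```python
# Quick sanity check of the arithmetic quoted above
result = 254 * 752 - 3 + (2 - 7) + 5
print(result)  # 191005, matching the step-by-step answer
```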

gustavo

FYI, on Mac at least, the BoltAI GUI app will connect to locally-running Ollama and LM Studio models served by those apps

pmarreck

what's 254*752-3+(2-7)+5? first convert it to simpler parts, then start solving it
This model is very good at CoT, so if you make it think step by step, it gives you the right answer. For a better test of this model, you should make it break the problem down and think about it as much as possible.
so the System Prompt should be: always break down any question into simpler parts, then solve it step by step.
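
A minimal sketch of wiring that suggested system prompt into an Ollama request, under the same assumptions as the earlier snippet (default port, model pulled as `phi4`); the system-prompt wording is just the suggestion above, not something Phi-4 requires:

```python
import requests

# System prompt suggested above; purely an assumption about what helps CoT
SYSTEM_PROMPT = (
    "Always break down any question into simpler parts, "
    "then solve it step by step."
)

payload = {
    "model": "phi4",
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "what's 254*752-3+(2-7)+5?"},
    ],
    "stream": False,
}

response = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
print(response.json()["message"]["content"])
```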

seniormcyt

Google Fi, Phi, but when are they going to release Fo, and Fum? As an Englishman by blood please respond.

patruff

not yet available when searching the Hugging Face library, but if someone has a pull command from ollama, please say

QuizmasterLaw

I think Sam Altman said “no one wants last year's model”, or the second best model. Unless it's really faster or better, most will stick to one of the others. Good to see though.

thegrumpydeveloper

in OpenWebUI how did you set up dark mode? it is so useful

artoke

What version of OpenWebUI are you using?

stoicescucatalin

does OpenWebUI cost money? can it run *anything*?

dot

If you're really good at using the model, could you get a lot of work done with it, or does it have too many limitations and weaknesses?

Corteum

The 14B-parameter model may give worse answers under test-time compute, as smaller models generally do not reason as well as larger models

FalconStudioWin

I used a Gen10 A2 GPU, and it actually did a really good job on the latest model. It used about 10GB of VRAM at most, and compared with o1 it was not far off being a really good model

DrWaldonHendricks

The Ollama model is quantized to Q4_K_M, so you will lose a lot of quality, especially context. It is not a fair comparison. Also, you may need some guidance to force the thinking that o1 and Claude 3.5 are doing, but Ollama doesn't support that yet.


At this point, Supernova Medius is better?🤔

husanaaulia

13:46 GPT4o mini got the RIGHT answer. position 2 is a winning move as much as 3 (you said it yourself earlier in the video) and you mistakenly discarded the GPT4o mini answer as invalid.
Again at 14:11 you wrongly stated Llama 3.3 gets it wrong when it gets it right!
You had 1 job! Please fix the video or carry your reputation as a shill.

alelondon