New LLM BEATS LLaMA3 - Fully Tested

Qwen2 was released and I tested the biggest and smallest versions of it.

Join My Newsletter for Regular AI Updates 👇🏼

Need AI Consulting? 📈

My Links 🔗

Media/Sponsorship Inquiries ✅

Links:

Disclosure:
I'm an investor in LMStudio
Comments
Author

Which size of Qwen 2 will you be using?

matthew_berman
Author

I suggest you retest LLaMA3. I tested it yesterday with several questions from your rubric. It aced the killers test, the marble test, and the apple test. After it passed the apple test, I threw it a twist by prompting: give me 10 sentences where the third word in each sentence is the result of mixing red and blue. It used purple as the third word in all 10 sentences.

JustinArut
Author

Protip: if you change the system prompt midway through a conversation, it'll break the output. When changing the system prompt, start a new conversation with no history.
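To illustrate the commenter's point, here is a minimal sketch of why a mid-conversation system-prompt swap can break output: the chat template flattens the whole history into one prompt, so old turns still reflect the old instructions. The template syntax and function names below are illustrative assumptions, not any specific chat library's API.

```python
def build_prompt(system, history):
    """Flatten a system prompt plus chat history into one prompt string."""
    lines = [f"<|system|> {system}"]
    for role, text in history:
        lines.append(f"<|{role}|> {text}")
    return "\n".join(lines)

# Mid-conversation swap: the history still obeys the OLD system prompt,
# so the model sees contradictory instructions in one context.
history = [("user", "Answer in French."), ("assistant", "D'accord !")]
mixed = build_prompt("Answer only in German.", history)

# Safer, as the comment suggests: start a fresh conversation with no history.
fresh = build_prompt("Answer only in German.", [])
```

In the "mixed" case the new system line and the old turns conflict inside a single context window, which is the breakage the comment describes; the "fresh" case avoids it.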

sckpuppt
Author

The larger model's method of getting every sentence to end with "apple" is pretty funny if you take a closer look 😂

meh
Author

Apple fooled Matt... it's at the end of each sentence, but in a childish way!

hotbit
Author

Always excellent reviews from your channel. Thank you.

TroyDoesAI
Author

Congrats on the new Dell Precision 5860 Tower! I had kind of given up on seeing LLM reviews run on Windows on your channel; I hope that's a worry of the past. Post the full specs, if you would, to give us an idea of the performance of these models on this machine!

RWS
Author

I'm more impressed by the 0.5B model than the large one. If you compare it with previous models, these answers are amazing.

jamesjonnes
Author

The small model gives such bad answers, it's actually funny

zenn
Author

MBPP -- Mostly Basic Python Programs (a code-generation benchmark).
A14B -- it is a 57-billion-parameter Mixture-of-Experts LLM that activates only 14 billion parameters per token (hence the "A14B" in the name).

zeloguy
Author

Will the uncensored version know about Tiananmen?

patrickwasp
Author

57B-A14B means 57B total parameters (which determines how much memory you need) and 14B active parameters (which determines how fast it runs). In memory requirements and speed, it would be roughly comparable to Mixtral 8x7B.
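The distinction in this comment can be sketched with back-of-envelope arithmetic: all 57B weights must be loaded, but only 14B participate in each forward pass. This is a rough estimate assuming fp16/bf16 weights (2 bytes per parameter), ignoring KV cache and activation memory.

```python
BYTES_PER_PARAM = 2  # assumed fp16/bf16 precision

def weight_memory_gb(total_params_billions):
    """Approximate weight memory in GB for a model with N billion parameters."""
    return total_params_billions * 1e9 * BYTES_PER_PARAM / 1e9

total_b, active_b = 57, 14
print(f"Weights to load: ~{weight_memory_gb(total_b):.0f} GB")  # all experts
print(f"Compute per token: ~{active_b}B params")  # only the routed experts
```

So the MoE needs roughly the VRAM of a dense 57B model (~114 GB at fp16) but generates tokens at roughly dense-14B speed, which is the memory-vs-speed tradeoff the comment describes.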

..
Author

Hey Matthew.
I have noticed that these models format their responses in LaTeX, and it is correct LaTeX code. Could it be a setting in your GUI that you can edit to improve the response?

I believe this is a defect of your GUI, not the model.

mbalireshawal
Author

This was the best answer to the shirt problem so far.

edsonjr
Author

Thanks, I tried it, it's awesome ❤

AliAlias
Author

Thank you, Matthew. I wish you could have tested the 7B and/or the MoE 57B, since those can be run locally by many users.

abdelhakkhalil
Author

I still maintain that the marble-in-the-glass question needs a clarifying description, something like "a glass with an open top". As humans, we assume the word "glass" means a drinking container that is open on top, but I don't think you can assume a language model will default to that.

davidlavin
Author

I'm going to take a wild guess and say that most of your followers don't have a GPU that can run the 72b model at a reasonable speed, and the 0.5b model is so bad as to not be very useful. I would prioritize the 7b model since more people are likely to use that one.

dr_harrington
Author

15:45 It follows instructions very well, even if it's incapable of forming a proper sentence with "apple" at the end.

homematvej
Author

8:53 - when someone told you the answer and your teacher said "explain your working out" lol

tonyppe