OpenAI's GPT-Mini, Column-R & U, Google's Eureka : FULLY TESTED (Secret LLMs on LMSYS Arena)


-------------
Recently, four secret LLMs were dropped on LMSYS Arena: a GPT-Mini model from OpenAI, the Column-R and Column-U models from Cohere, and Eureka from Google. Today, I'll be testing them to find out which models perform well and which don't. I'll also be testing whether any of these models can beat the current best, such as Claude-3.5 Sonnet, GPT-4O, DeepSeek-Coder-V2, Qwen-2 and others. I'll be testing these LLMs on LMSYS Arena for FREE.

------------
Key Takeaways:

📈 Four New AI Models Released on LMSys! Discover the latest AI advancements as LMSys adds GPT-Mini, Column-R, Column-U, and Eureka Chatbot, pushing the boundaries of artificial intelligence technology.

🤖 GPT-4O's Sneaky Pre-Launch! Learn how GPT-4O was tested in LMSys Arena under the disguise of Good-GPT-2-Chatbot, showcasing the hidden processes behind AI model development.

🧩 Who Made These AI Models? Dive into the origins of these new AI models. OpenAI brings us GPT-Mini, Google is behind Eureka Chatbot, and Cohere seems to be the creator of Column-R and Column-U. Unravel the mystery with us!

🧪 Performance Testing Results! Watch as we rigorously test these models with challenging questions and coding tasks, revealing the strengths and weaknesses of each AI model in practical scenarios.

🚀 Column-R: The New AI Champion? Column-R stands out with impressive performance, passing most of our tests. Could this be the next big thing in AI, potentially overshadowing models like Sonnet?

📊 Final Verdict & Recommendations! Get the ultimate breakdown and comparison of these AI models. Find out why GPT-Mini, Column-R, and Column-U are worth watching, and why Eureka might not be up to the mark.

----------
Timestamps:

00:00 - Introduction
00:13 - New Secret Models on LMSYS (GPT-Mini, Column-R & U, Eureka)
01:48 - OnDemand (Sponsor)
02:53 - Checking Model Origins
04:33 - Testing All Models (9 Questions)
04:55 - Question 1
05:28 - Question 2
06:01 - Question 3
06:41 - Question 4
07:07 - Question 5
07:53 - Question 6
08:31 - Question 7
09:07 - Question 8
09:45 - Question 9
10:14 - Final Conclusion of the New LLMs
Comments

Please make a public spreadsheet with all the results of your tests per model; it'd be a great resource to have ^^

DADLOLO

Have you looked at Websim yet? I don't recall a vid on that. Keep up the great work. Be in peace. Godspeed.

clint

Eureka. The ultimate trash 😀...that was funny

ganian

Google is pretty screwed up. Gemma 2 is also pretty trash.

slotsmaster

Hoping that next video should be Wowfull..🤞..we want BIG things on AI locally....😊😊

bharathreddy

In the past, I commented on your voice. It's still a lil strange.
Tho, it's grown on me. ❤️
It's definitely branded! Lol
Anyway...

caseyhoward