OpenAI's GPT-Mini, Column-R & U, Google's Eureka: FULLY TESTED (Secret LLMs on LMSYS Arena)

-------------
Recently, four secret LLMs were dropped on LMSYS Arena: a GPT-Mini model from OpenAI, the Column-R and Column-U models that appear to come from Cohere, and Eureka from Google. Today, I'll be testing them to find out which models perform well and which don't. I'll also check whether any of them can beat the current best models, such as Claude 3.5 Sonnet, GPT-4o, DeepSeek-Coder-V2, Qwen-2 and others. I'll be testing these LLMs on LMSYS Arena for FREE.

------------
Key Takeaways:

📈 Four New AI Models Released on LMSYS! Discover the latest AI advancements as LMSYS adds GPT-Mini, Column-R, Column-U, and Eureka Chatbot, pushing the boundaries of artificial intelligence technology.

🤖 GPT-4o's Sneaky Pre-Launch! Learn how GPT-4o was tested in LMSYS Arena under the alias Good-GPT-2-Chatbot before its release, showcasing the hidden processes behind AI model development.

🧩 Who Made These AI Models? Dive into the origins of these new AI models. OpenAI brings us GPT-Mini, Google is behind Eureka Chatbot, and Cohere seems to be the creator of Column-R and Column-U. Unravel the mystery with us!

🧪 Performance Testing Results! Watch as we rigorously test these models with challenging questions and coding tasks, revealing the strengths and weaknesses of each AI model in practical scenarios.

🚀 Column-R: The New AI Champion? Column-R stands out with impressive performance, passing most of our tests. Could this be the next big thing in AI, potentially overshadowing models like Sonnet?

📊 Final Verdict & Recommendations! Get the ultimate breakdown and comparison of these AI models. Find out why GPT-Mini, Column-R, and Column-U are worth watching, and why Eureka might not be up to the mark.

----------
Timestamps:

00:00 - Introduction
00:13 - New Secret Models on LMSYS (GPT-Mini, Column-R & U, Eureka)
01:48 - OnDemand (Sponsor)
02:53 - Checking Model Origins
04:33 - Testing All Models (9 Questions)
04:55 - Question 1
05:28 - Question 2
06:01 - Question 3
06:41 - Question 4
07:07 - Question 5
07:53 - Question 6
08:31 - Question 7
09:07 - Question 8
09:45 - Question 9
10:14 - Final Conclusion of the New LLMs