Integral Challenge: Can These Cutting-Edge LLMs Solve It?

I challenged OpenAI's ChatGPT 4o, o1-preview, o1-mini, Gemini Advanced, Grok 2, and Llama 3.1 to solve this innocent-looking integral.

Comments

o1-mini is very strong. Everyone in the competitive programming community is talking about it.

caiodavi

Thanks for these videos! Please keep making more of them; you are doing a tremendous service to the community and to scientific education. I highly recommend collaborating with YouTubers in other domains too!

akshaybhat

We need o1 to be more intelligent than the most intelligent medical doctor, engineer, scientist, mathematician, architect, philosopher, historian in the universe!!!

h-e-acc

Love these math and astronomy tests on the o1 models.

LiddellUFC

I think o1-preview was watching this video, it just got it right.

Diamonddavej

o1-mini was better than o1-preview and as good as o1 in math, according to the AIME results.

ckq

That was a tragic start. The integral is a 20 second job with Glasser's master theorem, and a 45 second job with residues.

xinpingdonohoe
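For readers unfamiliar with the shortcut named in this comment, the simplest form of Glasser's master theorem can be written as follows. This is a sketch assuming F is integrable over the real line and a > 0; the example integrand F(u) = 1/(1 + u^2) is chosen only for illustration and is not the integral from the video, which is not reproduced on this page.

```latex
\int_{-\infty}^{\infty} F\!\left(x - \frac{a}{x}\right) dx
  = \int_{-\infty}^{\infty} F(x)\, dx, \qquad a > 0.

% Illustrative example with F(u) = 1/(1 + u^2):
\int_{-\infty}^{\infty} \frac{dx}{1 + \left(x - \frac{a}{x}\right)^{2}}
  = \int_{-\infty}^{\infty} \frac{du}{1 + u^{2}} = \pi.
```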

This is a great format: a short video and an interesting challenge. I tried the same prompt on two small local LLMs: Mathstral (a 7B model from Mistral) and Qwen2.5-Math (the 70B model from Alibaba, quantised to Q4).

Qwen2.5-Math gave "\boxed{69420\cdot \frac{\pi}{2\sqrt{2}}}"; it split the integral but made mistakes.

Mathstral gave after trying a partial fraction decomposition. This ran very quickly, locally on my laptop.

I wonder if the unquantised version of Qwen2.5 would have nailed it.

johntdavies

Kyle, it's high time for a new live stream. Eagerly waiting for more tests with o1.

rickandelon

I think o1-mini is doing more thinking; it's just a smaller model.

PrinceCyborg

Funny how the smaller o1-mini got it right. I am convinced o1-mini is more refined, and o1-preview is just very experimental and incomplete.

llsamtapaill-ocsh

This is a fun way to compare models... Do more of these with different problems. Maybe also ask them to write some fairly simple Python code for random problems.

dannyl

Every LLM could easily write the Python code to solve the integral.

Linshark
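As a rough illustration of this point, here is the kind of short, dependency-free Python script a model might write to evaluate an integral over the whole real line numerically. It is a sketch under stated assumptions: the helper names `integrate_real_line` and `example_integrand` are hypothetical, the integrand is a stand-in (the video's integral is not quoted on this page), and the substitution x = tan(t) with a midpoint rule is just one reasonable choice. The example integrand is the one for which Glasser's master theorem (mentioned in an earlier comment) predicts exactly pi, which makes it a handy self-check.

```python
import math


def integrate_real_line(f, n=20000):
    """Approximate the integral of f over the whole real line (hypothetical helper).

    Substitutes x = tan(t), so dx = dt / cos(t)**2, and applies the composite
    midpoint rule on (-pi/2, pi/2). Midpoints never touch the endpoints, and
    with an even n no midpoint lands exactly on t = 0 either.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = -math.pi / 2 + (k + 0.5) * h
        x = math.tan(t)
        total += f(x) / math.cos(t) ** 2  # Jacobian of x = tan(t)
    return total * h


def example_integrand(x, a=2.0):
    """Hypothetical integrand 1 / (1 + (x - a/x)**2); its limit at x = 0 is 0."""
    if x == 0.0:
        return 0.0
    u = x - a / x
    return 1.0 / (1.0 + u * u)


if __name__ == "__main__":
    # Glasser's master theorem says this equals the integral of
    # 1 / (1 + u**2) over the real line, i.e. pi.
    print(integrate_real_line(example_integrand), math.pi)
```

The tan substitution turns the infinite range into a finite one; for this integrand the transformed function is smooth and periodic, so the plain midpoint rule converges very quickly.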

Well, I was surprised that o1 solved the integral of sqrt(r/(r - r_s)) dr so effortlessly, and that one is the hard one: it is the integral used to find the proper length in the Schwarzschild metric of relativity.
But struggling with a much easier one is hilarious.
It is so funny to see o1-mini beat o1-preview, lol.

eigenvector
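For reference, the integral mentioned in this comment has a standard closed form for r > r_s. The sketch below states it (derived here for illustration, not taken from the video) together with the differentiation check that confirms it.

```latex
\int \sqrt{\frac{r}{r - r_s}}\, dr
  = \sqrt{r\,(r - r_s)} + r_s \ln\!\left(\sqrt{r} + \sqrt{r - r_s}\right) + C,
  \qquad r > r_s.

% Check by differentiating each term:
\frac{d}{dr}\sqrt{r(r - r_s)} = \frac{2r - r_s}{2\sqrt{r(r - r_s)}},
\qquad
\frac{d}{dr}\, r_s \ln\!\left(\sqrt{r} + \sqrt{r - r_s}\right)
  = \frac{r_s}{2\sqrt{r(r - r_s)}},

\frac{(2r - r_s) + r_s}{2\sqrt{r(r - r_s)}}
  = \frac{r}{\sqrt{r(r - r_s)}}
  = \sqrt{\frac{r}{r - r_s}}.
```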

There must be something special about 69420. Is it a secret door to something OpenAI doesn't want to reveal? You can tell it a hundred times and it still sticks to 6940😂

loc

So are the models doing this internally? Or are they accessing Wolfram or other calculation tools to accomplish this?

ThreeChe

To be honest, I'd consider o1-preview's answer correct, as it was only 0.0015% off. It is by far the closest to the correct answer; the rest are off by a long shot.

GodbornNoven
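The "0.0015% off" figure in this comment is a relative error. A minimal sketch of how such a percentage is computed; the function name `relative_error_percent` and the sample values are hypothetical, since the models' actual answers are not quoted on this page.

```python
def relative_error_percent(approx: float, exact: float) -> float:
    """Return |approx - exact| / |exact| as a percentage (hypothetical helper)."""
    if exact == 0:
        raise ValueError("relative error is undefined when the exact value is 0")
    return abs(approx - exact) / abs(exact) * 100.0


# Hypothetical values: an answer that overshoots the exact value by 0.0015%.
print(relative_error_percent(100.0015, 100.0))  # about 0.0015, up to rounding
```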

I remember that about 10 years ago Wolfram Alpha used to solve all these problems like a pro.

alefermin

Please correct me: when I learned integration in school and college, there was always a constant C for every integral... Why is there no integration constant here?

parthasarathyvenkatadri
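Assuming the challenge integral is a definite one (the numeric answers quoted in other comments suggest so), the constant of integration cancels when the antiderivative is evaluated at both limits. With F any antiderivative of f:

```latex
\int_a^b f(x)\,dx
  = \bigl[F(x) + C\bigr]_a^b
  = \bigl(F(b) + C\bigr) - \bigl(F(a) + C\bigr)
  = F(b) - F(a).
```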

Cool, one model made it. And this is the big issue with AI: you never know when the answer is correct, so you need to check.

noway
