Integral Challenge: Can These Cutting-Edge LLMs Solve It?

I challenged OpenAI's ChatGPT 4o, o1-preview, o1-mini, Gemini Advanced, Grok 2, and Llama 3.1 to solve this innocent-looking integral.
Comments

caiodavi: o1-mini is very strong. Everyone in the competitive programming community is talking about it.

akshaybhat: Thanks for these videos! Please keep doing more of these; you are doing a tremendous service to the community and to scientific education. I highly recommend doing collabs with YouTubers in other domains too!!

h-e-acc: We need o1 to be more intelligent than the most intelligent medical doctor, engineer, scientist, mathematician, architect, philosopher, or historian in the universe!!!

LiddellUFC: Love these math and astronomy tests on the o1 models.

Diamonddavej: I think o1-preview was watching this video; it just got it right.

ckq: o1-mini was better than o1-preview and as good as o1 in math, according to the AIME results.

xinpingdonohoe: That was a tragic start. The integral is a 20-second job with Glasser's master theorem, and a 45-second job with residues.

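Glasser's master theorem, which xinpingdonohoe mentions, says (in its simplest form) something about substituting into a suitably integrable F. For a > 0 and real b, replacing x by x - a/(x - b) does not change the integral over the whole real line: ∫ F(x - a/(x - b)) dx = ∫ F(x) dx, taken as a principal value in general. The video's integrand is not reproduced on this page. The sketch below therefore only checks the theorem numerically on a standard textbook example, ∫ exp(-(x - 1/x)²) dx = ∫ exp(-x²) dx = √π. It uses a hand-written composite Simpson rule. The truncation to [-10, 10], the step count, and the function names are illustrative assumptions, not anything taken from the video.

```python
import math


def integrand(x: float) -> float:
    """exp(-(x - 1/x)^2), extended by its limit 0 at x = 0."""
    if x == 0.0:
        return 0.0
    u = x - 1.0 / x  # Glasser-type substitution with a = 1, b = 0
    return math.exp(-u * u)


def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior weights alternate 4, 2, 4, ..., 4.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Beyond |x| = 10 the integrand is below exp(-98), so truncating the real line is harmless.
approx = simpson(integrand, -10.0, 10.0, 20_000)
print(f"Simpson: {approx:.12f}   sqrt(pi): {math.sqrt(math.pi):.12f}")
```

The two printed numbers agree to many digits, as the theorem predicts. The closed-form route, by contrast, needs no numerics at all.
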
johntdavies: This is a great format: a short video and an interesting challenge. I tried the same prompt on two small local LLMs: Mathstral (a 7B model from Mistral) and Qwen2.5-math (Alibaba's 70B model, which I used quantised to Q4).

Qwen2.5-math gave "\boxed{69420\cdot \frac{\pi}{2\sqrt{2}}}"; it split the integral but made mistakes.

Mathstral gave after trying a partial fraction decomposition; this ran very quickly, locally on my laptop.

I wonder if the unquantised version of Qwen2.5 would have nailed it.

rickandelon: Kyle, it's high time for a new live stream. Eagerly waiting for more tests with o1.

PrinceCyborg: I think o1-mini is doing more thinking; it's just a smaller model.

llsamtapaill-ocsh: Funny how the smaller o1-mini got it right. I am convinced o1-mini is more refined, and o1-preview is just so experimental and incomplete.

dannyl: This is a fun way to compare programs... Do more of these with different problems. Also, maybe ask them to write some fairly simple Python code for some random problems.

Linshark: Every LLM could easily write the Python code to solve the integral.

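Linshark's point is easy to illustrate. The integral from the video is not reproduced on this page, so the sketch below uses a hypothetical stand-in with a known closed form, ∫₀^∞ dx/(1 + x⁴) = π/(2√2). It evaluates that stand-in two ways:

- a brute-force composite midpoint rule, after mapping [0, ∞) onto [0, 1) with x = t/(1 - t);
- the residue-theorem calculation mentioned earlier in the thread, closing the contour in the upper half-plane, where the arc contribution vanishes because the integrand decays like 1/|z|⁴.

The substitution, step count, and names are illustrative choices, not anything shown in the video.

```python
import cmath
import math


def g(t: float) -> float:
    """1/(1 + x^4) after substituting x = t/(1 - t), dx = dt/(1 - t)^2.

    Multiplying through by (1 - t)^4 gives a rational function of t that is
    finite on all of [0, 1], so t = 1 (x = infinity) needs no special case.
    """
    return (1 - t) ** 2 / ((1 - t) ** 4 + t ** 4)


def midpoint(f, a: float, b: float, n: int) -> float:
    """Composite midpoint rule with n equal subintervals."""
    h = (b - a) / n
    return h * math.fsum(f(a + (i + 0.5) * h) for i in range(n))


# Brute-force numerical value.
numeric = midpoint(g, 0.0, 1.0, 100_000)

# Residue theorem: the poles of 1/(1 + z^4) in the upper half-plane are
# e^{i pi/4} and e^{3i pi/4}; the residue at a simple pole z_k is 1/(4 z_k^3).
poles = [cmath.exp(1j * math.pi / 4), cmath.exp(3j * math.pi / 4)]
whole_line = 2j * math.pi * sum(1 / (4 * z**3) for z in poles)
by_residues = whole_line.real / 2  # integrand is even, so [0, inf) is half the real line

exact = math.pi / (2 * math.sqrt(2))
print(f"midpoint: {numeric:.12f}  residues: {by_residues:.12f}  pi/(2*sqrt(2)): {exact:.12f}")
```

For real work, a library routine such as `scipy.integrate.quad` (which accepts infinite limits directly) would be the usual choice. The hand-written rule just keeps the sketch dependency-free.
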
eigenvector: Well, I was surprised that o1 did the integral of sqrt(r/(r - r_s)) dr so effortlessly. That, by the way, is the hard integral used to find the proper length in the Schwarzschild metric of relativity. Yet it struggles with a much easier one, which is hilarious. It is so hilarious to see o1-mini beat o1-preview, lol.

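For reference, the integral eigenvector quotes is the radial proper-distance element outside a Schwarzschild black hole, dℓ = dr / √(1 - r_s/r) = √(r/(r - r_s)) dr. One antiderivative valid for r > r_s is given below, together with the derivative check that confirms it. This is a standard result written out here; it is not taken from the video or from o1's output.

```latex
\begin{aligned}
\int \sqrt{\frac{r}{r - r_s}}\,dr
  &= \sqrt{r\,(r - r_s)} + r_s \ln\!\left(\sqrt{r} + \sqrt{r - r_s}\right) + C,
  \qquad r > r_s, \\
\frac{d}{dr}\left[\sqrt{r\,(r - r_s)} + r_s \ln\!\left(\sqrt{r} + \sqrt{r - r_s}\right)\right]
  &= \frac{2r - r_s}{2\sqrt{r\,(r - r_s)}} + \frac{r_s}{2\sqrt{r\,(r - r_s)}}
   = \frac{r}{\sqrt{r\,(r - r_s)}}
   = \sqrt{\frac{r}{r - r_s}}.
\end{aligned}
```
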
loc: There must be something special about 69420. Is it a secret door to something OpenAI doesn't want to reveal? You can tell it a hundred times and it still sticks to 69420😂

ThreeChe: So are the models doing this internally, or are they accessing Wolfram or other calculation tools to accomplish this?

GodbornNoven: Tbh I'd consider o1-preview's answer to be correct, as it was only 0.0015% off. It is pretty much the closest to the correct answer; the rest are off by a long shot.

alefermin: I remember that about 10 years ago Wolfram Alpha used to solve all those problems like a pro.

parthasarathyvenkatadri: Please correct me: when I learned integration in school and college, there was always a constant C for every integral. Why is there no integration constant in this one?

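On parthasarathyvenkatadri's question: the constant of integration belongs to indefinite integrals. The answers quoted in this thread are single numbers, which indicates the challenge is a definite integral. In a definite integral the constant cancels by the fundamental theorem of calculus. If F is any antiderivative of f, so that every antiderivative has the form F + C, then:

```latex
\int_a^b f(x)\,dx
  = \bigl[F(x) + C\bigr]_a^b
  = \bigl(F(b) + C\bigr) - \bigl(F(a) + C\bigr)
  = F(b) - F(a).
```

For an improper integral, for example one over [0, ∞), the same cancellation holds before taking the limit of the upper bound.
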
noway: Cool, one model made it. And this is the big issue with AI: you never know when the answer is correct, so you need to check.