ChatGPT can't do math...

Against my better judgement, I decided to give ChatGPT another chance to solve a maths exam. This is the 2023 British Mathematics Olympiad Round 1. Has it improved since last time?

You can also follow Tom on Facebook, Twitter and Instagram @tomrocksmaths.

A HUGE thank you to all of my patrons for their support:
Dr Peet Morris
Jeryagor
John Hanson
Rodhern
Denise
Cooper Healy
Hiro
Delicious Rose
Bin Liu

Comments

ChatGPT doing black magic instead of geometry.

nofilkhan

ChatGPT invoked the Illuminati on the Geometry question 😂

narutochan

the geometry drawing it produced had me gasping for air 🤣

bekabex

The problem is that ChatGPT, or any LLM, is not applying formal logic or arithmetic to a problem. Instead it regurgitates a solution it tokenized from its training set and tries to morph that solution and answer into the context of the question being asked. Therefore, just like a cheater, it can often give a correct result confidently because it has memorised that exact question; sometimes it can even substitute values into the result to appear to have calculated it, but in the end it's all smoke and mirrors. It didn't do the math, it didn't think through the problem. That's why LLMs crumble when never-before-seen questions get asked: an LLM has no understanding, only memorisation. LLMs also crumble when irrelevant information is fed alongside the question, because the irrelevant information impacts the search space being looked at, so the accuracy of recall is reduced.

LLMs do not think and do not process information logically; rather, they process input and throw out the most likely output, then use some value substitution in the result to appear to be answering your exact question.

LLMs cannot do mathematics. At best they can spit out likely solutions to your questions where similar, or those exact, questions and their solutions were fed to them in their training set. An LLM knows everything and understands nothing.
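
To make the "most likely output" claim concrete, here is a toy sketch in Python: a made-up two-sentence "training corpus" and a bigram lookup, nothing remotely like a production LLM, just an illustration of frequency recall standing in for computation:

```python
# Toy illustration of "throw out the most likely output": a bigram
# lookup that can only recall continuations seen in its training set.
# (Hypothetical miniature corpus; real LLMs are vastly more complex.)
from collections import Counter

corpus = ("two plus two is four . two plus three is five . "
          "two plus two is four .").split()
bigram_counts = Counter(zip(corpus, corpus[1:]))

def most_likely_next(token: str) -> str:
    """Return the continuation seen most often after `token` in training."""
    followers = {b: c for (a, b), c in bigram_counts.items() if a == token}
    return max(followers, key=followers.get)

print(most_likely_next("is"))  # "four" -- the memorised majority case, not arithmetic
```

Ask it about a token it has never seen and the lookup has nothing to return; there is no arithmetic happening anywhere, only recall of the most frequent continuation.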

EaglePL

I once asked ChatGPT to prove that π is irrational. It gave back the proof that √2 is irrational, discussed the squaring-the-circle problem, and in its final conclusion wrote: hence π is irrational.

shoryaprakash

I feel like ChatGPT may have taken your first message to be meant as a compliment rather than as a prompt that it should pretend to be you.

Lightning_Lance

It becomes obvious that the language model is essentially a separate module from the image generator. I bet that even if the solution had been flawlessly found, the drawing of the diagram would still be completely bonkers.

tymmiara

Question 3, the geometry one, ends up much better when you give it the diagram along with the instructions; I tried it and got a much better result. To do this I used the Snipping Tool to make an image of both the question and the diagram, saved it to the desktop as screenshot.jpg, and dragged that into the ChatGPT window. It read them both fine.

yagodarkmoon

ChatGPT has, on multiple occasions, told me that odd numbers were even and vice versa.

TheDwarvenForge

20:42 power of a point is a basic geometry theorem...

toshiv-yl

Tom is not locked in. Every uni maths student knows that if you take a picture of the question, it will always give you the right answer.

loadstone

Power of a point is actually real, and while I'm usually bad at geometry in olympiads, some of my friends have used it several times.

rostcraft

19:35 LOL, the diagram looks like equal parts 1) M.C. Escher, 2) the Indian Head test pattern from the early days of television, 3) steampunk, and 4) the Vitruvian Man. It's all sorts of incorrect and its confidence is a barrel of laughs, but it's lovely to look at and fun to contemplate how ChatGPT may have come up with it. My favorite part is the top-center A with the additional 'side shield' A, with an honorable mention to how the matchsticks of the equilateral triangle have three-dimensional depth and shadows.

gtziavelis

On an unrelated note, I remember sitting this BMO paper last year and struggling but enjoying it. I recently started uni in Canada and have been training for the Putnam, and now I'm looking back at these questions both cringing and feeling proud of how much I've grown in just a year: I've gone from finding these questions tough to being able to solve them without much struggle. This is what I love about maths: I can always keep improving with just some practice. P.S. Great video Tom, really enjoyed watching it.

abdulllllahhh

Cool video and all, but are you aware of o1-mini and o1-preview???

Hankyone

Our jobs are safe, ChatGPT can’t do maths at all.

jppereyra

In Q1 there seems to be an error in ChatGPT's explanation. For example, it says "D" must be in position 7, 8 or 9, but "DOLYMPIAS" is a valid misspelling: every letter is one late, except for D (early) and S (correct).

dmytryk

I've only watched up to the first question so far, but I came up with a different solution that's interesting enough to mention. Another way to think of the problem is to divide the characters into two subsets: one containing the characters that were typed one position late, and the other containing all the rest. If all the characters are different, these two sets give enough information to reconstruct any possible spelling. Therefore, we just need to count all the ways to make these subsets.

We know that in an n-character word the last character can never be one late, so we only have n-1 letters to work with. (n-1 choose k) counts the subsets of size k, so to get all possible subsets we sum over every value of k:

[sum(k = 0..n-1)(n-1 choose k)]

This is row n-1 of Pascal's triangle, and the sum of row n-1 is 2^(n-1). The word "OLYMPIADS" has 9 letters, therefore the answer is 2^8 = 256.
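
A quick way to sanity-check this argument is to count the subsets directly; here is a minimal Python sketch (assuming the subset model described above, with the last letter excluded as argued) that counts two ways and compares against 2^(n-1):

```python
# Count the subsets two ways and compare with 2^(n-1).
from itertools import combinations
from math import comb

word = "OLYMPIADS"
n = len(word)  # 9

# Row n-1 of Pascal's triangle: C(n-1, 0) + ... + C(n-1, n-1)
row_sum = sum(comb(n - 1, k) for k in range(n))

# Direct enumeration of every k-sized subset of the first n-1 positions
subset_count = sum(1 for k in range(n) for _ in combinations(range(n - 1), k))

print(row_sum, subset_count, 2 ** (n - 1))  # 256 256 256
```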

gergoturan

You can add the shape through the attachment icon in the left corner of the prompt box: just take a screenshot of the figure and attach it like that.

JavairiaAqdas

Hey Dr Crawford, thank you for your video and insight. It seems that you are using the basic GPT-4 model to solve these BMO questions. ChatGPT provides a different model called o1-preview, which is specifically designed for complex, advanced reasoning and for solving difficult mathematical questions like these. With the o1-preview model it takes much longer (sometimes more than a minute) before you get a response, and it reasons in a much deeper way than the model you have used here. I've tried feeding it questions 5 and 6 on the BMO1 paper, and it could solve them perfectly.

Therefore I would encourage you to try again with that specific model. I believe you need a ChatGPT subscription to access it, but I think they are going to release a free version of it. Anyway, thank you so much!

P.S. It would have been better if you had simply uploaded a screenshot of the question, since the diagrams could have been included, and ChatGPT would have been able to read the question from the image (probably better than having it retyped with different syntax).

patrickyao