[ML News] GPT-4 solves MIT Exam with 100% ACCURACY | OpenLLaMA 13B released

#gpt4 #mit #ai

A new paper claims to use GPT-4 to solve 100% of a set of MIT university exercises. Some people are skeptical, and their investigations reveal more than one problem with this paper; a sketch of the grade-and-retry loop at the heart of the critique follows the outline below.

OUTLINE:
0:00 - ChatGPT gives out Windows 10 keys
0:30 - MIT exam paper
2:50 - Prompt engineering
5:30 - Automatic grading
6:45 - Response by other MIT students
8:30 - Unsolvable questions
10:50 - Duplicates
13:30 - Cascading the heuristics
22:40 - Other problems
29:25 - OpenLLaMA 13B published
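The critique in the video (prompt engineering, automatic grading, cascading the heuristics) targets roughly the following loop: the system retries increasingly elaborate prompts until GPT-4, which is shown the ground-truth solution, grades its own answer as correct. Here is a minimal Python sketch of that loop; the helper names and prompts are illustrative assumptions, not the paper's actual code.

```python
# Hedged sketch of the grade-and-retry cascade the critics describe: try
# increasingly elaborate prompts, and let GPT-4 itself (which is shown the
# ground-truth solution) grade each answer. All names, prompts, and the
# ask_gpt4 helper are illustrative; this is not the paper's actual code.

FEW_SHOT_EXAMPLES = "<similar solved questions from the same dataset>"

def ask_gpt4(prompt: str) -> str:
    """Stand-in for an OpenAI chat-completion call."""
    raise NotImplementedError

def graded_correct_by_gpt4(answer: str, solution: str) -> bool:
    # The grader sees the correct solution: this is the core of the critique.
    verdict = ask_gpt4(
        f"Solution: {solution}\nAnswer: {answer}\nIs the answer correct? yes/no"
    )
    return verdict.strip().lower().startswith("yes")

def solve(question: str, solution: str) -> str | None:
    strategies = [
        lambda q: ask_gpt4(q),                                  # zero-shot
        lambda q: ask_gpt4(f"{FEW_SHOT_EXAMPLES}\n\n{q}"),      # few-shot
        lambda q: ask_gpt4(f"{q}\nLet's think step by step."),  # chain-of-thought
    ]
    for strategy in strategies:
        answer = strategy(question)
        if graded_correct_by_gpt4(answer, solution):
            return answer  # stop as soon as the grader says "correct"
    return None  # in the criticized setup, the cascade keeps escalating until it passes
```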

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

“First of all that’s pretty cool, second, that’s pretty suspicious” all AI news in a nutshell

Death_by_Tech

I'm glad the MIT paper has fully open-source code.

That's why I refuse to believe (or at least take with a grain of salt) any strong claims made by papers that don't provide actual code.

television

Breaking news: 100% of students ACE MIT exam given unlimited tries and access to correct solutions.

lbgstzockt

I had higher expectations for MIT than this.

You were a true academic gentleman in reviewing the paper, giving it the benefit of the doubt on its many issues, and even giving advice on how to change the wording so that the paper's scope finds its true worth. The actual wording of the findings is egregiously misleading, I must say: "Advanced engineering X, Y, and Z were done in order to achieve 100% results", as opposed to the truth: "It was allowed to guess until it got the correct answers". No exam anywhere lets students guess and redo until they get the exam 100% correct.

ArchonExMachina

One of my favorite posts from Yannic. Just totally in awe of the hard, smart work done by these undergrads. Thanks to them for showing their older peers how things should be done. ML is (probably) broken until this standard of reviewing is applied routinely. Not holding my breath.

alanparker

Thanks for making this. As always, appreciate the detailed analysis. Never thought I would be featured in one of your videos! Just wanted to add that we made two minor corrections since you originally recorded this video. Both are marked with ✏️ in the Notion report.

Specifically:
1. Amending claims of *exact* duplicate questions within the test set (duplicate solutions and information leakage still stand)
2. Correcting the claim that a specific example was identical in semantic content, when it was really the solution that was identical.

Finally, given the data quality, I wouldn't put much stock in our 60% accuracy replication number. It was done mostly to replicate rather than to evaluate GPT-4 (hard to do on fundamentally flawed data).

raunakdoesdev
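
For readers curious how duplicate and leaked questions like those described in the comment above get surfaced, one common approach is to embed every question and flag suspiciously similar pairs. A minimal sketch, assuming the sentence-transformers library; the model, threshold, and function name are illustrative, not what the report's authors used.

```python
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

def find_near_duplicates(questions: list[str], threshold: float = 0.95):
    """Flag question pairs whose embeddings are suspiciously similar."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(questions, convert_to_tensor=True)
    pairs = []
    for i, j in combinations(range(len(questions)), 2):
        score = util.cos_sim(embeddings[i], embeddings[j]).item()
        if score >= threshold:
            pairs.append((i, j, score))  # likely duplicate or leaked pair
    return pairs
```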

"Failure was never an option" - The original author, probably.

CoughSyrup

My favorite and earliest source of ML news since channel creation.

nocturnomedieval

updated conclusion: "if your system knows the answers, then it can come up with these answers"

JurekOK

Drawing attention to yourself is all you need!

mshonle

Just want to also point out that "there exist prompts that can get the right answers, as long as you find them" is not really an interesting alternative conclusion to draw, even if it can be drawn here. If you have to engineer prompts per question in order to get the desired result, it's not practical or applicable to new problems. At that point, you're using the grader and the prompt engineer to do the task, not the model. The model, if flexible enough, would just determine the speed at which the result is obtained.

Suppose we have no prompt engineer and are using only a grader to check. You could let a completely random sentence generator do the same process and in the limit say "look, it got it right, it must be so smart".

Also, you could probably find an adversarial prompt to make it say anything you want, no matter the question. That's just another reason why the mere existence of such prompts isn't interesting.

ryanbaten
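
A toy sketch of the random-generator thought experiment in the comment above, with exact string match standing in for the GPT-4 grader; entirely illustrative.

```python
import random
import string

# With a grader that knows the answer in the loop and unlimited retries,
# even a random generator reaches "100% accuracy" eventually.

def random_guess(length: int) -> str:
    return "".join(random.choices(string.ascii_lowercase, k=length))

def solve_by_luck(solution: str, max_tries: int = 1_000_000) -> int:
    for tries in range(1, max_tries + 1):
        if random_guess(len(solution)) == solution:
            return tries  # "look, it got it right, it must be so smart"
    raise RuntimeError("ran out of tries")

print(solve_by_luck("ok"))  # typically succeeds after a few hundred guesses
```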

These Windows 10 keys are official and are provided by Microsoft. They're everywhere on the Internet.

Athari-P

😄 8:19 & 22:57 made me chuckle: researchers and their lack of software engineering best practices… great that this point hits home for MIT “experts” too. Right in line with writing unmaintainable, untested code and leaking credentials left and right.

dinoscheidt

Your experienced opinion restores faith in this world! Thank you! Please keep up the amazing work.

youngcampoproductions

- What color is the sky?
- Green!
- No. Try again!
- Blue!
- ... you are 100% correct!

hashishishin

Thanks!! You are the best, man. Sending love and appreciation from Tel Aviv.

ishaygreen

This goes to show how much horse shii is in many academic "studies" that get released. If this kind of garbage comes out of MIT, imagine how much garbage is coming out overall. People need to start scrutinizing EVERYTHING, EVERYWHERE.

bezillions

Good comment about conference reviewing! (I'm always amazed at how much the media cares whether a paper is peer reviewed.)

rogerwattenhofer

The prompt to get the Windows 10 Pro keys was one of the most creative prompts I have seen.

kristoferkrus

First vid I've seen from you. I like it! Love the critique of research papers, and I want to keep learning ML topics and theory.

rizzy