[ML News] GPT-4 solves MIT Exam with 100% ACCURACY | OpenLLaMA 13B released

#gpt4 #mit #ai

A new paper claims to use GPT-4 to solve 100% of a set of MIT university exercises. Some people are skeptical, and their investigations reveal more than one problem with this paper; a sketch of the grade-and-retry loop at the heart of the critique follows the outline below.

OUTLINE:
0:00 - ChatGPT gives out Windows 10 keys
0:30 - MIT exam paper
2:50 - Prompt engineering
5:30 - Automatic grading
6:45 - Response by other MIT students
8:30 - Unsolvable questions
10:50 - Duplicates
13:30 - Cascading the heuristics
22:40 - Other problems
29:25 - OpenLLaMA 13B published
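The critique in the video (prompt engineering, automatic grading, cascading the heuristics) targets roughly the following loop: the system retries increasingly elaborate prompts until GPT-4, which is shown the ground-truth solution, grades its own answer as correct. Here is a minimal Python sketch of that loop; the helper names and prompts are illustrative assumptions, not the paper's actual code.

```python
# Hedged sketch of the grade-and-retry cascade the critics describe: try
# increasingly elaborate prompts, and let GPT-4 itself (which is shown the
# ground-truth solution) grade each answer. All names, prompts, and the
# ask_gpt4 helper are illustrative; this is not the paper's actual code.

FEW_SHOT_EXAMPLES = "<similar solved questions from the same dataset>"

def ask_gpt4(prompt: str) -> str:
    """Stand-in for an OpenAI chat-completion call."""
    raise NotImplementedError

def graded_correct_by_gpt4(answer: str, solution: str) -> bool:
    # The grader sees the correct solution: this is the core of the critique.
    verdict = ask_gpt4(
        f"Solution: {solution}\nAnswer: {answer}\nIs the answer correct? yes/no"
    )
    return verdict.strip().lower().startswith("yes")

def solve(question: str, solution: str) -> str | None:
    strategies = [
        lambda q: ask_gpt4(q),                                  # zero-shot
        lambda q: ask_gpt4(f"{FEW_SHOT_EXAMPLES}\n\n{q}"),      # few-shot
        lambda q: ask_gpt4(f"{q}\nLet's think step by step."),  # chain-of-thought
    ]
    for strategy in strategies:
        answer = strategy(question)
        if graded_correct_by_gpt4(answer, solution):
            return answer  # stop as soon as the grader says "correct"
    return None  # in the criticized setup, the cascade keeps escalating until it passes
```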

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

“First of all that’s pretty cool, second, that’s pretty suspicious” all AI news in a nutshell

Death_by_Tech

I'm glad the MIT paper has fully open-source code.

That's why I refuse to believe (or at least take with a grain of salt) any strong claims made by papers that don't provide actual code.

television

Breaking news: 100% of students ACE MIT exam given unlimited tries and access to correct solutions.

lbgstzockt

I had higher expectations for MIT than this.

You were a true academic gentleman in reviewing the paper, giving it the benefit of the doubt on its many issues, and even giving advice on how to change the wording so that the paper's scope finds its true worth. The actual wording of the findings is egregiously misleading, I must say: "Advanced engineering X, Y, and Z were done in order to achieve 100% results", as opposed to the truth: "It was allowed to guess until it got the correct answers". No exam anywhere lets students guess and redo until they get the exam 100% correct.

ArchonExMachina

One of my favorite posts from Yannic. Just totally in awe of the hard, smart work done by these undergrads. Thanks to them for showing their older peers how things should be done. ML is (probably) broken until this standard of reviewing is applied routinely. Not holding my breath.

alanparker

Thanks for making this. As always, appreciate the detailed analysis. Never thought I would be featured in one of your videos! Just wanted to add that we made two minor corrections since you originally recorded this video. Both are marked with ✏️ in the Notion report.

Specifically:
1. Amending claims of *exact* duplicate questions within the test set (duplicate solutions and information leakage still stand)
2. Correcting the claim that a specific example was identical in semantic content, when it was really the solution that was identical.

Finally, given the data quality, I wouldn't put much stock in our 60% accuracy replication number. It was done mostly to replicate rather than to evaluate GPT-4 (hard to do on fundamentally flawed data).

raunakdoesdev
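
For readers curious how duplicate and leaked questions like those described in the comment above get surfaced, one common approach is to embed every question and flag suspiciously similar pairs. A minimal sketch, assuming the sentence-transformers library; the model, threshold, and function name are illustrative, not what the report's authors used.

```python
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

def find_near_duplicates(questions: list[str], threshold: float = 0.95):
    """Flag question pairs whose embeddings are suspiciously similar."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(questions, convert_to_tensor=True)
    pairs = []
    for i, j in combinations(range(len(questions)), 2):
        score = util.cos_sim(embeddings[i], embeddings[j]).item()
        if score >= threshold:
            pairs.append((i, j, score))  # likely duplicate or leaked pair
    return pairs
```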

"Failure was never an option" - The original author, probably.

CoughSyrup

My favorite and earliest source of ML news since channel creation.

nocturnomedieval

updated conclusion: "if your system knows the answers, then it can come up with these answers"

JurekOK

Drawing attention to yourself is all you need!

mshonle

Just want to also point out that "there exist prompts that can get the right answers, as long as you find them" is not really an interesting alternative conclusion to draw, even if it can be drawn here. If you have to engineer prompts per question in order to get the desired result, it's not practical or applicable to new problems. At that point, you're using the grader and the prompt engineer to do the task, not the model. The model, if flexible enough, would just determine the speed at which the result is obtained.

Suppose we have no prompt engineer and are using only a grader to check. You could let a completely random sentence generator do the same process and in the limit say "look, it got it right, it must be so smart".

Also, you could probably find an adversarial prompt to make it say anything you want, no matter the question. That's just another reason why the mere existence of such prompts isn't interesting.

ryanbaten
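
A toy sketch of the random-generator thought experiment in the comment above, with exact string match standing in for the GPT-4 grader; entirely illustrative.

```python
import random
import string

# With a grader that knows the answer in the loop and unlimited retries,
# even a random generator reaches "100% accuracy" eventually.

def random_guess(length: int) -> str:
    return "".join(random.choices(string.ascii_lowercase, k=length))

def solve_by_luck(solution: str, max_tries: int = 1_000_000) -> int:
    for tries in range(1, max_tries + 1):
        if random_guess(len(solution)) == solution:
            return tries  # "look, it got it right, it must be so smart"
    raise RuntimeError("ran out of tries")

print(solve_by_luck("ok"))  # typically succeeds after a few hundred guesses
```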

These Windows 10 keys are official and are provided by Microsoft. They're everywhere on the Internet.

Athari-P

😄 8:19 & 22:57 made me chuckle: researchers and their lack of software engineering best practices… great that this point hits home for MIT “experts” too. Right in line with writing unmaintainable, untested code and leaking credentials left and right.

dinoscheidt

Your experienced opinion restores faith in this world! Thank you! Please keep up the amazing work.

youngcampoproductions

- What color is the sky?
- Green!
- No. Try again!
- Blue!
- ... you are 100% correct!

hashishishin

Thanks!! You are the best, man. Sending love and appreciation from Tel Aviv.

ishaygreen

This goes to show how much horse shii is in many academic "studies" that get released. If this kind of garbage comes out of MIT, imagine how much garbage is coming out overall. People need to start scrutinizing EVERYTHING, EVERYWHERE.

bezillions

Good comment about conference reviewing! (I'm always amazed at how much the media cares whether a paper is peer reviewed.)

rogerwattenhofer

The prompt to get the Windows 10 Pro keys was one of the most creative prompts I have seen.

kristoferkrus

First vid I've seen from you. I like it! Love the critique of research papers, and I want to keep learning ML topics and theory.

rizzy