Can OpenAI Codex Debug Its Own Code?


OpenAI Codex is OpenAI's latest GPT-based language model for generating code, and the model behind GitHub Copilot. Here we test whether Codex can debug code, and even correct its own errors. We look at both fixing error messages and identifying silent errors. The results were far from perfect but quite intriguing! Check out my channel for more OpenAI Codex content if you find this interesting.


Comments

It seems like the question-answer prompts with the triple quotes don't result in code. GPT models are very sensitive to the structure of the prompt, and they won't generate code in a place that does not look like a place for code. Don't end the prompt with an unclosed quote block.

PremierSullivan
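
A minimal sketch of the advice above, assuming a plain completion-style model: the quote block holding the instruction is closed, and the prompt ends on a line after which code would naturally follow. The helper name `build_code_prompt`, the sample code, and the `# Fixed version:` cue are all hypothetical.

```python
def build_code_prompt(broken_code: str, instruction: str) -> str:
    """Return a prompt that ends where code is expected, not inside a quote block."""
    return (
        f"{broken_code.rstrip()}\n\n"
        f'"""\n{instruction}\n"""\n'  # the quote block is opened AND closed
        "# Fixed version:\n"          # hypothetical cue that code comes next
    )


prompt = build_code_prompt(
    "def count_to(n):\n    print(n)",
    "Fix the above code so that it prints every number from 1 to n.",
)
print(prompt)
```

Whether a given model responds better to this layout is something to check empirically; the sketch only shows how to avoid leaving the quote block open.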

I just plugged in what you ran and simply commented "Fix the above code", and it got me a proper loop. Also, raising the temperature/randomness might help if it's in a rut; at least it works for me.

I also like to turn it into a dialogue for the sake of it, here is an excerpt:

"Why is the above code incorrect?" - "Because it is not tail recursive."

"Wrong, it is missing a for loop calling the print function. Fix the code using this information"

... and it provided me with the proper code. Ultimately, I think we (or at least I) still ascribe a bit too much consciousness to an entity that is great at pulling template code (and doing a lot with it, to be sure) but a bit sketchy as far as information retrieval is concerned. Still, if this is formalized a bit and turned into clear prompts, it's gonna be a blessing for all kinds of tasks, even if only to help with quick debugging by conventional means. All in all really useful; this stuff works great so far and it's going to be ridiculous in a short while.

minhuang

Fib: You are constantly asking for a DIGIT. Digits are 0 - 9. You are actually looking for NUMBERS of the sequence. You need an additional AI to figure out what the human actually means.
It is an impressive demonstration of how the human factor still remains the weak point in this sort of workflow. Garbage in, garbage out.

DerAlbi
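
To make the digit/number distinction concrete, here is a small sketch. The indexing convention (`fib(1) == fib(2) == 1`) is an assumption; the video may count from a different starting point.

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, with fib(1) == fib(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


twentieth_number = fib(20)                            # a number of the sequence
its_digits = [int(d) for d in str(twentieth_number)]  # digits are only ever 0-9

print(twentieth_number)  # 6765
print(its_digits)        # [6, 7, 6, 5]
```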

So, to kinda make this better, you want to have a few examples lined up in a neat format, something like

"Question 1: The code for this bubble sorter is not working
(code)
(Error log here)
Question 2: Please rewrite the code so that the bubble sort works
(rewritten code that's been fixed by you)
Question 3: The code for this AI does not work
(code)
(Error log here)
Question 4: Please rewrite the code so that the AI works
(rewritten code by you, again)"
and have like 4 of those question/answer pairs.
Then you should put a legit error you made and let the AI solve it by just entering this after the examples
"Question 69: The code for __ is not working.
(code)
(error log)
Question 70: Please rewrite the code so that the __ works"
and let Codex complete it. It should work somewhat better because it has already seen some examples.

SongStudios
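
The question/answer layout described above can be assembled programmatically. This is only a sketch: the function name `build_few_shot_prompt`, the tuple layout, and the sample example are hypothetical, and the questions are numbered consecutively rather than jumping to 69.

```python
from typing import List, Tuple

# (what the code is, broken code, error log, fixed code)
Example = Tuple[str, str, str, str]


def build_few_shot_prompt(examples: List[Example], name: str, code: str, error_log: str) -> str:
    """Lay out solved question pairs, then leave the final answer open for the model."""
    parts = []
    q = 1
    for what, broken, log, fixed in examples:
        parts.append(f"Question {q}: The code for this {what} is not working\n{broken}\n{log}")
        parts.append(f"Question {q + 1}: Please rewrite the code so that the {what} works\n{fixed}")
        q += 2
    # The real problem goes last, in exactly the same format as the examples.
    parts.append(f"Question {q}: The code for this {name} is not working\n{code}\n{error_log}")
    parts.append(f"Question {q + 1}: Please rewrite the code so that the {name} works\n")
    return "\n".join(parts)


# Hypothetical example pair; a real prompt would carry about four of these.
examples = [(
    "adder",
    "def add(a, b):\n    return a - b",
    "AssertionError: add(2, 2) returned 0",
    "def add(a, b):\n    return a + b",
)]
prompt = build_few_shot_prompt(
    examples,
    name="counter",
    code="def count_to(n):\n    print(n)",
    error_log="Expected 1..n on separate lines, got only n",
)
print(prompt)
```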

I'm more interested in seeing if we can gaslight it into thinking there are errors in a correct piece of code.

Centauri

I love it. How interesting the way it produces some really strange behaviour at times, while at other times it performs so well. Thanks for sharing your videos, I'm finding them really entertaining. One thing I've noticed is that the traceback you are pasting in there has the full paths of your files and line numbers that have absolutely no meaning to Codex. I'd imagine that all that extra information would probably confuse it.

DanChristos
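
One way to try this suggestion is to trim the traceback before pasting it into the prompt. The sketch below is an assumption about what "removing the extra information" could look like: it keeps only the file name of each frame and drops the line number. The function name `simplify_traceback` and the sample traceback are hypothetical, and whether this actually helps the model is untested here.

```python
import re

# Matches the frame header that Python prints: File "<path>", line <number>
_FRAME = re.compile(r'File "([^"]+)", line \d+')


def simplify_traceback(tb: str) -> str:
    """Reduce each frame's path to its file name and drop the line number."""
    def shorten(match: re.Match) -> str:
        # Split on both separators so Windows and POSIX paths are handled alike.
        file_name = re.split(r"[\\/]", match.group(1))[-1]
        return f'File "{file_name}"'

    return _FRAME.sub(shorten, tb)


raw = (
    "Traceback (most recent call last):\n"
    '  File "/home/user/projects/demo/fib.py", line 12, in <module>\n'
    "    print(fib(n))\n"
    "NameError: name 'n' is not defined\n"
)
cleaned = simplify_traceback(raw)
print(cleaned)
```

The exception type and message on the last line are left untouched, since those are the parts most likely to matter for a fix.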

This guy is going to grow his entire channel on Codex.

saar

as far as i remember, in that linear regression you need a x^T(vector-column) times x(vector-row), which should result in an NxN matrix that has to be inverted as a matrix, rather than x.dot(x) which results in a scalar. obviously it can't invert that scalar as a matrix. i'm not sure about numpy's syntax, but it just needs a tensor product of x^T and x, and it's probably not a dot product.

Alexander_Sannikov
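
A minimal sketch of the normal equation this comment appears to describe, assuming a single feature plus an intercept. With a 1-D vector, `x.dot(x)` is just the scalar sum of squares; the normal equation instead needs the matrix product X^T X of the design matrix, which is square (2x2 here). Pure Python is used so the shapes stay explicit; in NumPy the corresponding product on a 2-D array would be `X.T @ X`. The function name `fit_line` and the sample data are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x via the normal equation."""
    n = float(len(xs))
    # The design matrix X has rows [1, x_i], so X^T X is a 2x2 matrix.
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    xtx = [[n, sx], [sx, sxx]]
    # X^T y is a length-2 vector.
    xty = [sum(ys), sum(x * y for x, y in zip(xs, ys))]
    # Invert the 2x2 matrix in closed form.
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    if det == 0:
        raise ValueError("X^T X is singular; the x values must not all be equal")
    inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]
    # beta = (X^T X)^-1 X^T y
    intercept = inv[0][0] * xty[0] + inv[0][1] * xty[1]
    slope = inv[1][0] * xty[0] + inv[1][1] * xty[1]
    return intercept, slope


# Points that lie exactly on y = 1 + 2x.
intercept, slope = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)
```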

I can't wait to eventually have access to tools like this.

tentative_flora

For anyone who wants to know what the Chinese text at 0:27 says, it is:
# variable
# program=data structure+algorithm
# variable is a value that can be reused, or a code name
# rules for variable naming
# variable names can contain numbers, uppercase and lowercase letters, underscores, and more. However, we do not recommend symbols other than the first three

rocketorbit

Can it generate assembly code? (eg. Caesar cipher)

JerryThings

You need to use the second codex engine for concise, deterministic, and countable query responses from the oracle.

karlwhitford

wow, we are like a step near to somebody saying, create superintelligence :)

SudheendraRao

can it recognize code? so if you give it the fib example and ask the question "what does this code do?", will it answer "this code calculates the fibonacci numbers and prints out the 20th"? what if you write "xyz" instead of "fib", so you obfuscate the code a bit and it can't identify the code by variable or routine names? or even throw it off a bit and write "bernoulli" instead of "fib"? because if it can do this, it would mean it has an internal model that runs the code and interprets the output, and so really understands it. what if you do this with more complicated code? does it at least have a rough idea what the code is doing?

peterkonrad
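
The experiment proposed above needs variants of the same code with neutral or misleading names. A rough sketch of how those variants could be produced follows; the helper `rename_identifier` is hypothetical and does a textual whole-word rename, so it would also touch matching words inside strings and comments.

```python
import re


def rename_identifier(source: str, old: str, new: str) -> str:
    """Rename whole-word occurrences of `old`; a textual rename, not a parser."""
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


original = (
    "def fib(n):\n"
    "    return n if n < 2 else fib(n - 1) + fib(n - 2)\n"
)
obfuscated = rename_identifier(original, "fib", "xyz")        # neutral name
misleading = rename_identifier(original, "fib", "bernoulli")  # deliberately wrong hint

# The renamed code still computes the same values.
namespace = {}
exec(obfuscated, namespace)
print(namespace["xyz"](10))  # 55
```

Each variant can then be pasted into a prompt followed by "What does this code do?" to compare the answers.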

I think you should try increasing the temperature.

toafloast

7:11 I’m pretty sure this is actually modifying the code - 17 and 6 are still flipped around.

TheLegendaryHacker

Ironically, I think you're trying to talk to it too much like a computer, but you need to talk to it like a human; it was trained on human data and language. I put in much more human-like questions and it (almost) always gives me the right answer. The only time I find it misses is if the code's too long, like when I'm trying to make the entire Pac-Man game with one question or something like that. But small loops like that should be easy for it to answer.

michaelvaughan

Every time this AI is not able to perform a certain task I'm relieved that I will keep my job at least a little longer..

TheLeontheking

1- Are we going to see open-source alternatives similar to OpenAI Codex?
2- Can AI-assisted programming in the future create other programs autonomously? For example, could it automatically and autonomously look into real-world challenges/problems, learn from open sources (videos, research papers, code, libraries, etc.), and create its own solutions for diagnosing and making recommendations on diseases and medical conditions, economic challenges, etc.? What do you think about this? And what is the expected time frame, and what steps are required to reach this?

antiquesordo

It seems like the answer is yes..according to the news

DistortedV
