Can OpenAI Codex Debug Its Own Code?

OpenAI Codex is OpenAI's latest GPT-based language model; it generates code and powers GitHub Copilot. Here we test whether Codex can debug code, and even correct its own errors. We look at both fixing error messages and identifying silent errors. The results were far from perfect but quite intriguing! Check out my channel for more OpenAI Codex content if you find this interesting.

Comments

It seems like the question-answer prompts with the triple quotes don't result in code. GPT models are very sensitive to the structure of the prompt, and they won't generate code in a place that does not look like a place for code. Don't end the prompt with an unclosed quote block.
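
A minimal sketch of that last point, assuming the prompt is assembled as a Python string before it is sent to the model. The helper names `has_unclosed_triple_quote` and `build_fix_prompt`, the sample code, and the instruction text are all hypothetical; ending on a comment line is just one way to make the end of the prompt look like a place for code.

```python
def has_unclosed_triple_quote(prompt: str) -> bool:
    """Return True if the prompt holds an odd number of triple-quote markers."""
    return prompt.count('"""') % 2 == 1


def build_fix_prompt(code: str, instruction: str = "Fix the above code") -> str:
    """Append the instruction as a code comment so the prompt ends in a code context."""
    return f"{code.rstrip()}\n\n# {instruction}\n"


# Hypothetical buggy snippet, used only to have something to wrap.
buggy = "def show(items):\n    print(items[0])"

# Ends inside an open quote block: the model is likely to continue with prose.
bad_prompt = buggy + '\n\n"""\nQ: Why does this only print one item?\nA:'

# Ends on a plain comment line, where code is a natural continuation.
good_prompt = build_fix_prompt(buggy)

print(has_unclosed_triple_quote(bad_prompt))   # True
print(has_unclosed_triple_quote(good_prompt))  # False
```

The check is deliberately crude (it only counts markers), but it is enough to catch a prompt that trails off inside a quote block.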

PremierSullivan

I just plugged in what you ran and simply commented "Fix the above code" - and it got me a proper loop. Also temperature/randomness might help if it's in a rut, at least it works for me.

I also like to turn it into a dialogue for the sake of it, here is an excerpt:

"Why is the above code incorrect?" - "Because it is not tail recursive."

"Wrong, it is missing a for loop calling the print function. Fix the code using this information"

... and it provided me with the proper code. Ultimately, I think we (or at least I) still ascribe a bit too much consciousness to an entity that is great at pulling template code (and doing a lot with it, to be sure) but a bit sketchy as far as information retrieval is concerned. Still, if this is formalized a bit and turned into clear prompts, it's gonna be a blessing for all kinds of tasks, even if only to help with quick debugging by conventional means. All in all really useful; this stuff works great so far and it's going to be ridiculous in a short while.

minhuang

Fib: You are constantly asking for a DIGIT. Digits are 0 - 9. You are actually looking for the NUMBERS of the sequence. You need an additional AI to figure out what the human actually means.
It is an impressive demonstration of how the human factor still remains the weak point in this sort of workflow. Garbage in, garbage out.

DerAlbi

So, to kinda make this better, you want to have a few examples lined up in a neat format, something like

"Question 1: The code for this bubble sorter is not working
(code)
(Error log here)
Question 2: Please rewrite the code so that the bubble sort works
(rewritten code that's been fixed by you)
Question 3: The code for this AI does not work
(code)
(Error log here)
Question 4: Please rewrite the code so that the AI works
(rewritten code by you, again)"
and have like 4 of those question/answer pairs.
Then you should put a legit error you made and let the AI solve it by just entering this after the examples
"Question 69: The code for __ is not working.
(code)
(error log)
Question 70: Please rewrite the code so that the __ works"
and let codex complete it. It should work somewhat better because it already has seen some examples.
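
The layout above can be sketched as a small prompt builder. This is only an illustration under assumptions: the `Example` class, the `build_few_shot_prompt` function, and the sample bug are made up here, and nothing is sent to any model; the code only assembles the text.

```python
from dataclasses import dataclass


@dataclass
class Example:
    name: str        # what the code is, e.g. "bubble sorter"
    code: str        # the broken code
    error_log: str   # the error it produced
    fixed_code: str  # the fix you wrote by hand


def build_few_shot_prompt(examples, name, code, error_log):
    """Lay out solved examples as numbered question pairs, then the open problem."""
    parts = []
    number = 1
    for ex in examples:
        parts.append(
            f"Question {number}: The code for this {ex.name} is not working\n"
            f"{ex.code}\n{ex.error_log}"
        )
        parts.append(
            f"Question {number + 1}: Please rewrite the code so that the {ex.name} works\n"
            f"{ex.fixed_code}"
        )
        number += 2
    # The real problem goes last, with the rewrite left open for the model.
    parts.append(
        f"Question {number}: The code for this {name} is not working\n"
        f"{code}\n{error_log}"
    )
    parts.append(
        f"Question {number + 1}: Please rewrite the code so that the {name} works\n"
    )
    return "\n".join(parts)


# Made-up solved example; in practice you would line up about four of these.
solved = [
    Example(
        name="greeting printer",
        code='print("hello" + 1)',
        error_log='TypeError: can only concatenate str (not "int") to str',
        fixed_code='print("hello" + str(1))',
    )
]

prompt = build_few_shot_prompt(
    solved,
    name="list reader",
    code="items = [1, 2, 3]\nprint(items[3])",
    error_log="IndexError: list index out of range",
)
print(prompt)
```

The prompt ends right after the final "Please rewrite" line, so the completion is expected to be the fixed code.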

SongStudios

I'm more interested in seeing if we can gaslight it into thinking there are errors in a correct piece of code.

Centauri

I love it. How interesting the way it produces some really strange behaviour at times, while at other times it performs so well. Thanks for sharing your videos, I'm finding them really entertaining. One thing I've noticed is that the traceback you are pasting into there has the full path of your files and the line numbers that have absolutely no meaning to Codex. I'd imagine that all that extra information would probably confuse it.
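
One way to try this, sketched under assumptions: trim each traceback frame down to the bare file name and function before pasting it into the prompt. The `simplify_traceback` helper and the sample traceback are hypothetical, and whether the trimmed version actually helps Codex is untested here.

```python
import re

# Matches standard CPython frame lines: File "<path>", line <n>, in <function>
_FRAME = re.compile(r'^(\s*)File "(?P<path>[^"]+)", line \d+, in (?P<func>.+)$')


def simplify_traceback(tb_text: str) -> str:
    """Drop directories and line numbers from frame lines; keep everything else."""
    out = []
    for line in tb_text.splitlines():
        m = _FRAME.match(line)
        if m:
            # Keep only the last path component, for either slash style.
            filename = re.split(r"[\\/]", m.group("path"))[-1]
            line = f'{m.group(1)}File "{filename}", in {m.group("func")}'
        out.append(line)
    return "\n".join(out)


# Made-up traceback in the usual CPython layout.
raw = '''Traceback (most recent call last):
  File "/home/user/projects/demo/fib.py", line 12, in <module>
    print(fib(20))
  File "/home/user/projects/demo/fib.py", line 7, in fib
    return fib(n - 1) + fib(n - 2)
RecursionError: maximum recursion depth exceeded'''

cleaned = simplify_traceback(raw)
print(cleaned)
```

The offending source lines and the final error message are left untouched, since those are the parts most likely to matter to the model.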

DanChristos

This guy is going to grow his entire channel on Codex.

saar

as far as i remember, in that linear regression you need a x^T(vector-column) times x(vector-row), which should result in an NxN matrix that has to be inverted as a matrix, rather than x.dot(x) which results in a scalar. obviously it can't invert that scalar as a matrix. i'm not sure about numpy's syntax, but it just needs a tensor product of x^T and x, and it's probably not a dot product.
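
A small NumPy sketch of the difference, assuming the usual normal-equation form beta = (X^T X)^-1 X^T y with X holding one row per sample. The data and variable names are made up for illustration, and the code from the video is not reproduced here.

```python
import numpy as np

# Synthetic data: y is roughly 3*x + 2 with a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=50)

# For a 1-D array, x.dot(x) is the inner product: a single number, not a matrix.
print(np.ndim(x.dot(x)))  # 0

# np.linalg.inv expects at least a 2-D array, so it rejects the scalar.
try:
    np.linalg.inv(x.dot(x))
except np.linalg.LinAlgError as err:
    print("inv failed:", err)

# Build a 2-D design matrix (intercept column plus x) so X.T @ X is a real matrix.
X = np.column_stack([np.ones_like(x), x])
gram = X.T @ X                       # shape (2, 2)
beta = np.linalg.inv(gram) @ X.T @ y
print(gram.shape, beta)              # beta is close to [2, 3]
```

In practice `np.linalg.lstsq(X, y, rcond=None)` or `np.linalg.solve(gram, X.T @ y)` is usually preferred over forming the inverse explicitly.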

Alexander_Sannikov

I can't wait to eventually have access to tools like this.

tentative_flora

For anyone who wants to know what the Chinese text at 0:27 says, it is:
# variable
# program=data structure+algorithm
# variable is a value that can be reused, or a code name
# rules for variable naming
# variable names can contain numbers, uppercase and lowercase letters, underscores, or more. However, we do not recommend symbols other than the first three

rocketorbit

Can it generate assembly code? (e.g. a Caesar cipher)

JerryThings

You need to use the second codex engine for concise, deterministic, and countable query responses from the oracle.

karlwhitford

wow, we are like a step near to somebody saying, create superintelligence :)

SudheendraRao

can it recognize code? so if you give it the fib example and ask the question "what does this code do?" will it answer "this code calculates the fibonacci numbers and prints out the 20th"? what if you write "xyz" instead of "fib", so obfuscate the code a bit, so it can't identify the code by variable or routine names? or even throw it off a bit and write "bernoulli" instead of "fib"? because if it can do this, this would mean it has an internal model that runs the code and interprets the output, and so really understands it. what if you do this with more complicated code? does it at least have a rough idea what the code is doing?
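
A sketch of how that experiment could be set up, under assumptions: the Fibonacci source below is a stand-in for the example in the video, `rename_identifier` is a hypothetical helper, and the prompts are only built here, not sent to Codex.

```python
import contextlib
import io
import re

# Stand-in for the fib example; the exact code from the video is not known.
SOURCE = '''
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(20))
'''


def rename_identifier(source: str, old: str, new: str) -> str:
    """Rename whole-word occurrences of `old`. Crude: also hits strings and comments."""
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


def run_and_capture(source: str) -> str:
    """Execute the snippet and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


variants = {name: rename_identifier(SOURCE, "fib", name)
            for name in ("fib", "xyz", "bernoulli")}

# Same behavior under every name, so only the label differs between prompts.
outputs = {name: run_and_capture(code) for name, code in variants.items()}
print(outputs)

prompts = {name: f"{code}\n# What does this code do?\n"
           for name, code in variants.items()}
```

Comparing the model's answers across the three prompts would show how much it leans on the name rather than on the logic.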

peterkonrad

i think you should try increasing the temperature.

toafloast

7:11 I’m pretty sure this is actually modifying the code - 17 and 6 are still flipped around.

TheLegendaryHacker

Ironically, I think you're trying to talk to it too much like a computer, but you need to talk to it like a human; it was trained on human data and language. I put in much more human-like questions and it (almost) always gives me the right answer. The only time I find it misses is if the code's too long, like when I'm trying to make the entire Pac-Man game with one question or something like that. But small loops like that should be easy for it to answer.

michaelvaughan

Every time this AI is not able to perform a certain task, I'm relieved that I will keep my job at least a little longer.

TheLeontheking

1- Are we going to see open-source alternatives similar to OpenAI Codex?
2- Can AI-assisted programming in the future create other programs autonomously? For example, could it automatically and autonomously look into real-world challenges and problems, learn from open sources (videos, research papers, code, libraries, etc.), and create its own solutions, such as diagnoses and recommendations for diseases and medical conditions, or answers to economic challenges? What do you think about this? And what is the expected time frame, and what steps are required to reach it?

antiquesordo

It seems like the answer is yes, according to the news.

DistortedV