AI Coding Crap: More Examples. Claude 3.5 Sonnet Demo & more - with @Proof_news

AI-generated code is bad, but the demos do their best to hide that fact. Even Claude 3.5 Sonnet, which the company says "raises the industry bar for intelligence," is no exception: in their demo video, they gave Claude a REALLY easy problem to fix, it did a lousy job, and the video glossed over the bad parts.

But it's not just Claude. In conjunction with @Proof_news and a new AI tool they've built for creators, I've broken down several different problems that human software developers can solve but that AIs - despite being trained on all the information and having near-total recall - just cannot get right.

Video Interview with Proof News about this investigation:

00:00 AI Code is still not good
00:45 Work with @Proof_news
01:39 Intro to the Claude 3.5 Sonnet Demo video
02:26 Transferring code from video to an editor so I can demo it
02:50 What Claude thinks the bug is - and why it's wrong
03:45 The 3-character fix Claude missed
04:03 But Claude's test code is even worse
04:07 Hey, Claude - that's a square, not a circle
04:34 Don't test by starting with the answer
05:24 Bad demo practices contribute to the hype cycle
05:33 Collaborating with @Proof_news
06:06 Generalizing from "Claude got this wrong" to "AIs in general get this wrong"
06:58 Other kinds of software problems AI does a poor job of
07:51 A classic problem, with a modern solution the AIs can't code
09:47 When AIs replace human programmers, this is what is lost
10:17 Planning and Estimation - or the lack thereof
11:16 AI coders fail at even small stuff, but the AI companies talk big
11:58 Companies are probably going to get better at not getting caught in poor demos
12:13 More videos about things AIs can't seem to get right
12:27 Wrap up
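The point at 04:34, "Don't test by starting with the answer," is a general anti-pattern. The demo's actual code isn't reproduced here, so this is a generic Python sketch with hypothetical names: a circular test derives its expected value from the very formula under test, so it can never fail, while an independent test compares against a value obtained some other way.

```python
import math

def circle_area(radius):
    return math.pi * radius ** 2

# Anti-pattern: the "expected" value is computed by the same formula
# being tested, so this assertion passes even if the formula is wrong.
def test_circular():
    assert circle_area(2) == math.pi * 2 ** 2

# Better: compare against an independently known value
# (here, the area of a radius-2 circle worked out by hand).
def test_independent():
    assert abs(circle_area(2) - 12.566370614359172) < 1e-12

test_circular()
test_independent()
print("both tests ran")
```

If `circle_area` were silently changed to `math.pi * radius * 2`, the circular version of the test would need the same change to keep "passing," which is exactly why it proves nothing.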

Links:
# Video Interview with Proof News about this investigation:

# Claude 3.5 Sonnet Coding Demo Video

# Source of "raises the industry bar for intelligence" quote

# Proof News Resources
Comments

Hot take: Code-writing AI is hyped up so much because companies needed an excuse to lay off many people, and then hire them again later as independent contractors with far less pay and benefits...

longlostwraith

Replacing coders with AI is like replacing doctors with AI. Who really needs to be replaced with AI is upper management. Herd thinking, commonality, the same old mistakes - that sounds like a perfect use case for AI.

leversofpower

I use Claude and AI very effectively. Like right now I am writing a Kotlin app, but I only know JavaScript. But you have to know what you are doing to get real functionality out of it. I read every line of code and I am starting to really get Kotlin. But it's all about the pseudocode. You have to think through what you want, communicate it very clearly, and expect to have to hone your communication several times. All of that makes me a better developer. No, AI cannot replace me, but it sure does help me.

patrickjreid

AI is pretty good if you are using it to "query" stuff that is in documentation. In fact, I often drop a link to the Node docs as an example, then ask a question about Node that I know will be really well documented.

evrybodygets

My man, never stop what you do here. Thanks for another great video calling out the nonsense.

breddygud

I have been experimenting with the aider project recently. It has a few benefits for me, but big issues too. The biggest benefit is that it changes the dynamic of programming from pure creation to creation & collaboration. It creates mediocre code pretty often (backed w/ Sonnet35 or gpt4o), but it gets close enough that I change it in the editor and move on.

I'm prompting it extremely granularly too. 'add a new arg here', 'write a test for this one method', etc. If you try to do too much at once, everything falls apart so quickly.

Also I'm a senior dev who can nearly instantly read a chunk of code (in my app at least) and see things wrong. I can't imagine a junior or even most midlevel devs doing anything useful with this without hitting so many speedbumps.

chrisschneider

Been using Claude every day for a side project. It's not the greatest at debugging issues. You usually have to hold its hand and tell it the exact few lines or functions to focus on, otherwise it will overcomplicate things by default. What it's great for is generating some boilerplate or a template for a new function or component. It saves tons of time looking up documentation and googling issues on Stack Overflow. You just have to read over the code it generates to see if it makes sense. If not, suggest a different approach - it's generally pretty good at adapting, since apparently you're never wrong...

Kevtf

Please never stop. This is such an uphill battle on something where it is so obvious. How can the sane people be the minority? Artificial intelligence should be the term used to describe the people who think the LLMs work well.

InfiniteQuest

Appreciate the injection of sanity. We shouldn't accept code from LLMs that we'd fail in review if it was from a human.

Patashu

I think the biggest problem is the marketing angle, where AI companies say it will replace developers instead of saying it's something that will make developers, say, 2 times faster/more efficient, thus reducing your development cost.
In its current state, Claude is good enough to convince me to buy a subscription. The problem is that $20/mo is not nearly enough to finance AI R&D, so companies choose to overhype the AI to fool VCs into funding them with insane amounts of money.

guramguram

Takeaways: 1. The code gens can at most do imperfect boilerplate stuff. 2. Coder monkeys are f***ed, and 3. All others will have more work proofreading and testing AI code. Fair assumptions?

mimisbrunnur

Models won't get a lot better; coding is just the wrong category of problem for neural networks. It's not hard to understand: programming is not a fuzzy problem, and fuzzy problems are what they actually do well...

Not_Even_Wrong

So... there is hope that I won't lose my job tomorrow?

sandrorass

The worst part about A.I. that I've found at the moment is that the API documentation I usually rely on to get my day-to-day job done has either been purged, put behind an auth wall, or stopped getting updated, in favour of "use our new A.I. tool". And the new AI tool produces mediocre code at best. The whole purpose of referring to API documentation is to understand what a certain function, constant value, or module does, and just getting an example snippet from a command line doesn't help with that, so I'm finding that I have to dig through definitions in the source code and take my best guess.

Another part is that what I would once refer to as an algorithm or set of algorithms within a given package or program is now being blanketed as "A.I."

DeviousMalcontent

I asked Claude to create a Symfony (PHP) command to do simple checks that language binds of some forms existed in their corresponding language files. It imported 3 classes that it never used, declared 5 variables that it never used, called 2 methods that do not exist, and used 1 method that was deprecated in 2018. It did not perform the check in a workable way and reported everything as an error. I never wanted a junior developer who can't learn, but that's apparently what our team just got.

garettrobson
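The defect classes in the comment above - unused imports, unused variables, calls to methods that don't exist - are exactly the things static analysis flags mechanically. The commenter's code was PHP, which isn't shown here, so this is an illustrative Python sketch: a toy checker (using only the standard `ast` module, and handling only plain `import` statements, not `from ... import`) that finds a dead import in a hypothetical snippet.

```python
import ast

# Hypothetical snippet with the kind of dead import the commenter describes.
SOURCE = '''
import os          # imported but never used
import json

def check(path):
    data = json.loads(path)
    return data
'''

tree = ast.parse(SOURCE)
# Names brought in by plain `import` statements.
imported = {alias.asname or alias.name
            for node in ast.walk(tree)
            if isinstance(node, ast.Import)
            for alias in node.names}
# Every bare name referenced anywhere in the snippet.
used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
unused = sorted(imported - used)
print("unused imports:", unused)
```

Real linters do this (and far more) in milliseconds, which is why shipping AI output that fails such checks is hard to excuse.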

I don't understand why people are trying to crowbar LLMs into reasoning tasks. If they took the time to understand how these models work, they'd stop wasting their time on this and only use LLMs for the things they are good at, which is language stuff, not reasoning!

tollington

It's a fancy Google search; it doesn't think or reason. "What is the next most likely weighted token" is the only thing that matters. I saw a video where it very confidently said a prime number was not prime, because the first token it matched on was "No", so the rest had to make sense. And it was confidently incorrect in its reasoning (or rather, the output was convincing to a human; no reasoning happened).

Probably due to the fact, like you said, that it goes with the most common response. And that could be a Reddit thread full of completely wrong individuals.

andrewpatterson
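The prime-number story above is a neat illustration of the gap: primality is a deterministic computation that a few lines of ordinary code settle with certainty, no "confidence" involved. The specific number from the video isn't given, so here is a standard trial-division check (not from the video) in Python.

```python
def is_prime(n):
    """Deterministic trial division: either n has a divisor <= sqrt(n) or it is prime."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

print(is_prime(97))   # 97 is prime
print(is_prime(91))   # 91 = 7 * 13, so not prime
```

A next-token predictor that opens its answer with "No" then rationalizes backwards; this loop simply checks the divisors.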

Didn't we go through the same hype cycle with self-driving cars?
The usury-based financial system gives all the incentives for creating bubbles, and when they burst, the big companies are saved with taxation or inflation.

withink

A project I'm working on started using AI code reviews & it only caught 3 issues, and 2 of them were for completely wrong reasons. All 3 of them were incredibly basic errors that I would have caught without it just by testing it once I was done setting up all the scaffolding.

The rest was dozens upon dozens of misunderstandings of the code base or regurgitations of suggestions that read like they were from a cargo cult programming blog. It took me 30 minutes to sift through utter crap to get very little benefit out of it, and it was only that short for the amount of reviews it gave because they were so crap I could dismiss them out of hand.

It's not even like my code is perfect either. My previous PRs on that same project have gotten plenty of meaningful reviews and adjustment requests from other human reviewers in the past, but the AI didn't catch anything of that nature.

mettaursp

This really needs a higher level of promotion. Well spotted - and this is how code quality is going to go downhill...

techsuvara