Meta drops new LLM-based testing

Recorded live on Twitch, GET IN

Become a backend engineer. It's my favorite site

This is also the best way to support me: support yourself by becoming a better backend engineer.

MY MAIN YT CHANNEL: Has well-edited engineering videos

Discord

Hey, I am sponsored by Turso, an edge database. I think they are pretty neat. Give them a try for free, and if you want you can get a decent amount off (the free tier is the best, better than planetscale or any other).
Comments

Now I have to debug the AI, to debug the test, to debug the code...

xxapoloxx

It looks like the main takeaway in the paper is that they have a loop over the LLM that lets it rewrite the test until it passes... I wonder if this does the same thing as many devs: try maybe two times, then either delete the test or replace the body with assert true.
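Rough sketch of what that loop might look like in Python; the generate_test callable and the retry budget here are made up for illustration, not taken from the paper:

import subprocess

MAX_ATTEMPTS = 4  # hypothetical retry budget, not the paper's number

def improve_tests(source_path, generate_test):
    """Ask an LLM for a test, run it, and retry until it passes or we give up.

    generate_test is a placeholder for whatever LLM call produces test code;
    it takes the source path and the last failure output, and returns new test code.
    """
    feedback = ""
    for _ in range(MAX_ATTEMPTS):
        test_code = generate_test(source_path, feedback)
        with open("test_candidate.py", "w") as f:
            f.write(test_code)
        result = subprocess.run(["pytest", "test_candidate.py", "-q"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return test_code      # keep only tests that actually run and pass
        feedback = result.stdout  # feed the failure back into the next attempt
    return None                   # discard instead of shipping a broken test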

mwwhited

One of the first things I saw on unit testing was Mark Seemann's course on Pluralsight. The first questions he asks himself there are: "what should we test?" and "how do we trust tests?". He argued that anything that has an if-statement is a contender to be tested and is not trustworthy, so tests shouldn't contain any branching paths. After like 7 years this approach has yet to fail me.
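In Python terms (apply_discount is just an invented example function): the branching test can quietly assert the wrong thing and still go green, the flat ones can't.

def apply_discount(price, is_member):
    # invented function under test
    return price * 0.9 if is_member else price

def test_discount_with_branching():
    # branching inside the test: if the condition is picked wrong,
    # the test can assert the wrong expectation and still pass
    for price, is_member in [(100, True), (100, False)]:
        if is_member:
            assert apply_discount(price, is_member) == price * 0.9
        else:
            assert apply_discount(price, is_member) == price

def test_member_gets_ten_percent_off():
    # no branches: one concrete input, one concrete expectation
    assert apply_discount(100, is_member=True) == 90

def test_non_member_pays_full_price():
    assert apply_discount(100, is_member=False) == 100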

vsolyomi

Tests are the specification of your program; if AI could decipher specifications, it could create the program itself. They are going about this backwards: let devs create the tests, and let AI fill in the implementation.

defeqel

Tests are meant to be used in conjunction with SRP (and TDD). High coverage alone means nothing; I can create 100% coverage with unit tests that don't test anything.
When you use TDD, and can be reasonably sure of your tests' validity, you have a safety net for refactoring. If you add new features or modify existing code, you know immediately if you broke something that was working before.
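For example (classify is an invented function), this suite reports 100% line coverage while asserting nothing:

def classify(n):
    # invented function under test
    if n < 0:
        return "negative"
    return "non-negative"

def test_classify_covers_everything_tests_nothing():
    # both branches execute, so coverage reports 100%,
    # but there is not a single assertion about the results
    classify(-1)
    classify(1)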

UNgineering

Now AI-written code can be tested by AI itself.

avaterclasher

From the moment I understood the weakness of my flesh, it disgusted me.

SMorales

17:51 We emulated humans so well that it even argues with you about doing work.

Fulminin

You are right, LLMs only think inside the box, but their box is many times larger than your box.

evancombs

You have to wonder what that 1300-line function was doing that nobody noticed it should be tested until this one test covered it.

Titousensei

I think it is possible to improve on their AI test generator using the solution you mentioned: let one LLM instance be an attacker and another be a defender. Now we have an adversarial setup that attempts to break and solve the problem at once, which makes it possible to create useful tests. The same method could also be applied to other tasks, for example hacking.
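A very rough sketch of that attacker/defender loop; call_llm is a stand-in for whatever model API you'd actually use:

def call_llm(prompt):
    # stand-in for a real model call (OpenAI, Llama, etc.)
    raise NotImplementedError

def adversarial_rounds(source_code, rounds=3):
    code = source_code
    for _ in range(rounds):
        # attacker: try to produce a test that breaks the current code
        attack = call_llm("Write a failing test for this code:\n" + code)
        # defender: try to change the code so the attack no longer fails
        code = call_llm("Fix this code so the following test passes:\n"
                        + code + "\n" + attack)
    return code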

lucasteo

They should stick mutation testing into it, to ensure that the code coverage is not lying... but it would take forever to run :D
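Mutation testing flips small pieces of the code (a >= into a >, a + into a -) and re-runs the suite; if no test fails, the mutant "survives" and the coverage was hollow. Toy illustration (tools like mutmut automate this for Python):

def is_adult(age):
    return age >= 18          # original

def is_adult_mutant(age):
    return age > 18           # mutant: >= flipped to >

def test_is_adult():
    assert is_adult(30)       # passes against both versions, so this test
                              # alone would not kill the mutant;
                              # adding assert is_adult(18) would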

TymurDaudov_aka_tymfear

Have been using GPT-4 to write tests for the last few months. It's pretty good; it always needs tweaking, but it saves writing them all from scratch, and it often finds the odd case I'd have missed.

seancooper

This approach works very well as an alternative to RLHF: the computer can generate many tests and immediately check whether they compile and pass. It will greatly enhance LLMs' understanding of code in the future, and also their ability to produce quality tests on the first run.

mzrts

29:00, I really like "property-based testing" (see hypothesis in Python). You can basically write a test that does nothing or asserts that no error was raised, and then create a strategy to generate random data to pump into your function. Works very well for finding edge case errors
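Small hypothesis example; normalize_whitespace is an invented function, and the test barely asserts anything itself because the generated inputs do the work:

from hypothesis import given, strategies as st

def normalize_whitespace(s):
    # invented function under test
    return " ".join(s.split())

@given(st.text())
def test_never_raises_and_is_idempotent(s):
    out = normalize_whitespace(s)
    # "no exception raised" is already a useful property;
    # idempotence is a cheap extra check
    assert normalize_whitespace(out) == out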

dougmercer

I've had both experiences with parameterized tests. My takeaway is this: when the parameters truly don't change the logic of your test, it's fine. If the logic and structure have to change to accommodate the parameters, it becomes impossible to debug.

Basically, when the change really is "just pass this different parameter to the API" and nothing else, it doesn't make the test hard to debug. I've seen it go the other way, where you parameterize by a generator function with its own inputs and outputs; at that point you're basically reinventing macro functions and bringing in all the hell that comes with them.
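The good case, in pytest terms (to_celsius is an invented function): the parameters only change the data, never the shape of the test.

import pytest

def to_celsius(fahrenheit):
    # invented function under test
    return (fahrenheit - 32) * 5 / 9

@pytest.mark.parametrize("fahrenheit, expected", [
    (32, 0),
    (212, 100),
    (-40, -40),
])
def test_to_celsius(fahrenheit, expected):
    # same logic for every row; only inputs and expected values vary
    assert to_celsius(fahrenheit) == pytest.approx(expected)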

AndrewSayman

>>> line coverage != good coverage <<<

In one branch you might have many cases, for example:

if (number) { do something }

Within that one branch you can verify many things: max, min, decimals, etc., and you are still covering a single branch.
If I cover it with only a simple number like 1, the report will say "it's covered", but it's barely tested.

So, again, line coverage != good coverage.
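In Python terms (clamp_to_byte is an invented function): the first test makes the branch show up as covered, but only the second one actually exercises it.

def clamp_to_byte(number):
    # invented function: a single branch with many interesting inputs
    if number:
        return max(0, min(255, number))
    return 0

def test_covered_but_barely_tested():
    assert clamp_to_byte(1) == 1        # branch now reports as "covered"

def test_actually_tested():
    assert clamp_to_byte(300) == 255    # upper clamp
    assert clamp_to_byte(-5) == 0       # lower clamp
    assert clamp_to_byte(0) == 0        # falsy path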

SpacialCow

I'm 10 min in and it sounds like they're throwing shit at the wall and seeing what sticks.

HyperionStudiosDE

Their solution: if the test says the AI code is bad, replace the test with AI until the AI test says the AI code is good.

disruptive_innovator

I understand where he is going with conditionals making tests more complicated, but that's not what's happening here. He's getting too distracted by the syntax to see the pattern. Test by example is the perfect way to test functions. When you remove indirect inputs and indirect outputs (which are common in OOP and a no-no in functional), the natural progression for cleaning up your functional tests is test by example. Either that, or you copy-paste a lot of stuff.

FranzAllanSee