Meta drops new LLM-based testing

Recorded live on Twitch, GET IN

Become a backend engineer. It's my favorite site

This is also the best way to support me: support yourself by becoming a better backend engineer.

MY MAIN YT CHANNEL: Has well-edited engineering videos

Discord

Hey, I am sponsored by Turso, an edge database. I think they are pretty neat. Give them a try for free, and if you want you can get a decent amount off (the free tier is the best, better than planetscale or any other).
Comments

Now I have to debug the AI, to debug the test, to debug the code...

xxapoloxx

It looks like the main takeaway in the paper is that they have a loop over the LLM that lets it rewrite the test until it passes... I wonder if this does the same thing as many devs: try maybe two times, then either delete the test or replace the body with assert true.
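Rough sketch of what that loop might look like in Python; the generate_test callable and the retry budget here are made up for illustration, not taken from the paper:

import subprocess

MAX_ATTEMPTS = 4  # hypothetical retry budget, not the paper's number

def improve_tests(source_path, generate_test):
    """Ask an LLM for a test, run it, and retry until it passes or we give up.

    generate_test is a placeholder for whatever LLM call produces test code;
    it takes the source path and the last failure output, and returns new test code.
    """
    feedback = ""
    for _ in range(MAX_ATTEMPTS):
        test_code = generate_test(source_path, feedback)
        with open("test_candidate.py", "w") as f:
            f.write(test_code)
        result = subprocess.run(["pytest", "test_candidate.py", "-q"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return test_code      # keep only tests that actually run and pass
        feedback = result.stdout  # feed the failure back into the next attempt
    return None                   # discard instead of shipping a broken test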

mwwhited

One of the first things I saw on unit testing was Mark Seemann's course on Pluralsight. The first questions he asks himself there are: "what should we test?" and "how do we trust tests?". He argued that anything that has an if-statement is a contender to be tested and is not trustworthy, so tests shouldn't contain any branching paths. After like 7 years this approach has yet to fail me.
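In Python terms (apply_discount is just an invented example function): the branching test can quietly assert the wrong thing and still go green, the flat ones can't.

def apply_discount(price, is_member):
    # invented function under test
    return price * 0.9 if is_member else price

def test_discount_with_branching():
    # branching inside the test: if the condition is picked wrong,
    # the test can assert the wrong expectation and still pass
    for price, is_member in [(100, True), (100, False)]:
        if is_member:
            assert apply_discount(price, is_member) == price * 0.9
        else:
            assert apply_discount(price, is_member) == price

def test_member_gets_ten_percent_off():
    # no branches: one concrete input, one concrete expectation
    assert apply_discount(100, is_member=True) == 90

def test_non_member_pays_full_price():
    assert apply_discount(100, is_member=False) == 100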

vsolyomi

Tests are the specification of your program; if AI could decipher specifications, it could create the program itself. They are going about this backwards: let devs create the tests, and let AI fill in the implementation.

defeqel

Tests are meant to be used in conjunction with SRP (and TDD). High coverage alone means nothing; I can create 100% coverage with unit tests that don't test anything.
When you use TDD, and can be reasonably sure of your tests' validity, you have a safety net for refactoring. If you add new features or modify existing code, you know immediately if you broke something that was working before.
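For example (classify is an invented function), this suite reports 100% line coverage while asserting nothing:

def classify(n):
    # invented function under test
    if n < 0:
        return "negative"
    return "non-negative"

def test_classify_covers_everything_tests_nothing():
    # both branches execute, so coverage reports 100%,
    # but there is not a single assertion about the results
    classify(-1)
    classify(1)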

UNgineering

Now AI-written code can be tested by AI itself.

avaterclasher

From the moment I understood the weakness of my flesh, it disgusted me.

SMorales

17:51 We emulated humans so well that it even argues with you about doing work.

Fulminin

You are right, LLMs only think inside the box, but their box is many times larger than your box.

evancombs

You have to wonder what that 1300-line function was doing that nobody noticed it should be tested until this one test covered it.

Titousensei

I think it is possible to improve on their AI test generator using the solution you mentioned: let one LLM instance be an attacker and another be a defender. Now we have an adversarial setup that attempts to break and solve the problem at once, which makes it possible to create useful tests. The same method could also be applied to other tasks, for example hacking.
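A very rough sketch of that attacker/defender loop; call_llm is a stand-in for whatever model API you'd actually use:

def call_llm(prompt):
    # stand-in for a real model call (OpenAI, Llama, etc.)
    raise NotImplementedError

def adversarial_rounds(source_code, rounds=3):
    code = source_code
    for _ in range(rounds):
        # attacker: try to produce a test that breaks the current code
        attack = call_llm("Write a failing test for this code:\n" + code)
        # defender: try to change the code so the attack no longer fails
        code = call_llm("Fix this code so the following test passes:\n"
                        + code + "\n" + attack)
    return code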

lucasteo

They should stick mutation testing into it, to ensure that the code coverage is not lying... but it would take forever to run :D
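Mutation testing flips small pieces of the code (a >= into a >, a + into a -) and re-runs the suite; if no test fails, the mutant "survives" and the coverage was hollow. Toy illustration (tools like mutmut automate this for Python):

def is_adult(age):
    return age >= 18          # original

def is_adult_mutant(age):
    return age > 18           # mutant: >= flipped to >

def test_is_adult():
    assert is_adult(30)       # passes against both versions, so this test
                              # alone would not kill the mutant;
                              # adding assert is_adult(18) would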

TymurDaudov_aka_tymfear

Have been using GPT-4 to write tests for the last few months. It's pretty good; it always needs tweaking, but it saves writing them all from scratch, and it often finds the odd case I'd have missed.

seancooper

This approach works very well as an alternative to RLHF: the computer can generate many tests and immediately check whether they compile and pass. It will greatly enhance LLMs' understanding of code in the future, and also their ability to produce quality tests on the first run.

mzrts

29:00, I really like "property-based testing" (see hypothesis in Python). You can basically write a test that does nothing or asserts that no error was raised, and then create a strategy to generate random data to pump into your function. Works very well for finding edge case errors
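Small hypothesis example; normalize_whitespace is an invented function, and the test barely asserts anything itself because the generated inputs do the work:

from hypothesis import given, strategies as st

def normalize_whitespace(s):
    # invented function under test
    return " ".join(s.split())

@given(st.text())
def test_never_raises_and_is_idempotent(s):
    out = normalize_whitespace(s)
    # "no exception raised" is already a useful property;
    # idempotence is a cheap extra check
    assert normalize_whitespace(out) == out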

dougmercer

I've had both experiences with parameterized tests. My takeaway is this: when the parameters truly don't change the logic of your test, it's fine. If the logic and structure have to change to accommodate the parameters, it becomes impossible to debug.

Basically, when the change really is "just pass this different parameter to the API" and nothing else, it doesn't make the test hard to debug. I've seen it go the other way, where you parameterize by a generator function with its own inputs and outputs; at that point you're basically reinventing macro functions and bringing in all the hell that comes with them.
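The good case, in pytest terms (to_celsius is an invented function): the parameters only change the data, never the shape of the test.

import pytest

def to_celsius(fahrenheit):
    # invented function under test
    return (fahrenheit - 32) * 5 / 9

@pytest.mark.parametrize("fahrenheit, expected", [
    (32, 0),
    (212, 100),
    (-40, -40),
])
def test_to_celsius(fahrenheit, expected):
    # same logic for every row; only inputs and expected values vary
    assert to_celsius(fahrenheit) == pytest.approx(expected)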

AndrewSayman

>>> line coverage != good coverage <<<

In one branch you might have many cases, for example:

if (number) { do something }

Within that one branch you can verify many things: max, min, decimals, etc., and you are still covering a single branch.
If I cover it with only a simple number like 1, the report will say "it's covered", but it's barely tested.

So, again, line coverage != good coverage.
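In Python terms (clamp_to_byte is an invented function): the first test makes the branch show up as covered, but only the second one actually exercises it.

def clamp_to_byte(number):
    # invented function: a single branch with many interesting inputs
    if number:
        return max(0, min(255, number))
    return 0

def test_covered_but_barely_tested():
    assert clamp_to_byte(1) == 1        # branch now reports as "covered"

def test_actually_tested():
    assert clamp_to_byte(300) == 255    # upper clamp
    assert clamp_to_byte(-5) == 0       # lower clamp
    assert clamp_to_byte(0) == 0        # falsy path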

SpacialCow

I'm 10 min in and it sounds like they're throwing shit at the wall and seeing what sticks.

HyperionStudiosDE

Their solution: if the test says the AI code is bad, replace the test with AI until the AI test says the AI code is good.

disruptive_innovator

I understand where he is going with conditionals making tests more complicated, but that's not what's happening here. He's getting too distracted by the syntax to see the pattern. Test by example is the perfect way to test functions. When you remove indirect inputs and indirect outputs (which are common in OOP and a no-no in functional), the natural progression for cleaning up your functional tests is test by example. Either that, or you copy-paste a lot of stuff.

FranzAllanSee