Is CODE LLAMA Really Better Than GPT4 For Coding?!

Code Llama is a fine-tuned version of Llama 2, released by Meta, that excels at coding. Reports say it is equal to, and sometimes even better than, GPT-4 at coding! This is incredible news, but is it true? I put it through some real-world tests to find out.

Enjoy :)

Join My Newsletter for Regular AI Updates 👇🏼

Need AI Consulting? ✅

Rent a GPU (MassedCompute) 🚀
USE CODE "MatthewBerman" for 50% discount

My Links 🔗

Media/Sponsorship Inquiries 📈

Links:
Comments

What tests should I add to future coding evaluations of LLMs?

matthew_berman

*Man, you turned my world around*
Thanks for your content!

TheUnderMind

Love your videos. I've learned a lot. One thing I would love to see you test these code models on: give the model an API document along with credentials, and see whether it can execute an API request against another application. I've been trying to do this with a number of models and most fail.
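For reference, something like this minimal sketch is the kind of script the model would need to produce; the endpoint, payload, and environment-variable name here are hypothetical placeholders, not from any real API:

```python
import os
import requests

API_BASE = "https://api.example.com/v1"  # placeholder endpoint from the (assumed) API doc
API_KEY = os.environ["EXAMPLE_API_KEY"]  # credential supplied alongside the doc

def create_ticket(title: str, body: str) -> dict:
    """POST a new ticket, following the assumed API documentation."""
    resp = requests.post(
        f"{API_BASE}/tickets",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"title": title, "body": body},
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors instead of silently continuing
    return resp.json()

if __name__ == "__main__":
    print(create_ticket("Test ticket", "Created by LLM-generated code"))
```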

trezero

Yes please, let's see how it's done on a realistic consumer-grade GPU. Nothing over 24 GB, and preferably 12 GB. Love your content.

fuba

Incredible, life is getting better and better with all these models. I am porting a bunch of old code to Python, then Mojo, for web, mobile, and marketing automation. This is great! When you get time, a follow-up on this would be great: I am converting PHP code into Python, and I will 100% become a Patron if you can show it as an example: (1) documenting how to convert and reverse-prompt the old code, then (2) providing proper documentation, including API documentation, so that the code-writing LLM gets the output 80-90% of the way there and an engineer can finalize it. Thanks, Matthew!!
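As a toy illustration of the kind of PHP-to-Python port being described (the original PHP helper here is invented for the example, not from the commenter's codebase):

```python
import re

def slugify(text: str) -> str:
    """Python port of a small, hypothetical PHP helper:
    strtolower(trim($text)) followed by preg_replace('/[^a-z0-9]+/', '-', $text).
    Lowercases, trims, and collapses runs of non-alphanumerics into hyphens."""
    text = text.strip().lower()
    return re.sub(r"[^a-z0-9]+", "-", text)

print(slugify("  Hello, World!  "))  # -> "hello-world-"
```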

mercadolibreventas

Great first showing! Will be interesting to see how it ages as people use it for tasks outside of the testing scope.
Nitpick: I think it's probably more fair to compare against Code Interpreter or the GPT-4 API. Default ChatGPT, I suspect, uses a temperature >= 0.4.
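For anyone trying that comparison, a minimal sketch using the OpenAI Python client; pinning the temperature to 0 is the commenter's suggestion, not a setting from the video:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4",
    temperature=0.0,  # pin sampling instead of relying on ChatGPT's default
    messages=[{"role": "user", "content": "Write the game Snake in Python."}],
)
print(resp.choices[0].message.content)
```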

thenoblerot

I think the real utility of a coding assistant is the ability to integrate with your existing projects and assist as you develop them yourself, kind of as a really good autocomplete and pair programmer. None of these tests really demonstrate which is "better" at doing that, though a large context window certainly seems key for something like that.

Aside from that, I have used GPT-4 for from-scratch coding tasks that have been useful.
For example, you could run some of these tests:
- Take a bunch of documents in a folder and perform some kind of repetitive task on them, such as renaming all of them in a specific way based on their contents (a sketch of this one follows the list).
- Go through a bunch of images in a folder and sort them into sub-folders based on their contents (cat pictures, dog pictures, landscapes, etc)
- Generate a YouTube thumbnail for a given video based on a specific spec and maybe some provided template images to go along with it.
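
A minimal sketch of that first test, with an assumed naming rule (slug of the file's first line) and an assumed folder name, since the comment doesn't pin either down:

```python
import re
from pathlib import Path

folder = Path("documents")  # hypothetical input folder

for path in folder.glob("*.txt"):
    text = path.read_text(encoding="utf-8").strip()
    first_line = text.splitlines()[0] if text else "untitled"
    # Build a filesystem-safe name from the first line of the file.
    slug = re.sub(r"[^a-z0-9]+", "-", first_line.lower()).strip("-")[:60] or "untitled"
    target = path.with_name(f"{slug}{path.suffix}")
    if not target.exists():  # never clobber an existing file
        path.rename(target)
```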

Basically, think of one-off or repetitive things someone might want to do but they don't know how to code it, and describe what is needed to the AI and see if it can produce a usable script. Also, a big thing is going back and forth. If the script has an error or doesn't work right away, describe the problem to it (or paste the error, etc) and see if it can correct and adjust the script.

Rangsk

Amazing results! I think an interesting prompt could be to challenge the model to reduce a given piece of code to the fewest characters possible while retaining the original functionality.
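
A made-up example of the kind of reduction meant here:

```python
# Original, readable version:
def sum_of_squares(numbers):
    total = 0
    for n in numbers:
        total += n * n
    return total

# Golfed version a model might produce (same behavior, far fewer characters):
def s(n): return sum(x*x for x in n)
```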

And while I'm here... :D I would really love a video diving into the basics of quantization: what the differences between the quantization methods are at a high level, and how to figure out which model version to use depending on what GPU(s) you have available. Also, how to run the models from Python code instead of local "all-in-one" tools, so I can use them in my own scripts and on large datasets. And how to set up a local runpod on your own server, plus what open-source front-end tools are available to securely share the models with users on your network. Keep up the great work!
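
On the "run it from Python" part, a minimal sketch that loads a 4-bit-quantized Code Llama with Hugging Face Transformers and bitsandbytes; the model ID is real, but the prompt and generation settings are only illustrative, and a 4-bit 7B model should fit comfortably within 12 GB of VRAM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place layers on the available GPU(s)
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Code Llama Instruct expects the [INST] ... [/INST] prompt format.
prompt = "[INST] Write a Python function that reverses a string. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```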

PotatoKaboom

Hi Matthew, a full tutorial on how to install the full 34B Code Llama model would be really welcome. Great videos with really useful content; thank you very much for all your efforts to help us catch up on the AI wave.

MrOptima

A video on how to install it would be great. Thank you!

raminmagidi

Writing code is one of the main reasons I subscribe to ChatGPT with GPT-4. If Code Llama is as capable at coding as you demonstrated, I could save $20 per month by switching. Thank you for showing me this alternative!

micbab-vgmu

I'm planning on installing Llama 2 locally soon. I could watch the old videos, but a new one would be nice. :)

tmhchacham

This is why I subscribed to this channel: connecting the viewer to the actual project.

dtory

That's impressive. I think you should consider giving the code models incorrect code and asking them to fix it or find the bug. The challenges could include syntax and logic issues, such as floating-point bugs or other incorrect behavior.
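A made-up example of such a test case: the function below contains a deliberate off-by-one bug a model would be asked to find and fix:

```python
def average_of_last_n(values, n):
    """Intended behavior: return the mean of the last n items of `values`."""
    window = values[-n:]          # correct: take the last n items
    return sum(window) / (n - 1)  # BUG: should divide by len(window)
```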

korseg

A fun way to test models against each other for video content would be to make up a game where the contestants have to write code to play: set up an arena with virtual bots, give both models the same description of the game, have each model write the code for its bot to race/find/fight/whatever, and then we could watch the dramatic finale as their bots face off.
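A minimal sketch of what that arena could look like; the game rules, class names, and example bots are all invented for illustration:

```python
import random

class Bot:
    """Interface each model-written bot must implement (hypothetical spec)."""
    def act(self, my_pos: int, enemy_pos: int) -> int:
        """Return a move: -1 (left), 0 (stay), or +1 (right)."""
        raise NotImplementedError

class ChaserBot(Bot):
    def act(self, my_pos, enemy_pos):
        return (enemy_pos > my_pos) - (enemy_pos < my_pos)  # step toward the enemy

class RandomBot(Bot):
    def act(self, my_pos, enemy_pos):
        return random.choice([-1, 0, 1])  # wander aimlessly

def arena(a: Bot, b: Bot, size: int = 20, ticks: int = 100) -> str:
    pos_a, pos_b = 0, size - 1
    for _ in range(ticks):
        pos_a = max(0, min(size - 1, pos_a + a.act(pos_a, pos_b)))
        pos_b = max(0, min(size - 1, pos_b + b.act(pos_b, pos_a)))
        if pos_a == pos_b:
            return "bot A caught bot B"
    return "timeout"

print(arena(ChaserBot(), RandomBot()))
```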

mungojelly

WizardCoder and Phind are also crushing some recent tests.

ThisPageIntentionallyLeftBlank

Hey Matthew, it would be great for you to do a deep dive into Text Generation Web UI and how to use the whole thing. Also, covering GGUF and GPTQ (and other formats too) would be helpful.
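For the GGUF part, a minimal sketch of loading such a file from Python with llama-cpp-python; the file path and the quantization level in the filename are placeholders:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./codellama-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)
out = llm("[INST] Write a Python one-liner to reverse a string. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```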

sundruid

Any chance you can do a video on local install + VS Code integration options?
Ideally, I'm looking for a Copilot alternative that can be fine-tuned against an actual local codebase.

unom

About the [1, 1, 1] all-equal test: I don't agree that GPT-4 got it wrong. The expected result of the [] case was not specified in the description; the test itself is wrong for magically expecting True. Also, the context window of Code Llama is a big "nope" for me. I often tell GPT-4 "yes, but do X differently", and that requires more tokens.
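To make the disputed edge case concrete, here are two implementations that both satisfy "check whether all elements of a list are equal" but disagree on the empty list; which one is "correct" depends entirely on whether the spec pins down the [] case:

```python
def all_equal_vacuous(xs):
    return len(set(xs)) <= 1  # vacuously True for []: no unequal pair exists

def all_equal_strict(xs):
    return len(xs) > 0 and len(set(xs)) == 1  # False for []: nothing to compare

print(all_equal_vacuous([1, 1, 1]), all_equal_vacuous([]))  # True True
print(all_equal_strict([1, 1, 1]), all_equal_strict([]))    # True False
```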

HoDx

Hi Matthew, amazing video! Thanks!
Could you tell me what your graphics card is?

steveheggen-aquarelle