Llama-3.1 (Fully Tested): Are the 405B, 70B & 8B Models Really Good? (Can They Beat Claude & GPT-4o?)

In this video, I'll be fully testing the Llama-3.1 405B, 70B & 8B models to check if they're really good. I'll also try to find out if they can really beat Claude 3.5 Sonnet, GPT-4o, DeepSeek & Qwen-2. These models are fully open-source and can be used locally for FREE. They are strong at coding tasks and also really good at Text-To-Application, Text-To-Frontend and other things as well. I'll be testing them to find out if they can really beat other LLMs, and I'll also tell you how you can use them.

-----
Key Takeaways:

🔔 Llama-3.1 Launch: Discover Meta's latest Llama-3.1 models, including 8B, 70B, and 405B variants, setting new benchmarks in AI.

📊 Model Comparison: See how the 8B and 70B models stack up against the 405B, Claude 3.5 Sonnet, and GPT-4o, showcasing their impressive performance.

💰 Pricing Update: Get the latest pricing for the Llama models, with Fireworks offering the best deals on 8B, 70B, and 405B, making advanced AI more affordable.

🆓 Platform Availability: Learn where to access these models, from Meta's platform to Huggingface and Ollama, and why Nvidia NIMS is the go-to for free, easy access.

🤖 Tough Questions: Watch as the Llama models tackle 12 challenging questions, from general knowledge to coding tasks, highlighting their capabilities and limitations.

📈 Performance Insights: Understand why the 8B and 405B models shine in terms of size and quality, while the 70B model struggles, with comparisons to Mistral Nemo 12B.

------
Timestamps:

00:00 - Introduction
00:08 - About Llama-3.1 Models
02:10 - Testing
10:02 - Conclusion
11:22 - Ending
Comments

If you change the system prompt and force the 8B model to analyse and calculate first, then answer, it will answer most of the questions correctly. I tested this with Llama 3.1 8B q8_0. This is the example of a system prompt I used: "You are a helpful, smart, kind, and efficient AI assistant. You should always calculate and analyse the question first then fulfill the user's requests to the best of your ability." With it, the model passes 8 questions, like the 70B.
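A minimal sketch of the trick described above, using the system prompt quoted in the comment. The `build_messages` helper is hypothetical; the commented-out `ollama.chat` call assumes the official `ollama` Python client and a local server, which are not part of the comment:

```python
# The analyse-first system prompt quoted in the comment above.
SYSTEM_PROMPT = (
    "You are a helpful, smart, kind, and efficient AI assistant. "
    "You should always calculate and analyse the question first "
    "then fulfill the user's requests to the best of your ability."
)

def build_messages(user_question: str) -> list[dict]:
    """Build a chat payload that puts the analyse-first instruction
    in the system role ahead of the user's question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]

# With the `ollama` Python client installed and a local server running,
# the call would look roughly like:
#   import ollama
#   reply = ollama.chat(model="llama3.1:8b",
#                       messages=build_messages("..."))
msgs = build_messages("I have 3 apples and eat 2. How many are left?")
print(msgs[0]["role"], len(msgs))
```

The same two-message structure works with any OpenAI-style chat endpoint; only the transport differs.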

upper-moon-alpha

The 405B game of life might actually be correct... Assuming that it wraps at the edges, the checkerboard starting state will cause everything to die even if the rules are correctly applied.
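The claim above is easy to check with a few lines of Python: on a torus with even side lengths, every cell in a checkerboard has exactly four live (diagonal) neighbours if it is alive and four live (orthogonal) neighbours if it is dead, so nothing survives and nothing is born. A quick sketch, not tied to any particular model's output:

```python
def step(grid):
    """One Game of Life generation on a torus (edges wrap around)."""
    n, m = len(grid), len(grid[0])
    nxt = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            live = sum(
                grid[(i + di) % n][(j + dj) % m]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            # Standard rules: birth on 3, survival on 2 or 3.
            nxt[i][j] = 1 if live == 3 or (grid[i][j] and live == 2) else 0
    return nxt

# 8x8 checkerboard: every cell sees exactly 4 live neighbours,
# so the whole board dies in a single generation.
board = [[(i + j) % 2 for j in range(8)] for i in range(8)]
after = step(board)
print(sum(map(sum, after)))  # 0 -- everything dies
```

So a correctly implemented wrapping board really does go blank from a checkerboard start, which supports the commenter's reading of the 405B output.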

ij

Hey, the SVGs from the 70B and 405B look a bit like butterflies - the result should be PASSED.

manuelgrama

I spent several hours yesterday working with the 8B FP16 model. The project was coding for an ESP32 using a MEMS I2S microphone; I was using Espressif in VSCode. I would say it's about as good as Sonnet in this situation. Inference was fast, and answers were well formed.

LlamaKing

jackflash

I run the 8B version locally (not even Q8!), and it answered the 5th question about apples correctly! The 7th question was also (almost) correct; the last part of the answer: "Therefore, the long diagonal of the regular hexagon is approximately 73.10" (no idea how it answered this, because other big models don't, not even 4o). The 8th question kinda worked; it just looked more like confetti rain, not an explosion. So something must be wrong with your settings, like temperature or prompt template. I'm using temp 0. My model only failed questions 1 and 10. Oh, and 6 too.

mlsterlous

Nahh bro. The 70B and the 405B models passed the butterfly test.

simeonnnnn

Video idea: Claude 3.5 Sonnet vs Llama 3.1 405B

aryindra

A video showing how you used the Nvidia platform to use these models for free would be useful.

fra

Going to be interesting to see how much better these models get now that the community has access to them. Going on older models in the past, some fine-tuning can get real improvements out of these, especially the 8B and 70B models, as they are likely going to be worked on a lot more; more of us can actually run them, compared to the 405B one, which will likely get far less fine-tuning work since hardly any of us can run it locally.

It's a shame Meta didn't release a 13B model or a Mixtral-like model, as I suspect the quality could be a lot better for the smaller sizes if done as a mixture of experts.

pauluk

AICodeKing, OpenAI's ChatGPT-4o got 10/12 correct, for 83.33%. Question: what is a good model on Ollama to use to aid in first drafts for novels, novellas, etc.?

robwin

If you haven't tried the same prompt multiple times with a model, you may want to rerun it: the models are stochastic, so the exact same prompt can produce different answers (and in practice even the same seed values don't guarantee identical outputs). Given that, a single run may simply have hit the model's failure side.
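For context, a toy sketch of why identical prompts can diverge: at temperature > 0 each next token is drawn from a softmax over the logits, so repeated runs take different paths, while temperature 0 collapses to deterministic greedy arg-max. This is illustrative code, not any model's actual decoder; real inference can also vary due to non-deterministic GPU kernels:

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits at a given temperature."""
    if temperature == 0:
        # Greedy decoding: always the arg-max, fully deterministic.
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    mx = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - mx) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(42)
logits = [1.0, 3.0, 2.0]          # toy vocabulary of 3 tokens
greedy = [sample_token(logits, 0, rng) for _ in range(5)]
sampled = [sample_token(logits, 1.0, rng) for _ in range(5)]
print(greedy)   # always index 1, the highest logit
print(sampled)  # indices drawn from the softmax distribution
```

This is why benchmark comparisons from a single run per question are noisy at non-zero temperature.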

taheralipatrawala

My 8B answered the apple question correctly, as well as the sister one.

drkvaladao

Hey, could you make a comparison video about GPU servers + installations of some AI models and also how to access them from anywhere? Thanks!

unisol

I wonder if they're trying to compete with GPT-4o mini on this one. For example, if you use the 8B model to talk to itself, will it get better results? It still seems computationally expensive for that, though. It's still cool to have something like this that we can fine-tune. Thank you, Llama King

vauths

Have been, and always will be, a fan of Llama and Llama King

neo

Llama King, I appreciate your content as always. However, you should seriously retry Gemma 2 27B now that the updated fix has been issued. I ran it in LM Studio with your questionnaire, and the model only missed 2 of your questions. Honestly, it's worth a revisit instead of bashing it before actually running the model correctly.

joshbane

Mistral 7B v0.3 and Llama 3.1 8B: which one is better?

mash-room

The 405B is great, but how do you use it in the real world? We can't run it ourselves; we'd need to access it through whoever provides it. I'm looking for API solutions.

ganian

Ollama updated llama3.1 during my tests, but after updating and re-pulling everything, I repeated my tests:

70,000 words, a summary in about 3,000 words: 8B writes gibberish, completely out of bounds. VRAM usage: 107 GB + 17 GB in cache, out of a total of about 170 GB usable VRAM on a 192 GB unified-memory system.

Weird. The llama3-gradient line worked better. It just does not stop writing after the end of the original chapters and hallucinates further chapters till the end of the token window.

LlamaKing? 🎉

MeinDeutschkurs

I tested Llama-3.1 8B and it is terrible at coding tasks compared to DeepSeek-Coder-V2 16B.

ottawadigs