Llama-3.1 (Fully Tested): Are the 405B, 70B & 8B Models Really Good? (Can They Beat Claude & GPT-4o?)

In this video, I'll be fully testing the Llama-3.1 405B, 70B & 8B models to check if they're really good. I'll also try to find out if they can really beat Claude 3.5 Sonnet, GPT-4o, DeepSeek & Qwen-2. These models are fully open-source and can be used locally for FREE. They are strong at coding tasks and also really good at Text-To-Application, Text-To-Frontend and other things as well. I'll be testing them to find out if they can really beat other LLMs, and I'll also tell you how you can use them.

-----
Key Takeaways:

🔔 Llama-3.1 Launch: Discover Meta's latest Llama-3.1 models, including 8B, 70B, and 405B variants, setting new benchmarks in AI.

📊 Model Comparison: See how the 8B and 70B models stack up against the 405B, Claude 3.5 Sonnet, and GPT-4o, showcasing their impressive performance.

💰 Pricing Update: Get the latest pricing for the Llama models, with Fireworks offering the best deals on 8B, 70B, and 405B, making advanced AI more affordable.

🆓 Platform Availability: Learn where to access these models, from Meta's platform to Huggingface and Ollama, and why Nvidia NIMS is the go-to for free, easy access.

🤖 Tough Questions: Watch as the Llama models tackle 12 challenging questions, from general knowledge to coding tasks, highlighting their capabilities and limitations.

📈 Performance Insights: Understand why the 8B and 405B models shine in terms of size and quality, while the 70B model struggles, with comparisons to Mistral Nemo 12B.

------
Timestamps:

00:00 - Introduction
00:08 - About Llama-3.1 Models
02:10 - Testing
10:02 - Conclusion
11:22 - Ending
Comments

If you change the system prompt and force the 8B model to analyse and calculate first, then answer, it will answer most of the questions correctly. I tested this with Llama 3.1 8B q8_0. This is the example of a system prompt I used: "You are a helpful, smart, kind, and efficient AI assistant. You should always calculate and analyse the question first then fulfill the user's requests to the best of your ability." With it, the model passes 8 questions, like the 70B.
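A minimal sketch of the trick described above, using the system prompt quoted in the comment. The `build_messages` helper is hypothetical; the commented-out `ollama.chat` call assumes the official `ollama` Python client and a local server, which are not part of the comment:

```python
# The analyse-first system prompt quoted in the comment above.
SYSTEM_PROMPT = (
    "You are a helpful, smart, kind, and efficient AI assistant. "
    "You should always calculate and analyse the question first "
    "then fulfill the user's requests to the best of your ability."
)

def build_messages(user_question: str) -> list[dict]:
    """Build a chat payload that puts the analyse-first instruction
    in the system role ahead of the user's question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]

# With the `ollama` Python client installed and a local server running,
# the call would look roughly like:
#   import ollama
#   reply = ollama.chat(model="llama3.1:8b",
#                       messages=build_messages("..."))
msgs = build_messages("I have 3 apples and eat 2. How many are left?")
print(msgs[0]["role"], len(msgs))
```

The same two-message structure works with any OpenAI-style chat endpoint; only the transport differs.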

upper-moon-alpha

The 405B game of life might actually be correct... Assuming that it wraps at the edges, the checkerboard starting state will cause everything to die even if the rules are correctly applied.
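The claim above is easy to check with a few lines of Python: on a torus with even side lengths, every cell in a checkerboard has exactly four live (diagonal) neighbours if it is alive and four live (orthogonal) neighbours if it is dead, so nothing survives and nothing is born. A quick sketch, not tied to any particular model's output:

```python
def step(grid):
    """One Game of Life generation on a torus (edges wrap around)."""
    n, m = len(grid), len(grid[0])
    nxt = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            live = sum(
                grid[(i + di) % n][(j + dj) % m]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            # Standard rules: birth on 3, survival on 2 or 3.
            nxt[i][j] = 1 if live == 3 or (grid[i][j] and live == 2) else 0
    return nxt

# 8x8 checkerboard: every cell sees exactly 4 live neighbours,
# so the whole board dies in a single generation.
board = [[(i + j) % 2 for j in range(8)] for i in range(8)]
after = step(board)
print(sum(map(sum, after)))  # 0 -- everything dies
```

So a correctly implemented wrapping board really does go blank from a checkerboard start, which supports the commenter's reading of the 405B output.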

ij

Hey, the SVGs from the 70B and 405B look a bit like butterflies - the result should be PASSED.

manuelgrama

I spent several hours yesterday working with the 8B FP16 model. The project was coding for an ESP32 using a MEMS I2S microphone; I was using Espressif in VSCode. I would say it's about as good as Sonnet in this situation. Inference was fast, and answers were well formed.

LlamaKing

jackflash

I run the 8B version locally (not even Q8!), and it answered the 5th question about apples correctly! The 7th question was also (almost) correct; the last part of the answer: "Therefore, the long diagonal of the regular hexagon is approximately 73.10" (no idea how it answered this, because other big models don't, not even 4o). The 8th question kinda worked; it just looked more like confetti rain, not an explosion. So something must be wrong with your settings, like temperature or prompt template. I'm using temp 0. My model only failed questions 1 and 10. Oh, and 6 too.

mlsterlous

Nahh bro. The 70B and the 405B models passed the butterfly test.

simeonnnnn

Video idea: Claude 3.5 Sonnet vs Llama 3.1 405B

aryindra

A video showing how you used the Nvidia platform to use these models for free would be useful.

fra

Going to be interesting to see how much better these models get now that the community has access to them. Going on older models in the past, some fine-tuning can get real improvements out of these, especially the 8B and 70B models, as they are likely going to be worked on a lot more; more of us can actually run them, compared to the 405B one, which will likely get far less fine-tuning work since hardly any of us can run it locally.

It's a shame Meta didn't release a 13B model or a Mixtral-like model, as I suspect the quality could be a lot better for the smaller sizes if done as a mixture of experts.

pauluk

AICodeKing, OpenAI's ChatGPT-4o got 10/12 correct, for 83.33%. Question: what is a good model on Ollama to use to aid in first drafts for novels, novellas, etc.?

robwin

If you haven't tried the same prompt multiple times with a model, you may want to rerun it: the models are stochastic, so the exact same prompt can produce different answers (and in practice even the same seed values don't guarantee identical outputs). Given that, a single run may simply have hit the model's failure side.
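For context, a toy sketch of why identical prompts can diverge: at temperature > 0 each next token is drawn from a softmax over the logits, so repeated runs take different paths, while temperature 0 collapses to deterministic greedy arg-max. This is illustrative code, not any model's actual decoder; real inference can also vary due to non-deterministic GPU kernels:

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits at a given temperature."""
    if temperature == 0:
        # Greedy decoding: always the arg-max, fully deterministic.
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    mx = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - mx) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(42)
logits = [1.0, 3.0, 2.0]          # toy vocabulary of 3 tokens
greedy = [sample_token(logits, 0, rng) for _ in range(5)]
sampled = [sample_token(logits, 1.0, rng) for _ in range(5)]
print(greedy)   # always index 1, the highest logit
print(sampled)  # indices drawn from the softmax distribution
```

This is why benchmark comparisons from a single run per question are noisy at non-zero temperature.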

taheralipatrawala

My 8B answered the apple question correctly, as well as the sister one.

drkvaladao

Hey, could you make a comparison video about GPU servers + installations of some AI models and also how to access them from anywhere? Thanks!

unisol

I wonder if they're trying to compete with GPT-4o mini on this one. For example, if you use the 8B model to talk to itself, will it get better results? It still seems computationally expensive for that, though. It's still cool to have something like this that we can fine-tune. Thank you, Llama King

vauths

Have been, and always will be, a fan of Llama and Llama King

neo

Llama King, I appreciate your content as always. However, you should seriously retry Gemma 2 27B now that the updated fix has been issued. I ran it in LM Studio with your questionnaire, and the model only missed 2 of your questions. Honestly, it's worth a revisit instead of bashing it before actually running the model correctly.

joshbane

Mistral 7B v0.3 and Llama 3.1 8B: which one is better?

mash-room

The 405B is great, but how do you use it in the real world? We can't run it ourselves; we'd need to access it through whoever provides it. I'm looking for API solutions.

ganian

Ollama updated llama3.1 during my tests, but after updating and re-pulling everything, I repeated my tests:

70,000 words, a summary in about 3,000 words: 8B writes gibberish, completely out of bounds. VRAM usage: 107 GB + 17 GB in cache, out of a total of about 170 GB usable VRAM on a 192 GB unified-memory system.

Weird. The llama3-gradient line worked better. It just does not stop writing after the end of the original chapters and hallucinates further chapters till the end of the token window.

LlamaKing? 🎉

MeinDeutschkurs

I tested Llama-3.1 8B and it is terrible at coding tasks compared to DeepSeek-Coder-V2 16B.

ottawadigs