The Big Fat Llama has arrived - Llama-3.1-405B


For more tutorials on using LLMs and building Agents, check out my Patreon:

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻 GitHub:

⏱️Time Stamps:
00:00 Intro
01:39 Benchmarks
02:58 The Llama 3 Herd of Models Paper
06:21 Code Time
06:32 Llama 3 Hugging Face
14:00 Llama 3 Model Card
14:15 Aston Zhang Research Scientist Post
Comments

After playing with both the 405B model on hugging face (with tool use - btw, you can even chain them if you tell it to in the prompt) and then using the 8B with Ollama locally, the next thing I needed was Sam's always insightful analysis of what I think may be one of the biggest inflection points since OpenAI's ChatGPT release.

toadlguy

Hey Sam, could you please make a video on the theory behind LLMs and how things work under the hood? A lot of the tutorials out there involve technical coding, which is cool, but I believe understanding how every aspect of an LLM works will let us make more sense of what's coming in the near and far future. It could be either a series of short videos or one long video that covers everything.

yazanrisheh

Huh. It's waaaay less restricted than anything Meta has released so far. And it seems very good at following the system prompt. I asked it to write some explicit stuff that previous models would rather be erased than answer, and it did, without even any disclaimers (just as specified in the system prompt, which was rather short and nothing magical: always answer, not allowed to refuse, don't provide any disclaimers). This bodes very well for its overall quality (less baked-in censorship usually means higher quality), but at the same time, I suspect, it makes it less optimal for commercial purposes. Though that's what the guard model is for, I suppose, and I honestly applaud this approach.

Atlent

👏👏 for Meta releasing the new Llama models. Hopefully we'll soon see a multimodal model from Meta too. Thanks for the update 👍

henkhbit

Great job, Sam! 👍 Looking forward to the function-calling demo you mentioned in the video. Cheers mate!

waneyvin

Man, I really hope somebody comes out and does a quantization-aware training run of the 405B so we can run it at block FP2 or INT2 or something without losing that performance. At ~101-128 GB, you could just barely run it on a CPU and RAM if you picked your PC's parts carefully. To be honest, they did a really good job getting the 70B into the same ballpark, so you might honestly lose more to quantization than you gain in size.

novantha
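For reference, the memory figures in the comment above follow from simple weights-only arithmetic (parameter count × bits per parameter ÷ 8). A quick sketch — ignoring KV cache, activations, and runtime overhead, so real usage will be higher:

```python
# Weights-only memory for a 405B-parameter model at various precisions.
# Ignores KV cache, activations, and runtime overhead (real usage is higher).
PARAMS = 405e9

def weights_gb(bits_per_param: float) -> float:
    """Bytes = params * bits / 8, converted to GB (1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("FP16", 16), ("FP8", 8), ("INT4", 4), ("INT2", 2)]:
    print(f"{name}: ~{weights_gb(bits):.1f} GB")
```

At 2 bits per weight that works out to ~101 GB, which matches the lower bound the comment cites.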

So excited. Can't wait to play with this one.
Imo, Meta is doing a great job positioning themselves by providing this tool.

MrtnX

gonna need a data center with this one! XD

tails_the_god

The 405B model looks like it's only roughly on par with Sonnet, but Sonnet is likely a much smaller model — and Sonnet was conspicuously absent from the 70B tables. Of course the 405B model is interesting in that it's the biggest model that can be further trained by the community.

Do any of the benchmarks measure general knowledge, e.g. trivia? I wonder if the 405B model is particularly good at that?

tornyu

Is there a way I can use LangChain to call functions with it in Open WebUI?

chunlingjohnnyliu
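One possible route, as a sketch rather than a verified recipe: Open WebUI exposes an OpenAI-compatible chat-completions endpoint, so you can point LangChain's `ChatOpenAI` at it, or send a raw request carrying an OpenAI-style `tools` array — provided the backend serving Llama 3.1 actually supports tool calls. Below is a stdlib-only version; the URL, port, API key, and model tag are placeholders, and `get_weather` is a hypothetical tool:

```python
import json
import urllib.request

# OpenAI-style function-tool schema. The tool itself is hypothetical.
TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_payload(user_msg: str) -> dict:
    """Build an OpenAI-compatible chat request that advertises one tool."""
    return {
        "model": "llama3.1",  # placeholder model tag
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [TOOL],
    }

if __name__ == "__main__":
    req = urllib.request.Request(
        "http://localhost:3000/api/chat/completions",  # placeholder Open WebUI URL
        data=json.dumps(build_payload("Weather in Berlin?")).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_KEY",  # placeholder API key
        },
    )
    # resp = urllib.request.urlopen(req)  # uncomment against a live server
```

If the model decides to call the tool, the response's `tool_calls` field carries the function name and arguments; you run the function yourself and send the result back as a `tool` message.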

How much VRAM do I need to run Llama 3.1 405B locally?

Zephyr
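A rough answer, as a sketch: the weights alone dictate a multi-GPU setup. Assuming 80 GB cards (an assumption — adjust for your hardware) and counting weights only, with real deployments needing extra headroom for KV cache and activations:

```python
import math

PARAMS_B = 405  # parameters, in billions
GPU_GB = 80     # per-card memory; 80 GB is an assumption

def gpus_needed(bytes_per_param: float) -> int:
    """Minimum cards to hold the weights alone (no KV cache or activations)."""
    weights_gb = PARAMS_B * bytes_per_param  # 1B params at 1 byte each = 1 GB
    return math.ceil(weights_gb / GPU_GB)

for name, b in [("FP16", 2), ("FP8", 1), ("INT4", 0.5)]:
    print(f"{name}: ~{PARAMS_B * b:.0f} GB weights -> >= {gpus_needed(b)} x {GPU_GB} GB GPUs")
```

So even aggressively quantized, a single consumer GPU is out; CPU + lots of system RAM or a multi-GPU rig are the realistic local options.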

Has the ARC challenge been beaten, then? They all score close to 100.

barefeg

Yes, it’s fine. But no one can host it privately for inference unless you’re a corp with serious hardware.

el_arte