Llama 3.1 405B model is HERE | Hardware requirements

In this video, we dive into Meta’s latest AI breakthrough: the Llama 3.1 405B model! Learn about its state-of-the-art capabilities, its training run on over 16,000 NVIDIA H100 GPUs, and its massive 128K context length. Discover how this model excels in general knowledge, multilingual translation, and more, pushing the boundaries of AI technology. Whether you’re an AI enthusiast or a developer, this video covers everything you need to know about Llama 3.1’s groundbreaking features and applications. Don’t miss out on the future of AI!
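For a sense of scale: the 16,000-GPU figure is for training, while inference is bounded mainly by how much memory the weights occupy. A rough back-of-the-envelope sketch in Python (weight storage only; KV cache and runtime overhead come on top):

PARAMS = 405e9  # parameter count of the 405B model

# Memory needed just to hold the weights at common precisions.
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name}: ~{gb:,.0f} GB")  # roughly 810, 405, and 200 GB

Even the 4-bit figure is beyond any single consumer GPU, which is why the comments below revolve around multi-GPU rigs, big-RAM CPU boxes, and unified-memory Macs.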
Comments

Please keep in mind that the context window also increases VRAM needs. 128K? We'll need something like an Apple M8 Extreme chip with X terabyte(s) of unified memory. The cool thing is it would cost something around 10k-15k instead of 200k.

MeinDeutschkurs
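The point above about context length is easy to put numbers on. A rough sketch of the KV-cache cost, assuming the architecture figures reported for Llama 3.1 405B (126 layers, 8 grouped-query KV heads, head dimension 128); treat the result as approximate:

n_layers, n_kv_heads, head_dim = 126, 8, 128  # reported Llama 3.1 405B config
ctx, fp16_bytes = 131_072, 2                  # 128K tokens, FP16 cache

# Keys and values are both cached, hence the factor of 2.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx * fp16_bytes
print(f"~{kv_bytes / 1e9:.0f} GB per full-length sequence")  # ~68 GB

So a single sequence at full context adds tens of gigabytes on top of the weights, which is exactly why the context window drives memory requirements up.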

I have 72GB VRAM - Can't wait to run the 405B parameter model at 0.01bpw.

But I am going to screw around with this on my 512GB RAM Epyc box. Expecting a couple seconds per token, should be wicked awesome.

Those_Weirdos
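"A couple seconds per token" is roughly what memory bandwidth predicts, since every generated token has to stream all active weights from RAM. A rough estimate; the ~200 GB/s figure is an assumption for an 8-channel Epyc, not a measurement:

model_gb = 405e9 * 4 / 8 / 1e9   # 405B weights at 4-bit, ~203 GB
bandwidth_gbs = 200              # assumed 8-channel DDR4 Epyc bandwidth

# Lower bound on per-token latency: the whole model must be read once per token.
print(f"~{model_gb / bandwidth_gbs:.1f} s per token, best case")

Real throughput will be lower once threading overhead and the KV cache are included, so a couple of seconds per token is a plausible outcome.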

3:15 It looks like we've crossed the point where it was possible to run AI locally. Now you need a tiny supercomputer to work with cutting-edge models. =((

Ukuraina-cssu

Looking forward to your quantisation results 😊

MrOktony

Awesome! Great video, learned a lot, cheers 👍

InstaKane

Can I use Amazon or IBM servers to run the 70B or 405B model?

क्लोज़अपवैज्ञानिक
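On the cloud question: any server with enough GPU memory will do. A minimal sketch using Hugging Face transformers, assuming access to the gated meta-llama repo has been granted and that bitsandbytes and accelerate are installed for 4-bit, multi-GPU loading:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"  # gated repo: request access first
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # shard layers across whatever GPUs the server exposes
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # ~40 GB of weights in 4-bit
)
inputs = tok("Explain Llama 3.1 in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))

With device_map="auto", layers that do not fit on the GPUs spill to CPU RAM at a large speed cost, so size the instance to hold the quantized weights entirely in VRAM.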

10:04 "You are on a Quuee" 🤭

martianreject

There was a sticker that came with your mic cluing you in to the fact that it's a side-address mic 😆 don't be a Yeti

threepe

I can't get it to install, and I don't know what I'm doing wrong. I've gotten through basically every step except the very last one, where you type in 'y' to confirm that you're okay with the file size.

ThatGuyJoss

3:31 wait 3 years and it will be possible 😃

gileneusz

I tried to run the 70B model with 16GB of RAM, and it just crippled the machine and ran up 2.5 GB of swap.

mendodsoregonbackroads

If anyone has about a quarter million they could loan me, I'll happily pay it back once I make all the Internet monies. Soon, I'd guess.

crs_net

An NVIDIA A100 GPU is 30K USD, and you need many! Each GPU draws 450W, so it's nonsense in terms of both the electric bill and the upfront price.

juliusvalentinas

Can I run it on my CPU? I have 44 cores and 512GB RAM.

thecount
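CPU-only inference works as long as a quantized copy of the weights fits in RAM, and 512GB comfortably holds a 4-bit 405B quant. A minimal sketch using llama-cpp-python; the GGUF file name is hypothetical:

from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3.1-405B-Instruct-Q4_K_M.gguf",  # hypothetical local GGUF quant
    n_threads=44,  # match the physical core count
    n_ctx=8192,    # keep the context modest; the KV cache lives in RAM too
)
out = llm("Q: Can Llama 3.1 run on a CPU? A:", max_tokens=64)
print(out["choices"][0]["text"])

Expect throughput on the order of a token per second or slower: the bottleneck is memory bandwidth, not core count.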

I think you can try it on Hugging Face

dahee

8:42, not possible: the smallest quant is Q2_K at 149.0 GB. It's possible to run on a 192GB Mac Studio, but that gives only 2 t/s; better to just use Hugging Face.

gileneusz

You need 250GB of RAM to run the 4-bit model.

ps