M3 Max 128GB for AI: Running Llama 2 7B, 13B and 70B

In this video we run Llama models on the new M3 Max with 128GB of unified memory and compare it with an M1 Pro and an RTX 4090 to see the real-world AI performance of this chip.
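
For readers who want to try a test like this themselves, here is a minimal sketch using the llama-cpp-python bindings, since the comments below indicate these were q4 GGUF models. The model file name and prompt are placeholders, not the exact setup from the video.

# Minimal sketch: load a q4 GGUF Llama 2 model with llama-cpp-python and generate text.
# The model path is a placeholder; any q4_K_M export of Llama 2 would work.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers (Metal on Apple Silicon, CUDA on the 4090)
    n_ctx=4096,
)

out = llm("Explain unified memory in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])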

Comments

Yes, this is how YouTubers should present the 128GB model, not with video editing or benchmarks.

tamsaiming

Sold on the M3 Max. That 70B test. Damn.

VikramMulukutla

I’ve been searching for EXACTLY this! Thank you. Subscribed and looking forward to those next videos on Stable Diffusion and your grandma’s clone (if I understood correctly). Thanks bro!

matumatux

Thank you for this comparison! I've been searching for this for a long time and couldn't find anything concrete anywhere.

uwepuneet

Outstanding video - thank you for taking the time to do this. It’s exactly the comparison I was looking for.

stephe

Hey man! I was here for self-gratification for having bought an M3 Max, but your grandma experiment got me subscribed. Can't wait!

MikaMoupondo

Great test, thank you. For all of those who work with LLMs and are considering an M3, this is what we're looking for. People should understand these are q4 GGUF models, but it's still a very relevant test, and it's good to see the Mac's unified memory working. I'd love to see how it would be to run an interface like Streamlit along with RAG on Instructor XL and Ollama or textgen, and see if the Max can handle all of them together.

stephenthumb
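
The stack described above can be approximated in a few lines. Below is a rough sketch that sends a retrieval-augmented prompt to a local Ollama server on its default port; the retrieval function is just a stub standing in for an Instructor-XL embedding plus vector-store lookup, and the Streamlit layer is left out. Model name and texts are illustrative.

# Rough sketch: retrieval-augmented generation against a local Ollama server.
# Assumes `ollama serve` is running with a Llama 2 model pulled; retrieve() is a
# placeholder for a real embedding + vector-store pipeline (e.g. Instructor XL).
import requests

def retrieve(query: str) -> str:
    # Placeholder: a real setup would embed the query, search a vector store,
    # and return the best-matching document chunks.
    return "The M3 Max with 128GB of unified memory ran a 70B q4 GGUF model locally."

def ask(query: str, model: str = "llama2:70b") -> str:
    prompt = f"Answer using only this context:\n{retrieve(query)}\n\nQuestion: {query}"
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask("What did the 70B test demonstrate?"))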

It would be good to see fine-tuning (AutoTrain / PEFT) limits on Llama 2 models for the M3 Max 128GB.

anguss
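
For the fine-tuning question above, a minimal PEFT/LoRA sketch looks roughly like the following. The model name, target modules, and hyperparameters are illustrative assumptions; actual training limits on a 128GB M3 Max (and how well MPS handles them) would have to be measured.

# Sketch: attach a LoRA adapter to Llama 2 with PEFT so only a small set of
# adapter weights is trained. Hyperparameters here are illustrative, not tuned.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # gated checkpoint; requires access approval
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = model.to("mps")  # Apple Silicon GPU; use "cuda" on an RTX 4090

lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts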

You just sold me on getting the M3 Max MacBook Pro for my project. Thank you!

pogimestiso

OMG, finally a channel using these machines for proper, accessible AI instead of YouTube content creation. Keep up the great job, and a video on how to get models up and running would be awesome 👏

hossromani

Hope the M3 Ultra brings a huge shift in performance.

oterotube

Idk what YouTube is doing with your video, bro, but this is the EXACT video I was looking for; it didn't come up in search and didn't appear till much later on my For You page. Glad I found it though, you did an excellent job.

kaojaicam

This is exactly what I was looking for.
Great video, very informative.
Subscribed

mamaleone

Thx man, I searched for this benchmark comparison for so long. Greetz!

MisterAndreSafari

You didn't pin them to the same random seed, so they're generating different text, which makes the elapsed times hard to compare because they're not producing the same number of tokens.

daves.software
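
The fix the comment above suggests can be sketched like this: pin the seed (or use greedy decoding) so every machine produces the same tokens, then compare tokens per second rather than raw elapsed time. File path and prompt are placeholders.

# Sketch: deterministic generation for a fair speed comparison across machines.
# temperature=0.0 selects greedy decoding, so the output (and token count) should
# match on every machine running the same quantized model.
import time
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-13b.Q4_K_M.gguf", n_gpu_layers=-1, seed=42)

start = time.time()
out = llm("Write a short poem about unified memory.", max_tokens=200, temperature=0.0)
elapsed = time.time() - start

tokens = out["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")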

I know others have already said this a few times, but I just wanted to echo it. This actual-usage comparison of real-world workloads in large-memory environments is truly what many people are looking for, instead of all of the bull$hit repeated, bare-minimum usage scenarios. A hearty thank you for showing the true limits of a 4090 versus a large-memory M3 Max. Cheers!

MobileSpace

Great comparison! Parameter counts always get bigger. Based on your results, I'm thinking the upcoming 5090 won't be my next purchase; more likely an M4 with 128GB or 256GB of RAM will be my next stop.

klaymoon

I was hoping someone would post this kind of comparison. It seems unified memory is a huge advantage for running larger LLMs. One thing I didn't understand about the final 70B test: how much memory did the M3 use? Could you have gotten away with only 64GB instead of 128GB? Thank you for the effort you put into creating and sharing this test. Subscribed!

axotical
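
A back-of-the-envelope answer to the 64GB question above, assuming a q4_K_M-style quantization at roughly 4.5 bits per weight; the overhead figure is a rough guess, not a measurement from the video.

# Rough memory estimate for a 70B model at q4-style quantization.
params = 70e9
bits_per_weight = 4.5   # approximate effective rate for q4_K_M GGUF files
weights_gb = params * bits_per_weight / 8 / 1e9
overhead_gb = 8         # KV cache + runtime overhead, order-of-magnitude guess
print(f"~{weights_gb:.0f} GB weights + ~{overhead_gb} GB overhead "
      f"= ~{weights_gb + overhead_gb:.0f} GB total")
# ≈ 39 GB + 8 GB ≈ 47 GB, so 70B at q4 should fit in 64GB of unified memory,
# though macOS limits how much of that memory the GPU may use by default.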

I have this same CPU/RAM combination. I've been able to run up to the 120B Goliath models at q4 quantization. Very fast inference.

markclayton

You just helped me so much. Thank you.

Stewz