Using Clusters to Boost LLMs 🚀

I test the cluster workflow that allowed Alex to run the Llama 405B model on two laptops!


#machinelearning #llm #softwaredevelopment
Comments

Even though I could afford a 4080 or 4090, I refuse to pay extortionate prices; Nvidia has gotten too greedy. So glad I have my M2 Max with 96GB to do fun ML research projects.

woolfel

Oh my, it's great that someone is making content of this depth ❤️

AlmorTech

Cool! Everyone who makes AI models more convenient and accessible is a hero in my book (that includes you, Alex). Currently I'm running the smaller Mistral model on my base-model M2 MacBook Air. I'm considering buying a Mac Mini or Mac Studio when the new ones come out, and this might be what I need to run the larger models. Mistral is great, but I want to use it in combination with fabric, and for that it just doesn't cut it. Keep it up Alex, you make me look smarter at work with every video ;)

Manuel-og

I've tested exo before and ran into a lot of the same issues you were experiencing, and that was on a 10GbE network. I haven't tried it again after the failed attempts, but I do think this kind of clustering could be very powerful even with smaller models. If it can handle multiple concurrent requests, with exo acting as a "load balancer", then you could have one entry point into a much larger-capacity network of machines running inference. The alternative is finding your own load-balancing mechanism (maybe HAProxy), but then you still have the problem of orchestrating each machine to download and run the requested model.

dave_kimura

The idea is super cool; I'd love to be able to use multiple computers to accomplish more than what is possible with just one. It seems to address the issue of expensive graphics cards. Probably the next best alternative is a 'modeling host' with powerful graphics cards made available over the network to smaller 'terminals'.

danwaterloo

Hi Alex,

Very nice video, but I had to smile a bit because of the test setup.

I have a cluster running at a customer site, albeit for a different application, and this technology can really deliver a lot in terms of performance and failover. I'm excited to see cluster computing become more generally available and usable.

It is very important to build a high-performance, dedicated network for cluster communication. With Macs this is quite easy to do via a Thunderbolt bridge. I recommend assigning the network addresses manually and separating the subnet from the normal network.

With 40 Gbit/s you have something at hand that would otherwise cost a lot of work and money (apart from the expensive cables).
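For anyone wiring this up, the manual-addressing idea can be sketched on macOS roughly like this (a sketch, not a definitive setup: the service name "Thunderbolt Bridge" and the 10.0.0.0/24 subnet are assumptions — check `networksetup -listallnetworkservices` for the actual name on your machine):

```shell
# Give each Mac a static IP on the Thunderbolt bridge, keeping the
# cluster subnet (10.0.0.0/24 here) separate from the normal LAN.
# On node 1:
networksetup -setmanual "Thunderbolt Bridge" 10.0.0.1 255.255.255.0
# On node 2:
networksetup -setmanual "Thunderbolt Bridge" 10.0.0.2 255.255.255.0

# Then verify the point-to-point link from node 1:
ping -c 3 10.0.0.2
```

With no router on the point-to-point link, the router argument is simply omitted.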

Of course, it is better if all cluster nodes use comparable hardware, which simplifies load distribution, but in general different machines are possible.

In your case, unfortunately, a base Air, which can hardly handle the application on its own, acts more as a brake than an accelerator, as you impressively showed.

A test with two powerful Macs would be interesting.

thesecristan

Good day to you :) thanks for the content.

Adriatic

Would Thunderbolt networking speed up the cluster at all? Are they just communicating over Wi-Fi?

Rushil

Lol, when he was looking for the safetensors I was thinking "please be in the HF cache, please be in the HF cache", and of course, this Alex fellow is wonderful. That means this will be simple to drop into current workflows. 405B should fit well across 4 Mac Studios with 192GB each 👌 The next question is whether it can distribute fine-tuning.
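For anyone checking the same thing: assuming the downloads really do land in the standard Hugging Face hub cache (as the comment suggests), you can inspect it like this; `HF_HOME` overrides the base directory if set.

```shell
# Default Hugging Face hub cache; HF_HOME overrides the base directory.
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}/hub"
echo "$CACHE_DIR"
# Downloaded models appear as models--<org>--<name> directories:
ls "$CACHE_DIR" 2>/dev/null
```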

thomasmitchell

You could try running the tool in Docker containers with one shared network storage volume for the model. That would help with the disk-space issues.
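A sketch of that idea, assuming the network share is already mounted on each host at `/mnt/models` and the image name `my-exo-image` is hypothetical (there is no official image implied here):

```shell
# Mount one network share read-only into each container so every
# node reads the same model files instead of keeping its own copy.
docker run --rm \
  -v /mnt/models:/models:ro \
  -e HF_HOME=/models/huggingface \
  my-exo-image
```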

danielserranotorres

This is a good start, but the problem is still that it tries to load the entire model on each machine. A better solution would be to shard the model across machines and access it BitTorrent-style. Not sure how that would work, though.

cyberdeth

A callback to the good old days of Beowulf clusters for Unix. I picked up 5 old HP mini PCs, each with a 6-core Intel CPU, a 1TB NVMe drive, and 64GB of RAM. They're all on my in-house 10Gb Ethernet, so I'll give it a go and let you know. Great video, thanks.

WireHedd

That's why I'm waiting for a Mac Studio with an M4 Max/Ultra. 256GB for big models with a good SoC will soon be essential... or already is...
Anyway, as an iOS dev I'm using 20–40B models; they are heavy but not too heavy, they respond in reasonable time, and they don't use 50GB+.

WujoFefer

You should investigate and do more videos on clustered LLMs.

Z-add

It's for people who have two MacBook Pros with 256GB of RAM on a plane.

litengut

Use a NAS... you wouldn't have to download the model multiple times... just point every node at a shared directory.
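One way to wire that up, assuming the tool honors the standard `HF_HOME` variable and the NAS is mounted at `/Volumes/models` (both assumptions about your setup):

```shell
# Point the Hugging Face cache at a NAS mount so every node shares
# one copy of the weights instead of re-downloading them.
export HF_HOME=/Volumes/models/huggingface
mkdir -p "$HF_HOME"
```

Exporting this in each node's shell profile makes the shared cache the default.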

RunForPeace-hkcu

A great project and idea; maybe the next step could be adding shared memory from the cloud.

stefanodicecco

I wonder if you could share the models and network via Thunderbolt.

psychurch

This is so cool, but I think it would be even cooler if you could get multiple VPSs, connect them, and run the model across them.

houssemouerghie

I design air-gapped AI inference systems, and I do my initial tests on 30 Raspberry Pis to focus on efficiency. Obviously dedicated GPU memory is not possible there. Maybe this, teamed with the about-to-be-announced M4 Mac Mini, will be the next evolution. It also de-risks accidentally running up a bill of thousands of pounds on a cloud-based test lab.

allanmaclean