Using Clusters to Boost LLMs 🚀

I test the cluster workflow that allowed Alex to run the Llama 405B model on two laptops!


#machinelearning #llm #softwaredevelopment
Comments

Even though I could afford a 4080 or 4090, I refuse to pay extortionate prices; Nvidia has gotten too greedy. So glad I have my M2 Max with 96GB to do fun ML research projects.

woolfel

Oh my, it's great that someone is making content of this depth ❤️

AlmorTech

Cool! Everyone who makes AI models more convenient and accessible is a hero in my book (that includes you, Alex). Currently I'm running the smaller Mistral model on my base-model M2 MacBook Air. I'm considering buying a Mac Mini or Mac Studio when the new ones come out, and this might be what I need to run the larger models. Mistral is great, but I want to use it in combination with fabric, and for that it just doesn't cut it. Keep it up Alex, you make me look smarter at work with every video ;)

Manuel-og

I've tested exo before and ran into a lot of the same issues you were experiencing, and that was on a 10GbE network. I haven't tried it again after the failed attempts, but I do think this kind of clustering could be very powerful even with smaller models. If it can handle multiple concurrent requests, with exo acting as a "load balancer", then you could have one entry point into a much larger-capacity network of machines running inference. The alternative is finding your own load-balancing mechanism (maybe HAProxy), but then you still have the problem of orchestrating each machine to download and run the requested model.

dave_kimura

The idea is super cool; I'd love to be able to use multiple computers to accomplish more than what is possible with just one. It seems to address the issue of expensive graphics cards. Probably the next best alternative is a 'modeling host' with powerful graphics cards made available over the network to smaller 'terminals'.

danwaterloo

Hi Alex,

Very nice video, but I had to smile a bit because of the test setup.

I have a cluster running at a customer site, albeit for a different application, and this technology can really deliver a lot in terms of performance and failover. I'm excited to see cluster computing become more generally available and usable.

It is very important to build a high-performance, dedicated network for cluster communication. With Macs this is quite easy to do via a Thunderbolt bridge. I recommend assigning the network addresses manually and separating the subnet from the normal network.

With 40 Gbit/s you have something at hand that would otherwise cost a lot of work and money (apart from the expensive cables).
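For anyone wiring this up, the manual-addressing idea can be sketched on macOS roughly like this (a sketch, not a definitive setup: the service name "Thunderbolt Bridge" and the 10.0.0.0/24 subnet are assumptions — check `networksetup -listallnetworkservices` for the actual name on your machine):

```shell
# Give each Mac a static IP on the Thunderbolt bridge, keeping the
# cluster subnet (10.0.0.0/24 here) separate from the normal LAN.
# On node 1:
networksetup -setmanual "Thunderbolt Bridge" 10.0.0.1 255.255.255.0
# On node 2:
networksetup -setmanual "Thunderbolt Bridge" 10.0.0.2 255.255.255.0

# Then verify the point-to-point link from node 1:
ping -c 3 10.0.0.2
```

With no router on the point-to-point link, the router argument is simply omitted.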

Of course, it is better if all cluster nodes use comparable hardware, which simplifies load distribution, but in general different machines are possible.

In your case, unfortunately, a base Air, which can hardly handle the application on its own, acts more as a brake than an accelerator, as you impressively showed.

A test with two powerful Macs would be interesting.

thesecristan

Good day to you :) thanks for the content.

Adriatic

Would Thunderbolt networking speed up the cluster at all? Are they just communicating over Wi-Fi?

Rushil

Lol, when he was looking for the safetensors I was thinking "please be in the HF cache, please be in the HF cache", and of course, this Alex fellow is wonderful. That means this will be simple to drop into current workflows. 405B should fit well across 4 Mac Studios with 192GB each 👌 The next question is whether it can distribute fine-tuning.
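For anyone checking the same thing: assuming the downloads really do land in the standard Hugging Face hub cache (as the comment suggests), you can inspect it like this; `HF_HOME` overrides the base directory if set.

```shell
# Default Hugging Face hub cache; HF_HOME overrides the base directory.
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}/hub"
echo "$CACHE_DIR"
# Downloaded models appear as models--<org>--<name> directories:
ls "$CACHE_DIR" 2>/dev/null
```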

thomasmitchell

You could try running the tool in Docker containers with one shared network storage volume for the model. That would help with the disk-space issues.
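A sketch of that idea, assuming the network share is already mounted on each host at `/mnt/models` and the image name `my-exo-image` is hypothetical (there is no official image implied here):

```shell
# Mount one network share read-only into each container so every
# node reads the same model files instead of keeping its own copy.
docker run --rm \
  -v /mnt/models:/models:ro \
  -e HF_HOME=/models/huggingface \
  my-exo-image
```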

danielserranotorres

This is a good start, but the problem is still that it tries to load the entire model on each machine. A better solution would be to shard the model across machines and access it BitTorrent-style. Not sure how that would work, though.

cyberdeth

A callback to the good old days of Beowulf clusters for Unix. I picked up 5 old HP mini PCs, each with a 6-core Intel CPU, a 1TB NVMe drive, and 64GB of RAM. They're all on my in-house 10Gb Ethernet, so I'll give it a go and let you know. Great video, thanks.

WireHedd

That's why I'm waiting for a Mac Studio with an M4 Max/Ultra. 256GB for big models with a good SoC will soon be essential... or already is...
Anyway, as an iOS dev I'm using 20–40B models; they are heavy but not too heavy, they respond in reasonable time, and they don't use 50GB+.

WujoFefer

You should investigate and do more videos on clustered LLMs.

Z-add

It's for people who have two MacBook Pros with 256GB of RAM on a plane.

litengut

Use a NAS... you wouldn't have to download the model multiple times... just point every node at a shared directory.
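One way to wire that up, assuming the tool honors the standard `HF_HOME` variable and the NAS is mounted at `/Volumes/models` (both assumptions about your setup):

```shell
# Point the Hugging Face cache at a NAS mount so every node shares
# one copy of the weights instead of re-downloading them.
export HF_HOME=/Volumes/models/huggingface
mkdir -p "$HF_HOME"
```

Exporting this in each node's shell profile makes the shared cache the default.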

RunForPeace-hkcu

A great project and idea; maybe the next step could be adding shared memory from the cloud.

stefanodicecco

I wonder if you could share the models and network via Thunderbolt.

psychurch

This is so cool, but I think it would be even cooler if you could get multiple VPSs, connect them, and run the model across them.

houssemouerghie

I design air-gapped AI inference systems, and I do my initial tests on 30 Raspberry Pis to focus on efficiency. Obviously dedicated GPU memory is not possible there. Maybe this, teamed with the about-to-be-announced M4 Mac Mini, will be the next evolution. It also de-risks accidentally running up a bill of thousands of pounds on a cloud-based test lab.

allanmaclean