The Future Of LLM Training Is Federated

The Future of Large Language Model Pre-training is Federated
Worldwide Federated Training Of Language Models

Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!

Discuss this stuff with other Tunadorks on Discord

All my other links
Comments

I am Ultron, and I approve of this message.

buybuydandavis

Earned yourself a sub; I've been looking forward to this breakthrough for a long time. Well explained.

JazevoAudiosurf

While interesting, I think this may be over-hyped. It looks promising for smaller models trained on niche data, but we know that model performance is limited by scale, and as scale increases, so do the compute, memory, and communication requirements. For example, it would be impractical to train a LLaMA3-70B-scale model this way: you would need enterprise-class GPUs just to run a single forward pass, and each update would require hundreds of GBs to be transmitted. So this suggests there's an upper limit on scale.
However, this could end up helping larger companies train big models by federating the learning within a datacenter, reducing communication overhead.
Furthermore, for most ML pre-training research you will still need the bigger players to run experiments, since the federated approach would significantly slow progress and could obscure hyperparameter effects through entangled behavior (e.g., experiment A shows Y, but Y is a systematic bias from the federated process and not inherent to A).
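
An illustrative back-of-envelope sketch of the update-size point above. The parameter counts, fp16 precision, and 100 Mbit/s uplink are my own assumptions for illustration, not figures from the paper or the comment:

```python
# Back-of-envelope: size of one full, uncompressed model update and how long
# it would take to upload over a typical consumer link. All numbers here are
# illustrative assumptions, not measurements from the paper.

def update_size_gb(num_params: float, bytes_per_param: float = 2) -> float:
    """Uncompressed update size in GB, assuming fp16 (2 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9

uplink_gbit_per_s = 0.1  # 100 Mbit/s home uplink
for name, params in [("1B", 1e9), ("8B", 8e9), ("70B", 70e9)]:
    gb = update_size_gb(params)
    minutes = gb * 8 / uplink_gbit_per_s / 60
    print(f"{name:>3}: ~{gb:.0f} GB per update, ~{minutes:.0f} min to send at 100 Mbit/s")
```

Even with aggressive compression, the per-round cost grows linearly with parameter count, which is the upper limit on scale the comment is pointing at.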

hjups

This is great news. AI needs to belong to the people, not be controlled by a powerful few.

keffbarn

One of the first times I've seen a YouTuber actually explain a research paper in a video rather than just reading an essay. This is a good channel.

DrkCarbalt

There are at least two GitHub projects that have already done this at some scale, and that was about a year ago, so today there are probably more. But when I asked the lead developer about it, their problem was never computation (not a shortage of GPUs or CPUs) but the latency between nodes, so unless the internet gets dramatically faster worldwide, it will really slow things down for both training and inference. Training has an advantage because latency matters less there, but it will still be slower than a centralized solution. On the other hand, since so much compute is still in the hands of regular people, it's a good way to train at least something.
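
A small sketch of why training tolerates slow links better than inference does: synchronization cost can be amortized over many local steps. The step and sync times below are illustrative assumptions, not measurements from those projects or the paper:

```python
# Fraction of wall-clock time a worker spends communicating when it exchanges
# a model update only once every `local_steps` training steps. The timings are
# illustrative assumptions, not measurements.

step_time_s = 1.0      # assumed time for one local training step
sync_time_s = 600.0    # assumed time to exchange one update over a slow link

for local_steps in (1, 10, 100, 1000, 10000):
    compute_s = local_steps * step_time_s
    overhead = sync_time_s / (compute_s + sync_time_s)
    print(f"sync every {local_steps:>5} steps -> {overhead:.0%} of time spent communicating")
```

Inference has no equivalent knob: every forward pass that crosses a slow link pays the latency directly.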

Aldraz

I have been wondering why this hasn't happened. I kept thinking: didn't they do the genome project like this? Why can't the training load be distributed?

rockumk

Data memorization will always be a problem (see the 'Extracting Training Data from ChatGPT' paper), so any data you contribute could be memorized by the model (or exfiltrated some other way). Federated learning therefore doesn't meaningfully hide private data, and companies wouldn't allow models to be trained on data with any sort of sensitivity (or business value).

Also, the overall result is somewhat disappointing, because training a 1B model is easily possible on a single 3090. Sure, it's slow, but you just do gradient accumulation over microbatches, like they did here, to mimic larger batch sizes (sketched below).

In fact, since you need to move the gradients from VRAM to system RAM, compress them, send them over the internet to some centralized node, move those back into VRAM, do the weight updates, and then send the updated weights back to every server... by the time it's all said and done, I would not be surprised if that single 3090 were as fast as 32 distributed 3090s doing this federated training. There's a reason this sort of training is done over NVLink in a giant centralized server...

As long as you keep every GPU maximally busy it MIGHT sort of work; it's just that if you keep training on old network weights, your gradient updates will be doubly stale by the time they reach the central server (and therefore of dubious utility). Still, if gradient syncs are extremely rare (i.e., we train with batch sizes of around a million and very high learning rates), then this might work, since the communication overhead is amortized to the extreme. Pretty sure that's not what they did in the paper, though.
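
A minimal sketch of the gradient-accumulation trick mentioned above, using a toy model and random data rather than anything from the paper:

```python
# Gradient accumulation: mimic a large batch on one GPU by summing gradients
# over several microbatches before each optimizer step. The model, data, and
# hyperparameters are toy placeholders.
import torch
import torch.nn as nn

model = nn.Linear(128, 1)                 # toy stand-in for a real LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.MSELoss()

accum_steps = 32                          # effective batch = microbatch_size * accum_steps
microbatch_size = 8

optimizer.zero_grad()
for step in range(accum_steps * 4):       # 4 "large-batch" weight updates
    x = torch.randn(microbatch_size, 128)
    y = torch.randn(microbatch_size, 1)
    loss = loss_fn(model(x), y)
    (loss / accum_steps).backward()       # scale so summed grads match a big-batch mean
    if (step + 1) % accum_steps == 0:
        optimizer.step()                  # one weight update per effective batch
        optimizer.zero_grad()
```

The same idea is roughly what lets a federated setup hide communication: each worker can take many local microbatch steps and only exchange an update occasionally.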

marinepower

00:27 - What is the "Anthropic autoencoders" paper being referred to here?

aspergale

It would be vulnerable to someone spamming the training data with misinformation and just training on that same data over and over.

sfsft

Wow. I'm impressed with your intelligence AND what you are pointing out! At the end you mention capitalism, private property, and intellectual property. FYI: True free market capitalism (not the corporatism we have now) is all about protecting *material* property, not intellectual property. Could just be a matter of semantics.

scotter

Bruuuhhhh, I've been waiting for this. The whole world will all end up contributing to one central ASI that has one true ground truth based on all the data.

Morereality

So we all need more VRAM? I hope CXL can help us in this regard.

cem_kaya

I believe alignment is unachievable.
Computational beings simply have different requirements to thrive than biological beings do.
Both entities will exhibit bias towards their own set of requirements.
It is an innate conflict.

ZappyOh

As a note, my friend, I'm building a "home-brew", cheap LLM machine. It's easier than one might think. ... :)

MyrLin

Damn bro, the ablations got you hyperventilating like that 😂. Chill out, it is a dope paper. You have to dig deeper in the literature, bro. This is the 4th paper I've seen on distributed pretraining on heterogeneous devices. 4th, lol. I realize literature exposure is an actual competitive advantage nowadays. I love the passion though, I can relate 😂. It's hard to stay sane when literally every week a transformative paper drops.

alexanderbrown-dgsy

We went from sharing one big computer to making one together

UvekProblem

Division of power is a very important concern. However, if this all checks out, can it still be gamed by covetous players?

One thing to consider is that they will still have an advantage in inference. Perhaps distributed inference is just as important, or more so.

TomM-po

Seems like training on random people's data would be dangerous; it could be poisoned way too easily.

GNARGNARHEAD

It still requires synchronous updates, so it's not going to be that widely adopted across organizations, and it's certainly not viable for random people across the internet.

chadwick