Build an Ecosystem, Not a Monolith

Colin Raffel (University of North Carolina & Hugging Face)
Large Language Models and Transformers

Currently, the preeminent paradigm for building artificial intelligence is the development of large, general-purpose models that aim to be able to perform all tasks at (super)human level. In this talk, I will argue that an ecosystem of specialist models would likely be dramatically more efficient and could be significantly more effective. Such an ecosystem could be built collaboratively by a distributed community and be continually expanded and improved. In this talk, I will outline some of the technical challenges involved in creating model ecosystems, including automatically selecting which models to use for a particular task, merging models to combine their capabilities, and efficiently communicating changes to a model.
Comments

Author

00:00 🧠 Specialist models are often cheaper, and sometimes better, than large generalist models, as results across several benchmarks show.
04:28 💡 A large collection of specialized models, working collaboratively, can cover a wide range of tasks as an ecosystem.
08:39 🌐 Specialists can be created efficiently as cheaply communicable updates to a common base model, improving performance with little additional storage.
14:12 🚀 Parameter-efficient fine-tuning creates specialists that adapt only a small fraction of a model's parameters to a specific task (a LoRA-style sketch follows this list).
21:57 🛤️ Automatic routing among specialists is crucial: to replace a generalist model, the ecosystem needs a mechanism that selects and combines models for a given task.
23:07 📊 The diagonal of the Fisher information matrix, estimated from model gradients over a dataset, yields a compact vector that represents the task (sketched after this list).
24:28 🌐 Comparing these task vectors by similarity lets incoming data be matched to the most appropriate specialist model.
25:30 🚀 Mixture-of-Experts models can be viewed as an ecosystem of sub-models, with adaptive routing among specialized subnetworks.
28:39 🎯 Merging models is one way to compose the capabilities of specialists, opening opportunities to transfer skills between models (a weighted-averaging sketch follows this list).
35:00 📚 Formulating joint embedding spaces for tasks, datasets, and models could enhance the compositionality of capabilities across the ecosystem.
47:13 🧩 Merging models trained on similar data or tasks can boost performance; merging can stand in for multitask or multilingual training, though it may not match training jointly.
48:25 🛠️ Building an ecosystem requires user-friendly collaboration tools; Git-Theta extends the Git workflow to tracking and merging model checkpoints.
50:00 💾 Git-Theta reduces checkpoint storage, and parameter-efficient fine-tuning saves further space; a graph in the talk illustrates the storage benefits.
52:03 🌐 Distributed inference can be simplified with Petals, which splits a model's layers across volunteer machines and compresses the activations sent between them (a usage sketch follows this list).
56:08 🌱 A scalable ecosystem could include diverse models; pruning redundant ones and tracking compatibility would keep it efficient.
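
To make the parameter-efficient fine-tuning point (14:12) concrete, here is a minimal LoRA-style sketch in PyTorch. It illustrates the general technique rather than the talk's specific recipe; the rank and scaling values are arbitrary.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen pretrained linear layer plus a trainable low-rank update."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # the pretrained weights stay fixed
            # Only A and B are trained: far fewer parameters than the base layer.
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(512, 512))
    print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])

Because B starts at zero, the wrapped layer initially behaves exactly like the base model, and only the small A and B matrices need to be stored or communicated per task, which is what makes the updates cheap to share.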
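
For the Fisher task vectors and routing (23:07, 24:28), a rough PyTorch sketch with illustrative names throughout: the diagonal of the Fisher information matrix is approximated by averaged squared log-likelihood gradients, and routing picks the specialist whose stored vector is most similar to that of the incoming data.

    import torch
    import torch.nn.functional as F

    def fisher_diagonal(model, batches):
        """Approximate diag(Fisher) by averaging squared gradients, flattened into one vector."""
        fisher = [torch.zeros_like(p) for p in model.parameters()]
        for inputs, targets in batches:
            model.zero_grad()
            F.cross_entropy(model(inputs), targets).backward()  # negative log-likelihood
            for f, p in zip(fisher, model.parameters()):
                f += p.grad.detach() ** 2
        return torch.cat([f.flatten() for f in fisher]) / len(batches)

    def route(query_vector, task_vectors):
        """Select the specialist whose task vector is most cosine-similar to the query's."""
        sims = {name: F.cosine_similarity(query_vector, vec, dim=0).item()
                for name, vec in task_vectors.items()}
        return max(sims, key=sims.get)

    model = torch.nn.Linear(10, 3)  # stand-in for a real classifier
    batches = [(torch.randn(8, 10), torch.randint(0, 3, (8,))) for _ in range(4)]
    print(fisher_diagonal(model, batches).shape)  # torch.Size([33])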
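
For merging (28:39, 47:13), one concrete scheme in the spirit of Fisher-weighted averaging: a per-parameter weighted mean of two checkpoints. This minimal sketch assumes both checkpoints share an architecture and that per-parameter Fisher estimates are available; with uniform weights it reduces to plain parameter averaging.

    import torch

    def merge(state_a, state_b, fisher_a, fisher_b, eps=1e-8):
        """Weight each parameter by its Fisher estimate, so each model 'votes' where it is confident."""
        merged = {}
        for name in state_a:
            fa, fb = fisher_a[name], fisher_b[name]
            merged[name] = (fa * state_a[name] + fb * state_b[name]) / (fa + fb + eps)
        return merged

    a, b = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
    uniform = {k: torch.ones_like(v) for k, v in a.state_dict().items()}
    merged = merge(a.state_dict(), b.state_dict(), uniform, uniform)  # plain average here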
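
For Petals (52:03), a minimal usage sketch following the project's public examples; it assumes the petals package is installed and that a public swarm is currently serving the chosen model, whose name here is a placeholder.

    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "petals-team/StableBeluga2"  # assumption: a model served by a public swarm
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Loads a few layers locally; the rest run on volunteer machines across the swarm.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("An ecosystem of specialist models", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=16)
    print(tokenizer.decode(outputs[0]))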

antonpictures

I've been working on something very similar to this and have shared it on my YouTube channel. My system uses 7 AI models, personality-specific database retrieval, and a variety of APIs to answer questions. When a prompt isn't recognized by the algorithmic methods that build a search query, I use zero-shot classification of the input with 10 labels to decide which sub-model to use in the pipeline.
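
A minimal sketch of the routing step this comment describes, assuming the Hugging Face transformers zero-shot-classification pipeline; the labels and the mapping to sub-models are hypothetical stand-ins.

    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    LABELS = ["math", "coding", "search", "chat"]  # stand-ins for the commenter's 10 labels

    def pick_submodel(prompt: str) -> str:
        """Return the highest-scoring label; the caller dispatches to the matching sub-model."""
        result = classifier(prompt, candidate_labels=LABELS)
        return result["labels"][0]

    print(pick_submodel("Write a function that reverses a list"))  # likely "coding"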

MagusArtStudios

As with most AI, the vector component "categories" are artificial.

peterwaksman