ACL 2021 invited talk: Learning-to-learn through Model-based Optimization, by Prof Eric Xing

Title:

Learning-to-learn through Model-based Optimization: HPO, NAS, and Distributed Systems

Abstract:

In recent years we have seen rapid progress in developing modern NLP applications, either by building omni-purpose systems via training massive language models such as GPT-3 on big data, or by building industrial solutions for specific real-world use cases by composing pre-made modules. In both cases, a bottleneck developers often face is the effort required to determine the best way to train the model: how to tune the optimal configuration of hyper-parameters of the model(s), big or small, single or multiple; how to choose the best structure of a single large network or of a pipeline of multiple model modules; or even how to dynamically pick the best learning rate and gradient-update transmission/synchronization scheme to achieve the best “Goodput” of training on a cluster. This is a special area of meta-learning that concerns the question of “learning to learn”. However, many existing methods remain rather primitive, including random search, simple line or grid (or hyper-grid) search, and genetic algorithms, which suffer limitations in optimality, efficiency, scalability, adaptability, and the ability to leverage domain knowledge.
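
To make the contrast concrete, here is a minimal sketch of the two baselines the abstract calls primitive: grid search and random search over a single learning rate. The objective validation_score is a hypothetical stand-in for a full training-and-evaluation run, and the search range and budget are illustrative assumptions, not details from the talk.

import numpy as np

# Hypothetical objective: validation accuracy as a function of log10(lr).
# A real system would train a model and evaluate it on held-out data.
def validation_score(log_lr):
    return -(log_lr + 3.0) ** 2

# Grid search: exhaustively sweep a fixed, pre-chosen set of values.
grid = np.linspace(-6.0, 0.0, 13)
best_grid = max(grid, key=validation_score)

# Random search: spend the same trial budget on uniformly random samples.
rng = np.random.default_rng(0)
samples = rng.uniform(-6.0, 0.0, size=13)
best_random = max(samples, key=validation_score)

print(f"grid best log10(lr): {best_grid:.2f}, random best: {best_random:.2f}")

Neither baseline uses the outcomes of earlier trials to choose the next one, which is exactly the gap that model-based optimization, discussed next, is meant to fill.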

In this talk, we present a learning-to-learn methodology based on model-based optimization (MBO), which leverages machine learning models that take actions to gather information and provide recommendations to efficiently improve performance. This approach exhibits several advantages over existing alternatives: 1) it provides adaptive/elastic algorithms that improve performance online; 2) it can incorporate domain knowledge into the models for improved recommendations; and 3) it facilitates more data-efficient automatic learning-to-learn, or Auto-ML. We show applications of Auto-ML via MBO in three main tasks: hyper-parameter tuning, neural architecture search, and Goodput optimization in distributed systems. We argue that such applications can improve the productivity and performance of NLP systems across the board.
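
As an illustration of the MBO loop described above, here is a minimal sketch of model-based hyper-parameter tuning with a Gaussian-process surrogate and an expected-improvement acquisition function, one common instantiation of the idea rather than the specific method of the talk. As before, validation_score, the search range, and the budget are hypothetical assumptions.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Hypothetical objective: noisy validation accuracy vs. log10(lr).
def validation_score(log_lr):
    return -(log_lr + 3.0) ** 2 + rng.normal(scale=0.05)

candidates = np.linspace(-6.0, 0.0, 200).reshape(-1, 1)

# Seed the surrogate with a few random evaluations.
X = rng.uniform(-6.0, 0.0, size=(3, 1))
y = np.array([validation_score(x[0]) for x in X])

# The small alpha noise term keeps the fit stable under noisy observations.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3, normalize_y=True)
for _ in range(10):
    # Model the objective, then pick the candidate that maximizes expected
    # improvement: explore uncertain regions, exploit promising ones.
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    z = (mu - y.max()) / np.maximum(sigma, 1e-9)
    ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, validation_score(x_next[0]))

print(f"best log10(lr): {X[np.argmax(y), 0]:.2f}, score: {y.max():.3f}")

Each new trial is placed where the surrogate model predicts the most useful information, which is how MBO gains the data efficiency and online adaptivity argued for above.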