[T@W intro] Drew Jaegle — Long-Context Anymodal Generation with Perceivers
Drew Jaegle (Research Scientist at DeepMind) gives an overview of his upcoming talk at the Transformers at Work workshop.
Title: Long-Context Anymodal Generation with Perceivers
Abstract: A central goal of Artificial Intelligence is the development of systems that flexibly process data from any modality for any task. Perceivers are a family of architectures that scale well to very large inputs in many modalities by encoding data to a latent bottleneck. But latent-space encoding handles all elements in a single pass, while autoregressive generation, which has become the go-to tool for generation in language and many other domains, assumes processing happens one element at a time. I will describe Perceiver AR, a recently proposed long-context autoregressive model that avoids these problems by carefully restructuring the Perceiver latent space. Perceiver AR obtains state-of-the-art performance on generation benchmarks on images, language, and music, while scaling to inputs several orders of magnitude longer than Transformer-XL, even when using very deep architectures. Perceiver AR's long context window allows it to easily support data without a natural left-to-right ordering, and its latent structure allows compute budget to be adapted at eval time for either improved performance or reduced generation time.
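The core idea in the abstract, combining a latent bottleneck with autoregressive ordering, can be illustrated with a toy sketch. This is not DeepMind's implementation; it is a minimal single-head cross-attention in NumPy under the assumption (as described in the Perceiver AR paper) that the latents are initialized from the final positions of the input sequence and attend causally over the full, much longer input. The function name and shapes are illustrative.

```python
import numpy as np

def causal_cross_attend(inputs, num_latents):
    # inputs: (seq_len, d) embedded sequence. The last num_latents
    # positions act as latent queries, attending over the whole input.
    seq_len, d = inputs.shape
    q = inputs[-num_latents:]            # (num_latents, d) latent queries
    k = v = inputs                       # (seq_len, d) keys/values
    scores = q @ k.T / np.sqrt(d)        # (num_latents, seq_len)
    # Causal mask: latent i sits at absolute position
    # seq_len - num_latents + i and may only attend to positions <= it.
    pos = np.arange(seq_len)
    latent_pos = seq_len - num_latents + np.arange(num_latents)
    scores = np.where(pos[None, :] > latent_pos[:, None], -1e9, scores)
    # Softmax over the input axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                   # (num_latents, d) latents
```

Because the expensive attention is between a small set of latents and a long input (rather than input-to-input self-attention), compute scales linearly in context length for this step; in the full model, subsequent self-attention layers operate only on the latents, which is what lets the architecture scale to very long contexts.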