Were RNNs All We Needed? (2 Oct 2024)

Title: Were RNNs All We Needed?
Date: 2 Oct 2024
Authors: Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadeghi
Summary
This research paper revisits traditional recurrent neural networks (RNNs), specifically LSTMs and GRUs, for long-sequence tasks. The authors argue that these models, despite their historical limitations from sequential computation and backpropagation through time (BPTT), can be made highly efficient and competitive with modern sequence models such as Transformers. The key innovation is to simplify these RNNs by removing the hidden-state dependencies from their gates, so that training can be parallelised with the parallel prefix scan algorithm. The resulting minimal models (minLSTM and minGRU) train dramatically faster and use significantly fewer parameters than their traditional counterparts, while achieving comparable performance on tasks such as selective copying, reinforcement learning, and language modelling. The authors conclude by posing a thought-provoking question, "Were RNNs all we needed?", highlighting the potential of these simplified RNNs to revolutionise sequence modelling.
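As a rough illustration of the core trick (a hedged sketch, not the authors' implementation; the toy dimensions, the weight names W_z and W_h, and the Hillis-Steele scan are assumptions made for this example), note that once the gates depend only on the current input, the recurrence collapses to h_t = a_t * h_{t-1} + b_t, which an associative prefix scan can evaluate over all timesteps at once:

```python
# Minimal NumPy sketch (not the paper's code) of why dropping hidden-state
# dependence from the gates makes an RNN parallelizable: the recurrence becomes
# h_t = a_t * h_{t-1} + b_t, where a_t and b_t depend only on the input x_t,
# so all timesteps can be combined with an associative prefix scan.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy sizes and weights; the names W_z and W_h are illustrative, not from the paper.
T, d_in, d_h = 16, 4, 8
x = rng.normal(size=(T, d_in))
W_z = 0.1 * rng.normal(size=(d_in, d_h))
W_h = 0.1 * rng.normal(size=(d_in, d_h))

# minGRU-style quantities: the gate never looks at h_{t-1}.
z = sigmoid(x @ W_z)      # update gate, shape (T, d_h)
h_tilde = x @ W_h         # candidate state
a = 1.0 - z               # coefficient on h_{t-1}
b = z * h_tilde           # input-driven term

def sequential(a, b, h0):
    """Reference: the ordinary step-by-step recurrence (what BPTT must unroll)."""
    h, out = h0, []
    for t in range(a.shape[0]):
        h = a[t] * h + b[t]
        out.append(h)
    return np.stack(out)

def parallel_scan(a, b):
    """Hillis-Steele inclusive scan over the affine maps f_t(h) = a_t*h + b_t.

    compose(later=(A2, B2), earlier=(A1, B1)) = (A2*A1, A2*B1 + B2) is
    associative, so log2(T) fully vectorized passes suffice.
    """
    A, B = a.copy(), b.copy()
    offset = 1
    while offset < A.shape[0]:
        # Shift down by `offset`, padding with the identity map (A=1, B=0).
        A_prev = np.vstack([np.ones((offset, A.shape[1])), A[:-offset]])
        B_prev = np.vstack([np.zeros((offset, B.shape[1])), B[:-offset]])
        # The tuple RHS is evaluated before assignment, so the old A is used for B.
        A, B = A_prev * A, A * B_prev + B
        offset *= 2
    return B  # equals h_t when h_0 = 0

h_seq = sequential(a, b, np.zeros(d_h))
h_par = parallel_scan(a, b)
print(np.allclose(h_seq, h_par))  # True: both routes give identical hidden states
```

In the paper this same linear-recurrence structure is what allows minLSTM and minGRU to be trained with a parallel scan instead of sequential BPTT; the snippet above only checks, on random toy data, that the scan and the step-by-step recurrence produce the same hidden states.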
Key Topics
RNN Efficiency, Minimal RNNs, Sequence Modelling, Parallel Scan, Empirical Performance