Real-time target sound extraction using attention

Real-time target sound extraction, ICASSP 2023
Bandhav Veluri, University of Washington

We present the first neural network model to achieve real-time, streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture that uses a stack of dilated causal convolution layers as the encoder and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions to cover large receptive fields in a computationally efficient manner, while also leveraging the generalization performance of transformer-based architectures.
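The sketch below shows why a stack of dilated causal convolutions suits streaming. Each output sample depends only on current and past input, so no future audio is needed. Doubling the dilation per layer also makes the receptive field grow exponentially with depth while the per-layer cost stays constant. It is not Waveformer's code. The helper names `receptive_field` and `causal_dilated_conv1d` are hypothetical, and the configuration (kernel size 3, ten layers with dilations 1, 2, 4, ..., 512) is an illustrative assumption, not a value from the description.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (in samples) of a stack of dilated causal 1-D convolutions.

    Each layer with kernel size k and dilation d reaches (k - 1) * d samples
    further into the past than its input did.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv1d(x: list[float], weights: list[float], dilation: int) -> list[float]:
    """Causal 1-D convolution: out[t] depends only on x[t], x[t-d], x[t-2d], ...

    Samples before the start of the signal are treated as zeros (left padding),
    so the output has the same length as the input and never uses future samples.
    """
    k = len(weights)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t - j * dilation  # look only backwards in time
            if idx >= 0:
                acc += weights[j] * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Hypothetical configuration: kernel size 3, dilation doubling per layer.
    dilations = [2 ** i for i in range(10)]
    print(receptive_field(3, dilations))  # 2047 samples from only 10 layers

    # An impulse at t=5 affects outputs at t=5, 7, 9 only (dilation 2), never earlier.
    impulse = [0.0] * 10
    impulse[5] = 1.0
    print(causal_dilated_conv1d(impulse, [1.0, 0.5, 0.25], dilation=2))
```

With ten such layers, the stack sees about 2,000 past samples. Reaching the same context with undilated kernel-3 layers would take over 1,000 layers. In the architecture described above, this is the encoder's job. The transformer decoder layer then operates on the encoder's output.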

This video is closed captioned.