333_Parameter-Efficient Cross-Language Transfer Learning for a Language-Modular AV-ASR

Audiovisual speech recognition (AV-ASR) faces the problem that, for many languages, only a small amount of audiovisual data is available. Building upon an English model, we first apply and analyze various adapters for cross-language transfer learning to build a parameter-efficient and easy-to-extend AV-ASR system for multiple languages. Fine-tuning only the bottleneck adapter, which has 4% of the encoder's parameters, together with the decoder yields performance comparable to full fine-tuning for French and Spanish AV-ASR. Second, we investigate the effectiveness of various encoder components in cross-language transfer learning. Our proposed modular linguistic transfer learning approach outperforms full fine-tuning for German, French, and Spanish AV-ASR in almost all clean and noisy conditions (8 of 9). On low-resource German AV data (13 h), our proposed linguistic transfer learning achieves a 4.1% absolute WER reduction on average over clean and noisy speech, while fine-tuning only 50% of the encoder's parameters.
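
To give a feel for why a bottleneck adapter is so small relative to the encoder it sits in, the following is a rough parameter count. It is only a sketch under stated assumptions: the layer layout (a plain Transformer encoder layer with one adapter), the function names, and the dimensions 768, 3072, and the bottleneck widths are all hypothetical and are not taken from the work described above, so the printed fractions are not expected to reproduce its 4% figure.

```python
def adapter_params(d_model: int, bottleneck: int) -> int:
    """Parameters of one bottleneck adapter (assumed layout):
    a down-projection and an up-projection, each with a bias."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Parameters of one plain Transformer encoder layer (assumed layout)."""
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward block with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return attention + feed_forward + layer_norms


def adapter_fraction(d_model: int, d_ff: int, bottleneck: int) -> float:
    """Adapter size as a fraction of the frozen layer it is attached to."""
    return adapter_params(d_model, bottleneck) / encoder_layer_params(d_model, d_ff)


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only for illustration.
    for bottleneck in (32, 64, 128, 256):
        share = adapter_fraction(768, 3072, bottleneck)
        print(f"bottleneck={bottleneck:4d}  adapter share={share:.2%}")
```

Because the adapter's size grows linearly with the bottleneck width while the frozen layer's size is fixed, the bottleneck width is the main knob for trading trainable parameters against capacity in this kind of setup.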