333_Parameter-Efficient Cross-Language Transfer Learning for A Language-Modular AV-ASR

Audiovisual speech recognition (AV-ASR) faces the problem that for many languages only little audiovisual data is available. Building upon an English model, in this work we first apply and analyze various adapters for cross-language transfer learning to build a parameter-efficient and easy-to-extend AV-ASR system for multiple languages. Fine-tuning only the bottleneck adapter, which has 4% of the encoder's parameters, together with the decoder shows performance comparable to full fine-tuning in French and Spanish AV-ASR. Second, we investigate the effectiveness of various encoder components in cross-language transfer learning. Our proposed modular linguistic transfer learning approach outperforms full fine-tuning for German, French, and Spanish AV-ASR in almost all clean and noisy conditions (8/9). On low-resource German AV data (13 h), our proposed linguistic transfer learning achieves a 4.1% absolute WER reduction on average over clean and noisy speech, while fine-tuning only 50% of the encoder's parameters.

Zhengyang Li
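
To give a feel for how a bottleneck adapter can amount to only a few percent of an encoder's parameters, here is a minimal counting sketch. It assumes a standard adapter (a down-projection followed by an up-projection, both with biases) and a plain Transformer encoder layer. The dimensions (`d_model=1024`, `d_ff=4096`, `bottleneck=256`) and all function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def bottleneck_adapter_params(d_model: int, bottleneck: int) -> int:
    """Parameters of one adapter: down- and up-projection, each with a bias."""
    down = d_model * bottleneck + bottleneck   # d_model -> bottleneck
    up = bottleneck * d_model + d_model        # bottleneck -> d_model
    return down + up


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Rough count for one Transformer encoder layer (layer norms ignored)."""
    # Query, key, value, and output projections, each d_model x d_model plus bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward block with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    return attention + feed_forward


def adapter_fraction(d_model: int, d_ff: int, bottleneck: int,
                     adapters_per_layer: int = 1) -> float:
    """Adapter parameters as a fraction of the encoder layer they are attached to."""
    adapter = adapters_per_layer * bottleneck_adapter_params(d_model, bottleneck)
    return adapter / encoder_layer_params(d_model, d_ff)


if __name__ == "__main__":
    # Hypothetical dimensions, not the configuration used in the paper.
    fraction = adapter_fraction(d_model=1024, d_ff=4096, bottleneck=256)
    print(f"adapter share of one encoder layer: {fraction:.1%}")
```

With these assumed sizes, one adapter per layer comes to roughly 4% of the layer's parameters. The share scales linearly with the bottleneck width, so halving the bottleneck roughly halves the number of trainable adapter parameters.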
