OpenAI Whisper: Robust Speech Recognition via Large-Scale Weak Supervision | Paper and Code

❤️ Become The AI Epiphany Patreon ❤️
👨👩👧👦 Join our Discord community 👨👩👧👦
In this video I cover Whisper, an ASR system from OpenAI's "Robust Speech Recognition via Large-Scale Weak Supervision" paper.
Trained on a huge multilingual, multitask, weakly supervised dataset, it achieves very high effective robustness and accuracy, closing the gap with the human baseline using only an off-the-shelf transformer.
I walk you through both the paper and the actual code. Let me know whether the code part helped!
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⌚️ Timetable:
00:00:00 Intro
00:02:05 Paper overview
00:07:30 Collecting a large-scale weakly supervised dataset
00:13:55 Evaluation metric issues (WER)
00:16:05 Effective robustness
00:18:40 Scaling laws in progress
00:26:30 Decoding is hacky
00:28:30 Code walk-through
00:30:25 Model architecture (diagram vs code)
00:33:30 Transcription task
00:34:10 Loading the audio, mel spectrograms
00:37:50 Language detection
00:45:00 Transcription task continued
00:47:35 Suppressing token logits
00:52:00 Voice activity detection
00:53:35 Decoding and heuristics
01:01:56 Outro
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATRON OF THE AI EPIPHANY ❤️
If these videos, GitHub projects, and blogs help you,
consider helping me out by supporting me on Patreon!
Huge thank you to these AI Epiphany patrons:
Eli Mahler
Petar Veličković
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#whisper #openai #asr