OpenAI Whisper: Robust Speech Recognition via Large-Scale Weak Supervision | Paper and Code

❤️ Become The AI Epiphany Patreon ❤️

👨‍👩‍👧‍👦 Join our Discord community 👨‍👩‍👧‍👦

In this video I cover Whisper, an ASR system from OpenAI's "Robust Speech Recognition via Large-Scale Weak Supervision" paper.

Trained on a huge multilingual, multi-task, weakly supervised dataset, it achieves high effective robustness and accuracy, closing the gap with the human baseline while using only an off-the-shelf Transformer.

I walk you through both the paper and the actual code. Let me know whether the code part helped!
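
If you want to follow along, here is a minimal sketch of the high-level API the walkthrough covers (assuming the openai-whisper package is installed; "audio.mp3" is a placeholder file, and the exact interface may have changed since the video):

```python
import whisper  # pip install openai-whisper

# Load one of the pretrained checkpoints (tiny / base / small / medium / large).
model = whisper.load_model("base")

# transcribe() internally chunks the audio into 30-second windows,
# detects the language, and decodes the text.
result = model.transcribe("audio.mp3")
print(result["text"])
```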

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

⌚️ Timetable:
00:00:00 Intro
00:02:05 Paper overview
00:07:30 Collecting a large scale weakly supervised dataset
00:13:55 Evaluation metric issues (WER)
00:16:05 Effective robustness
00:18:40 Scaling laws in progress
00:26:30 Decoding is hacky
00:28:30 Code walk-through
00:30:25 Model architecture (diagram vs code)
00:33:30 Transcription task
00:34:10 Loading the audio, mel spectrograms
00:37:50 Language detection
00:45:00 Transcription task continued
00:47:35 Suppressing token logits
00:52:00 Voice activity detection
00:53:35 Decoding and heuristics
01:01:56 Outro
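
The chapters on loading audio, language detection, and decoding map onto Whisper's lower-level API roughly as follows (a sketch based on the openai/whisper README; names may differ across versions):

```python
import whisper

model = whisper.load_model("base")

# Load the audio and pad/trim it to the 30-second window Whisper expects.
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Compute the log-mel spectrogram the encoder consumes.
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Language detection: the decoder scores the special language tokens.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decoding (greedy by default, at temperature 0).
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)
```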

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATRON OF THE AI EPIPHANY ❤️

If these videos, GitHub projects, and blogs help you,
consider helping me out by supporting me on Patreon!

Huge thank you to these AI Epiphany patrons:
Eli Mahler
Petar Veličković

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#whisper #openai #asr
Comments

Let me know whether the code part helped! :) Is it adding any value for you guys? Or am I just rambling and it's too hard to follow unless you play with the code yourself? Would really appreciate some feedback!

TheAIEpiphany

I just found this channel and I’m SO THANKFUL for a great walkthrough and explanation. It’s super fun. This is gold!!! Thanks Aleksa!

devhau

Thanks for walking through Whisper code together, enjoyed the journey!

mariatrofimova

What a good video! I was searching for something like this, where even a noob like me can understand the entire paper, because you took us through it step by step.
I knew this was going to be a great video when you stopped to explain the log-mel spectrogram as well!
Thanks Aleksa

pratikkhedikar

Thanks Aleksa! Really appreciate the effort you put into these videos. Quality content, keep it up.

Spockleblupit

This is super cool man! Thanks for diving deep into it

alexgilka

Thank you so much for doing these videos. You helped me so so so so much.

huonglarne

Very informative and authoritative, thank you!

FreeSubtitlesAI

@Aleksa Gordic, thanks for sharing this valuable information. Apart from the AI content, I would love to see how you use VS Code so effectively to move through the code and debug it. Would really appreciate it if you could cover that in a video.

vinayakbaddi

Hi Aleksa! Great video! I just wanted to know: what would the loss function be for these models? Would it be something like cross-entropy, since the model predicts tokens?

asceznyk
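
For anyone with the same question: yes, the decoder is trained as an ordinary sequence-to-sequence model with token-level cross-entropy under teacher forcing. A rough sketch, assuming `model` is a loaded openai/whisper model whose `forward(mel, tokens)` returns next-token logits (as in the repo's source at the time of the video):

```python
import torch.nn.functional as F

# mel:    (batch, 80, 3000) log-mel spectrograms
# tokens: (batch, seq_len) target token ids, including the special task tokens
logits = model(mel, tokens[:, :-1])           # teacher-forced predictions
loss = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),      # flatten to (N, vocab_size)
    tokens[:, 1:].reshape(-1),                # next-token targets, (N,)
)
```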

Welsh is an outlier. Never would have guessed. Anyway, gotta go, heading out this afternoon.

petercowling

@TheAIEpiphany, how do you see the effect of the "best_of" parameter on the quality of the transcription? Any insight would be helpful. Thanks

goryeodynasti
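
In short: `best_of` only applies when the temperature is non-zero; it samples that many candidates and keeps the best-scoring one. At temperature 0 it isn't allowed, and `beam_size` is the analogous knob. A sketch, assuming the openai-whisper API and a placeholder audio file:

```python
import whisper

model = whisper.load_model("base")

# Sample 5 candidates at temperature 0.7 and keep the highest-scoring one.
sampled = model.transcribe("audio.mp3", temperature=0.7, best_of=5)

# best_of is incompatible with greedy decoding (temperature 0);
# beam search is the counterpart there.
greedy = model.transcribe("audio.mp3", temperature=0.0, beam_size=5)
```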

Sir, I have read your roadmap to Reinforcement Learning, and I want to do research in RL. 1) Should I still follow your roadmap? 2) Do I need to know all the math derivations behind supervised, unsupervised, and deep learning algorithms? 3) How can I start doing research in RL as an undergraduate at a non-research institute?

convolutionalnn

I have watched your video and it was great! But I'm not sure whether the translation and transcription tasks share the same decoding parameters.

amilia
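
They do share them: both tasks run through the same decoder and the same decoding options; the only difference is the special task token selected via the `task` argument. A sketch, assuming the openai-whisper API and a placeholder German audio file:

```python
import whisper

model = whisper.load_model("medium")

# Same model and decoding machinery; only the task token differs.
transcript = model.transcribe("speech_de.mp3", task="transcribe")  # German text out
translation = model.transcribe("speech_de.mp3", task="translate")  # English text out
```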

I wonder if we can use the attention map (of how much each audio token contributes to the prediction of each transcript token) to back out timestamps instead?

ChuanChihChou
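
That's a plausible idea, since cross-attention tends to be roughly monotonic in time for speech recognition. A rough sketch of capturing the decoder's cross-attention with forward hooks, assuming the openai/whisper module layout at the time of the video, where each decoder block exposes a `cross_attn` module whose forward returns an `(output, qk)` tuple:

```python
import whisper

model = whisper.load_model("base")
attn_scores = []

def save_cross_attention(module, inputs, outputs):
    # outputs[1] is assumed to hold the attention scores of text tokens
    # over audio frames (layout taken from the openai/whisper source).
    attn_scores.append(outputs[1].detach())

hooks = [
    block.cross_attn.register_forward_hook(save_cross_attention)
    for block in model.decoder.blocks
]
result = model.transcribe("audio.mp3")
for hook in hooks:
    hook.remove()

# attn_scores now holds per-layer (text-token x audio-frame) score matrices,
# which could be aligned (e.g. with dynamic time warping) into word timings.
```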

Hey, really nice video. Can we fine-tune the Whisper model on our own dataset? If yes, can you show us how?

tahercoolguy
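
The repo only ships inference code, so fine-tuning is do-it-yourself, but since the model is a plain encoder-decoder Transformer, a training step is just the cross-entropy loss from above plus an optimizer. A very rough sketch, assuming `model` is a loaded openai/whisper model and `dataloader` is your own iterator of (mel spectrogram, token-id) batches:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for mel, tokens in dataloader:                 # your own prepared batches
    logits = model(mel, tokens[:, :-1])        # teacher-forced next-token logits
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```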

It would be helpful if you could put these models in historical context a bit. I'm not as familiar with how things were done in the past vs. today's SOTA.

FinnBrownc

Can the model be run locally? How much compute does it take to run this model for inference?

xXMaDGaMeR
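
It does run locally: the checkpoints range from roughly 39M parameters (`tiny`) to about 1.5B (`large`), and the smaller ones are perfectly usable on CPU. A sketch:

```python
import torch
import whisper

# tiny/base are fine on CPU; medium/large are much happier on a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("tiny", device=device)

result = model.transcribe("audio.mp3")
print(result["text"])
```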

Can someone explain how embeddings are learnt?

kshitizkhandelwal
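
Short answer: an embedding layer is just a learnable lookup table trained end-to-end by backpropagation, like any other weight matrix. A toy PyTorch sketch (not Whisper-specific):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=16)  # (vocab, dim) table

token_ids = torch.tensor([3, 7, 42])
vectors = embedding(token_ids)            # looks up rows 3, 7, and 42

# Any loss on the vectors backpropagates into the table, nudging only
# the rows that were actually used in this batch.
loss = vectors.pow(2).sum()
loss.backward()
print(embedding.weight.grad[3].norm())    # non-zero; unused rows stay at zero
```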

Is it possible to find the timestamps of each transcribed word? Great work!

kerenstarobinski
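
Out of the box, the version of Whisper covered here only predicts segment-level timestamps (via special timestamp tokens), not per-word ones. Reading them looks roughly like this, assuming the openai-whisper API:

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")

# Each segment carries start/end times in seconds, decoded from timestamp tokens.
for seg in result["segments"]:
    print(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text']}")
```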