OpenAI Whisper Demo: Convert Speech to Text in Python

In this video tutorial we show how to quickly convert any audio into text using OpenAI's Whisper, a free, open-source speech-to-text library that works in many different languages!
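
As a minimal sketch of the workflow described here, the snippet below loads a Whisper model and transcribes one file. It assumes the `openai-whisper` package and the `ffmpeg` command-line tool are installed. The helper name `transcribe_file`, the default model size "base", and the example file name are illustrative assumptions, not taken from the video.

```python
from pathlib import Path


def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Return the text Whisper transcribes from one audio file."""
    path = Path(audio_path)
    # Fail early with a clear message if the audio file itself is missing.
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")

    # Imported here so the path check above works even without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(str(path))    # returns a dict; "text" holds the transcript
    return result["text"]


# Example call (hypothetical file name):
# print(transcribe_file("my_recording.mp3"))
```

Larger model sizes are generally slower but more accurate, so "base" is only a starting point.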

#python #pandas #datascience
Comments

Thanks for making this helpful video. I really enjoyed watching it.
Whisper is a huge step forward to local speech recognition.

ThorstenMueller

Great video, thanks Rob! ... I tried the model in German a few times and it worked quite well but not without errors. One time I took an audio example from Hermann Hesse's wonderful book: Narcissus and Goldmund and the model translated 'Narciss' (German for Narcissus) with 'Nazi'. ... so, I will still read and correct the future results before sending them to my boss. ;-)

IntenseRouge

Such an awesome video. I've been looking for a little while now and this is exactly what I'm looking for. Additionally, the way you presented everything was super quick and easy to understand (which I appreciate since I'm currently running a fever lol). Either way, you're a life saver, and I want to thank you so much for all your hard work.

Ethernick_V

Really nice explanation and demonstration. You, sir, have a new subscriber (me).

AhsanNawazish

Thanks for this valuable video. You deserve more views and likes.

davidliu

Hello all! Nice first impression! I ran an 8-minute mp3 file and it worked perfectly. I am pretty surprised. q=)

Chris_zacas

Seriously, such an awesome project!!!

bujin

More content like this, please! And thank you for the tutorial.

AlejandroGonzalez-pzhl

Hey Medallion! What's the best way/library to perform text-to-speech, speech-to-text, and speech-to-speech translation between languages? I'm from India, so I need a model that can handle a lot of indigenous languages. And if possible, could you make a video about this?

reubenthomas

DaVinci Resolve needs to use this to generate subtitles 👌

Sachin-at

Hey guys, can anyone please help me with this issue? I am trying to run Whisper on my machine and I am getting this error in cmd:
UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
I use Windows 10 with an RTX 2060 GPU, and it seems to run on my CPU instead of the NVIDIA GPU. For more detail: I created a Python virtual environment and pip installed whisper in that virtual environment.

dimorischinyui
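
The warning quoted above is what Whisper prints when the model runs on the CPU. One possible cause, stated here as an assumption, is that the PyTorch build in the virtual environment has no CUDA support. A minimal sketch for checking this and choosing the device explicitly (the helper names `choose_device` and `load_on_best_device` are hypothetical):

```python
from typing import Tuple


def choose_device(cuda_available: bool) -> Tuple[str, bool]:
    """Map CUDA availability to a (device, use_fp16) pair."""
    # Half precision is only used on the GPU; on CPU Whisper falls back to FP32.
    return ("cuda", True) if cuda_available else ("cpu", False)


def load_on_best_device(model_name: str = "base"):
    """Load a Whisper model on the GPU when PyTorch can see one."""
    import torch
    import whisper

    device, use_fp16 = choose_device(torch.cuda.is_available())
    print(f"PyTorch {torch.__version__}, running on: {device}")
    model = whisper.load_model(model_name, device=device)
    return model, use_fp16


# Example call (hypothetical file name):
# model, use_fp16 = load_on_best_device()
# result = model.transcribe("my_recording.mp3", fp16=use_fp16)
```

If this reports `cpu` on a machine with an NVIDIA card, installing a CUDA-enabled PyTorch build is a common next step. Passing `fp16=False` on the CPU also avoids the warning.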

Thanks for your kind, detailed explanation! Could you explain how improving a Whisper model works?
Do I need text, audio, or both? I would like to improve recognition of new words in the specific field I'm targeting.

leecloud
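
For domain-specific vocabulary, one lightweight option that needs no training is the `initial_prompt` argument of `transcribe`, which gives the model some text context before decoding. Actual fine-tuning is a separate process that generally needs audio paired with correct transcripts. The sketch below covers only the prompt approach; the glossary terms and the helpers `build_prompt` and `transcribe_with_glossary` are made-up examples.

```python
from typing import List


def build_prompt(terms: List[str]) -> str:
    """Join domain terms into one sentence to use as a decoding hint."""
    cleaned = [term.strip() for term in terms if term.strip()]
    if not cleaned:
        return ""
    return "Glossary: " + ", ".join(cleaned) + "."


def transcribe_with_glossary(audio_path: str, terms: List[str], model_name: str = "base") -> str:
    """Transcribe a file while hinting at domain vocabulary."""
    import whisper  # imported lazily so build_prompt works without Whisper installed

    model = whisper.load_model(model_name)
    # initial_prompt is a hint, not a guarantee that the terms are recognized.
    result = model.transcribe(audio_path, initial_prompt=build_prompt(terms) or None)
    return result["text"]


# Example call (hypothetical file name and terms):
# print(transcribe_with_glossary("lecture.mp3", ["stent", "angioplasty"]))
```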

Cool video! I want to get this working for live speech-to-text, since it is fast enough to run in real time. But because you can't pass in continuous audio, it seems you would run into issues where the model does not have the previous output as context and could easily get cut off mid-word. Any ideas for how to tackle that issue?

registeel
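
One way to approach the cut-off problem, sketched here as an assumption rather than a tested real-time pipeline, is to buffer incoming audio, split it into chunks that overlap by a second or two, and pass the previous chunk's text as `initial_prompt` so the model has some context. The helper names and chunk sizes are hypothetical, and the sketch does not remove words repeated in the overlap.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Whisper works on 16 kHz mono audio


def overlapping_chunks(samples: Sequence, chunk_len: int, overlap: int) -> Iterator[Sequence]:
    """Yield slices of chunk_len samples, each sharing `overlap` samples with the last."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be >= 0 and smaller than chunk_len")
    step = chunk_len - overlap
    start = 0
    while start < len(samples):
        yield samples[start:start + chunk_len]
        if start + chunk_len >= len(samples):
            break  # the final chunk already reaches the end of the audio
        start += step


def transcribe_chunks(model, samples, chunk_s: int = 10, overlap_s: int = 2) -> List[str]:
    """Transcribe a 1-D float32 NumPy array (16 kHz) chunk by chunk.

    `model` is a loaded Whisper model; each chunk is prompted with the
    text of the previous chunk to carry context across the boundary.
    """
    previous_text = ""
    pieces = []
    for chunk in overlapping_chunks(samples, chunk_s * SAMPLE_RATE, overlap_s * SAMPLE_RATE):
        result = model.transcribe(chunk, initial_prompt=previous_text or None)
        previous_text = result["text"]
        pieces.append(previous_text)
    return pieces
```

Longer overlaps reduce the chance of cutting a word but increase duplicated text and compute, so the values above are only a starting point.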

Hi Medallion, Thanks for the video.
I've followed both of your processes, but when I run it I get a FileNotFoundError: [WinError 2] The system cannot find the file specified. My test file is in the same folder as my main.py. Any ideas what I need to do to get it to work?

geoffreybell
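
When the audio file is clearly present, one frequent cause of this error (an assumption about this particular case) is that the `ffmpeg` executable, which Whisper launches to decode audio, is not on the system PATH. A small diagnostic sketch using only the standard library; `find_problems` is a hypothetical helper:

```python
import shutil
from pathlib import Path
from typing import List


def find_problems(audio_path: str, ffmpeg_name: str = "ffmpeg") -> List[str]:
    """Return human-readable reasons why transcription might not start."""
    problems = []
    # Relative paths are resolved against the current working directory,
    # which is not necessarily the folder that contains the script.
    if not Path(audio_path).is_file():
        problems.append(f"Audio file not found: {Path(audio_path).resolve()}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(ffmpeg_name) is None:
        problems.append(f"'{ffmpeg_name}' was not found on PATH")
    return problems


# Example call (hypothetical file name):
# for problem in find_problems("test.mp3"):
#     print(problem)
```

An empty list means both checks passed and the cause lies elsewhere.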

Great video 👍. I just wanted to know in detail how to use this, and now that I've seen your video, I understand it 100%. By the way, which software are you writing the code in?

spartan

Thanks for providing details. Does it support live streaming audio? Instead of a pre-recorded audio clip, can it transcribe live speech?

hareeshkumar

Do you have any advice for how to fix the "ModuleNotFoundError: no module named 'torch._C'" error? I looked around the internet for answers but none of them work; I even tried different Python versions.

theHaloFM

Noob question, but does this work offline, or is it an API call to OpenAI?

cbara

Hi Rob, thank you for taking the time to share from the wealth of your knowledge. I tried running the model, and it keeps telling me NumPy is not available. I ran pip install numpy and found that numpy is already installed. Please, what could the problem be? I want to use this for qualitative research. Thank you once again, and I hope to hear from you.

nelsonkayode

Can you give it more than 30 seconds of audio or are you forced to break up the source file?

bryede
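
The high-level `transcribe` method accepts files longer than 30 seconds and processes them internally in 30-second windows, which matches the 8-minute file reported in an earlier comment. The result also contains a `segments` list with start and end times. A minimal sketch for printing those; `format_timestamp` and `format_segments` are hypothetical helpers.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.ss."""
    minutes = int(seconds // 60)
    return f"{minutes:02d}:{seconds % 60:05.2f}"


def format_segments(segments: List[Dict]) -> List[str]:
    """Turn Whisper segment dicts into '[start -> end] text' lines."""
    return [
        f"[{format_timestamp(seg['start'])} -> {format_timestamp(seg['end'])}] {seg['text'].strip()}"
        for seg in segments
    ]


# Example call (hypothetical file name), assuming `model` is a loaded Whisper model:
# result = model.transcribe("long_interview.mp3")
# print("\n".join(format_segments(result["segments"])))
```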