Use OpenAI Whisper For FREE | Best Speech to Text Model

In this video, I will show you how to run the Whisper v3 model in a Google Colab notebook. Enjoy :)
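
For reference, a minimal sketch of the setup the video walks through, assuming the Transformers, Accelerate, and Datasets packages and the openai/whisper-large-v3 checkpoint on the Hugging Face Hub; the audio filename is a placeholder:

    # In a Colab cell, install the packages first:
    # !pip install --upgrade transformers accelerate datasets

    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

    # Use the GPU and half precision when available
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    model_id = "openai/whisper-large-v3"
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)
    processor = AutoProcessor.from_pretrained(model_id)

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        chunk_length_s=30,          # process long audio in 30-second chunks
        batch_size=16,
        return_timestamps=True,
        torch_dtype=torch_dtype,
        device=device,
    )

    result = pipe("audio.mp3")      # placeholder: a file uploaded to the Colab session
    print(result["text"])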

Comments

Want to connect?
🔴 Join Patreon: Patreon.com/PromptEngineering

engineerprompt

Thanks for the video! I've been using this model for a long while to do translation + transcription of lectures (one and a half hours); mostly it works like a charm. I don't know about large-v3, but large-v2 would sometimes repeat and loop one sentence for about half of the transcription,
so it needs optimization (some solutions clean the audio before Whisper).

Nihilvs

In this, are we downloading the model or using the inference API? Actually, I am new to this and confused. If it is the model itself, it would be best for me to host it on a server.

ACse-vy

🎯 Key Takeaways for quick navigation:

00:00 🎙️ *Overview of Whisper V3 Model*
- Whisper V3 is OpenAI's latest speech-to-text model.
- Five configurations available: tiny, base, small, medium, and large V3.
- Memory requirements vary from 1 GB to 10 GB VRAM.
01:25 🔄 *Comparison: Whisper V2 vs. V3*
- V3 generally performs better with lower error rates than V2.
- There are specific cases where V2 outperforms V3, demonstrated later.
- Important to consider performance metrics when choosing between V2 and V3.
03:02 ⚙️ *Setting Up Whisper V3 in Google Colab*
- Installation of necessary packages: Transformers, Accelerate, and Datasets.
- GPU availability check and configuration for optimal performance.
- Loading the Whisper V3 model, setting processor, and creating the pipeline.
05:27 🎤 *Speech-to-Text Transcription Process*
- Creating a pipeline for automatic speech recognition using the Whisper V3 model.
- Uploading and transcribing an audio file in a Google Colab notebook.
- Additional options such as specifying timestamps during transcription.
07:45 🌐 *Language Recognition and Translation*
- V2 may be preferable when language is unknown, as it can automatically recognize it.
- Whisper supports translating speech from other languages directly into English.
- Highlighting the importance of specifying the language in V3 if known.
09:22 ⚡ *Flash Attention and Distil-Whisper*
- Enabling Flash Attention for improved performance if the GPU supports it.
- Introduction to Distil-Whisper, a smaller, faster distilled version of Whisper.
- Demonstrating how to use the Distil-Whisper medium English model in code (see the sketch after this list).
11:54 🌐 *Future Applications and Closing*
- Exploring potential applications, like enabling speech communication with documents.
- Encouraging viewers to explore and experiment with the Whisper model.
- Expressing the usefulness and versatility of the Whisper model in various applications.
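
A short sketch of the 09:22 options, continuing the code above; assumptions beyond the video summary: a GPU that supports Flash Attention 2 (with flash-attn installed) and the distil-whisper/distil-medium.en checkpoint name from the Hub:

    import torch
    from transformers import AutoModelForSpeechSeq2Seq, pipeline

    # Flash Attention 2 (optional; needs a supported GPU and `pip install flash-attn`)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        "openai/whisper-large-v3",
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
        attn_implementation="flash_attention_2",
    )

    # Distil-Whisper: a smaller, faster distilled variant (English-only medium here)
    distil_pipe = pipeline(
        "automatic-speech-recognition",
        model="distil-whisper/distil-medium.en",
        torch_dtype=torch.float16,
        device="cuda:0",
    )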

Made with HARPA AI

RameshBaburbabu

People want to run it locally for privacy, not route it to Google.

trilogen

VERY interesting. I would love to know how to run it locally, I mean with a UI on a local computer, not in a Google notebook. It would be very, VERY useful to transcribe a video, translate it later with another tool or model, and then generate subtitles, or generate audio to translate the video, everything locally :)

JuanGea-krli

Sorry, but what about transcribing from a URL, like YouTube?
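
One common pattern, as a self-contained sketch with assumptions not shown in the video (yt-dlp for the download, a placeholder URL and filename): pull the audio down first, then transcribe the file.

    # In Colab:
    # !pip install yt-dlp transformers accelerate
    # !yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" "https://www.youtube.com/watch?v=VIDEO_ID"

    from transformers import pipeline

    # Build a pipeline straight from the checkpoint name
    pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3", device="cuda:0")
    result = pipe("audio.mp3", return_timestamps=True)  # timestamps help if making subtitles
    print(result["text"])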

WEKINBAD

I want to use a prompt for a little personalization. How do I do it?
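
For what it's worth, the openai-whisper package (a different route than the Transformers pipeline in the video) exposes an initial_prompt argument that biases spelling and phrasing; a minimal sketch with a placeholder filename and made-up glossary terms:

    # !pip install -U openai-whisper
    import whisper

    model = whisper.load_model("medium")
    # initial_prompt nudges the decoder toward the given spellings and style
    result = model.transcribe(
        "audio.mp3",
        initial_prompt="Glossary: PromptEngineering, Distil-Whisper, Colab.",
    )
    print(result["text"])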

binayaktv

I am using V2 on my Nvidia 1080 GPU. The performance difference between the base model and the large model is very small. I tried multiple sources, tones of voice, noise, etc. The base version is really fast, so I recommend that one. Even V2 is really perfect for transcribing speech to text.

ekstrajohn

I'm a "scopist" and I need to edit transcripts with different speakers, I take it this does not differentiate speakers?

ericneeds

Hey. I have been trying to reduce the length of the subtitles, as the captions generated by Whisper can be overwhelming, ranging between 12 and 18 words in a single caption. I am using Google Colab, and so far there's been no success. Here are the commands I have used:

!whisper "FILE NAME" --model medium --word_timestamps True --max_line_width 40
!whisper "FILE NAME" --model medium --word_timestamps True --max_words_per_line 5

It works completely fine with the following command, but with a large number of words per caption:
!whisper "FILE NAME" --model medium

Could you please help?
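
One likely cause, offered as a guess: those line-length flags only shape the subtitle writers (srt/vtt), not the plain .txt transcript, and in the whisper CLI --max_words_per_line has no effect when combined with --max_line_width. Requesting SRT output explicitly and checking the resulting .srt file is worth trying; a sketch with the placeholder filename kept:

    !whisper "FILE NAME" --model medium --word_timestamps True --max_words_per_line 5 --output_format srt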

ZaazZ-su

Can you do a video on multi-speaker identification and transcription using Whisper, please?
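
Whisper itself does not label speakers. Until then, a rough sketch of one common pairing, with assumptions beyond this video (pyannote.audio, its gated pyannote/speaker-diarization-3.1 checkpoint, and a placeholder Hugging Face token); its speaker turns can then be matched against Whisper's timestamped transcript:

    # !pip install pyannote.audio
    from pyannote.audio import Pipeline

    # Gated model: requires accepting the terms on the Hub and an access token
    diarization_pipe = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="HF_TOKEN",   # placeholder token
    )
    diarization = diarization_pipe("audio.wav")  # placeholder filename

    # Print who speaks when
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")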

Nawaz-lbeq

Hi, can anyone help me? I'm having a problem following the tutorial; I encountered an error in:

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=15,
    batch_size=16,
    return_timestams=True,
    torch_dtype=torch_dtype,
    device=device,
)

it says that:

TypeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 pipe = pipeline("automatic-speech-recognition",
2 model=model,
3 tokenizer=processor.tokenizer,
4 feature_extractor=processor.feature_extractor,
5 max_new_tokens=128,
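
The traceback is consistent with the misspelled keyword in the call: pipeline() rejects unknown arguments, and return_timestams should be return_timestamps. A corrected version of the same call:

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=15,
        batch_size=16,
        return_timestamps=True,   # was misspelled as "return_timestams"
        torch_dtype=torch_dtype,
        device=device,
    )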

matangbaka

Greetings, can this be used with WhisperX too? I tried it with the v2 model; it used to work, I think.

bmqww

Is there still NO practical application for real-time transcribing (+/- translating the text) that is readily available on Android?
I think I heard about one or two projects that wanted to do that, but still nothing concrete more than a full year after this incredible piece of technology appeared.
Am I missing something? Is Whisper incompatible with Android? Is there no way to apply Whisper to continuous live audio recording?
Has nobody managed to do it??

WillyFlowerz

Is it just me, or is this not a big jump in improvement? At least for me.
I wanted:
1. Speaker recognition / diarization
2. Higher accuracy in Mandarin

Hope they do the first ASAP, and the second will get better over time, I hope. Azure already has a speech-to-text service that includes speaker recognition and is quite good. I wonder if that could affect how they prioritise this important feature.

farahabdulahi

Helpful video! Now I can run code locally. Thanks

rccarsxdr

I like the idea of chatting with documents through speech

thunderwh

⚠️ I need a good text-to-speech for free. It doesn't help if you can talk to a model but it can't talk back. So, what to do? Any good, free text-to-speech?? 😮

DihelsonMendonca