Use OpenAI Whisper For FREE | Best Speech to Text Model

In this video, I will show you how to run the Whisper v3 model in a Google Colab notebook. Enjoy :)
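
For reference, a minimal sketch of the setup the video walks through, assuming the Transformers, Accelerate, and Datasets packages and the openai/whisper-large-v3 checkpoint on the Hugging Face Hub; the audio filename is a placeholder:

    # In a Colab cell, install the packages first:
    # !pip install --upgrade transformers accelerate datasets

    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

    # Use the GPU and half precision when available
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    model_id = "openai/whisper-large-v3"
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)
    processor = AutoProcessor.from_pretrained(model_id)

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        chunk_length_s=30,          # process long audio in 30-second chunks
        batch_size=16,
        return_timestamps=True,
        torch_dtype=torch_dtype,
        device=device,
    )

    result = pipe("audio.mp3")      # placeholder: a file uploaded to the Colab session
    print(result["text"])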

Comments

Want to connect?
🔴 Join Patreon: Patreon.com/PromptEngineering

engineerprompt

Thanks for the video! I've been using this model for a long while to do translation + transcription of lectures (one and a half hours); mostly it works like a charm. I don't know about large-v3, but large-v2 would sometimes repeat and loop one sentence for about half of the transcription,
so it needs optimization (some solutions clean the audio before Whisper).

Nihilvs

In this, are we downloading the model or using the inference API? Actually, I am new to this and confused. If it is the model itself, it would be best for me to host it on a server.

ACse-vy

🎯 Key Takeaways for quick navigation:

00:00 🎙️ *Overview of Whisper V3 Model*
- Whisper V3 is OpenAI's latest speech-to-text model.
- Five configurations available: tiny, base, small, medium, and large V3.
- Memory requirements vary from 1 GB to 10 GB VRAM.
01:25 🔄 *Comparison: Whisper V2 vs. V3*
- V3 generally performs better with lower error rates than V2.
- There are specific cases where V2 outperforms V3, demonstrated later.
- Important to consider performance metrics when choosing between V2 and V3.
03:02 ⚙️ *Setting Up Whisper V3 in Google Colab*
- Installation of necessary packages: Transformers, Accelerate, and Datasets.
- GPU availability check and configuration for optimal performance.
- Loading the Whisper V3 model, setting processor, and creating the pipeline.
05:27 🎤 *Speech-to-Text Transcription Process*
- Creating a pipeline for automatic speech recognition using the Whisper V3 model.
- Uploading and transcribing an audio file in a Google Colab notebook.
- Additional options such as specifying timestamps during transcription.
07:45 🌐 *Language Recognition and Translation*
- V2 may be preferable when language is unknown, as it can automatically recognize it.
- Whisper supports translating speech from other languages directly into English.
- Highlighting the importance of specifying the language in V3 if known.
09:22 ⚡ *Flash Attention and Distil-Whisper*
- Enabling Flash Attention for improved performance if the GPU supports it.
- Introduction to Distil-Whisper, a smaller, faster distilled version of Whisper.
- Demonstrating how to use the Distil-Whisper medium English model in code (see the sketch after this list).
11:54 🌐 *Future Applications and Closing*
- Exploring potential applications, like enabling speech communication with documents.
- Encouraging viewers to explore and experiment with the Whisper model.
- Expressing the usefulness and versatility of the Whisper model in various applications.
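
A short sketch of the 09:22 options, continuing the code above; assumptions beyond the video summary: a GPU that supports Flash Attention 2 (with flash-attn installed) and the distil-whisper/distil-medium.en checkpoint name from the Hub:

    import torch
    from transformers import AutoModelForSpeechSeq2Seq, pipeline

    # Flash Attention 2 (optional; needs a supported GPU and `pip install flash-attn`)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        "openai/whisper-large-v3",
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
        attn_implementation="flash_attention_2",
    )

    # Distil-Whisper: a smaller, faster distilled variant (English-only medium here)
    distil_pipe = pipeline(
        "automatic-speech-recognition",
        model="distil-whisper/distil-medium.en",
        torch_dtype=torch.float16,
        device="cuda:0",
    )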

Made with HARPA AI

RameshBaburbabu

People want to run it locally for privacy, not route it to Google.

trilogen

VERY interesting. I would love to know how to run it locally, I mean with a UI on a local computer, not in a Google notebook. It would be very, VERY useful to transcribe a video, translate it later with another tool or model, and then generate subtitles, or generate audio to translate the video, everything locally :)

JuanGea-krli

Sorry, but what about transcribing from a URL, like YouTube?
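
One common pattern, as a self-contained sketch with assumptions not shown in the video (yt-dlp for the download, a placeholder URL and filename): pull the audio down first, then transcribe the file.

    # In Colab:
    # !pip install yt-dlp transformers accelerate
    # !yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" "https://www.youtube.com/watch?v=VIDEO_ID"

    from transformers import pipeline

    # Build a pipeline straight from the checkpoint name
    pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3", device="cuda:0")
    result = pipe("audio.mp3", return_timestamps=True)  # timestamps help if making subtitles
    print(result["text"])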

WEKINBAD

I want to use a prompt for a little personalization. How do I do it?
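
For what it's worth, the openai-whisper package (a different route than the Transformers pipeline in the video) exposes an initial_prompt argument that biases spelling and phrasing; a minimal sketch with a placeholder filename and made-up glossary terms:

    # !pip install -U openai-whisper
    import whisper

    model = whisper.load_model("medium")
    # initial_prompt nudges the decoder toward the given spellings and style
    result = model.transcribe(
        "audio.mp3",
        initial_prompt="Glossary: PromptEngineering, Distil-Whisper, Colab.",
    )
    print(result["text"])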

binayaktv

I am using V2 on my Nvidia 1080 GPU. The performance difference between the base model and the large model is very small. I tried multiple sources, tones of voice, noise, etc. The base version is really fast, so I recommend that one. Even V2 is really perfect for transcribing speech to text.

ekstrajohn

I'm a "scopist" and I need to edit transcripts with different speakers, I take it this does not differentiate speakers?

ericneeds

Hey. I have been trying to reduce the length of the subtitles, as the captions generated by Whisper can be overwhelming, ranging between 12 and 18 words in a single caption. I am using Google Colab, and so far there's been no success. Here are the commands I have used:

!whisper "FILE NAME" --model medium --word_timestamps True --max_line_width 40
!whisper "FILE NAME" --model medium --word_timestamps True --max_words_per_line 5

It works completely fine with the following command, but with a large number of words per caption:
!whisper "FILE NAME" --model medium

Could you please help?
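
One likely cause, offered as a guess: those line-length flags only shape the subtitle writers (srt/vtt), not the plain .txt transcript, and in the whisper CLI --max_words_per_line has no effect when combined with --max_line_width. Requesting SRT output explicitly and checking the resulting .srt file is worth trying; a sketch with the placeholder filename kept:

    !whisper "FILE NAME" --model medium --word_timestamps True --max_words_per_line 5 --output_format srt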

ZaazZ-su

Can you do a video on multi-speaker identification and transcription using Whisper, please?
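
Whisper itself does not label speakers. Until then, a rough sketch of one common pairing, with assumptions beyond this video (pyannote.audio, its gated pyannote/speaker-diarization-3.1 checkpoint, and a placeholder Hugging Face token); its speaker turns can then be matched against Whisper's timestamped transcript:

    # !pip install pyannote.audio
    from pyannote.audio import Pipeline

    # Gated model: requires accepting the terms on the Hub and an access token
    diarization_pipe = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="HF_TOKEN",   # placeholder token
    )
    diarization = diarization_pipe("audio.wav")  # placeholder filename

    # Print who speaks when
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")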

Nawaz-lbeq

Hi, can anyone help me? I'm having a problem following the tutorial; I encountered an error in:

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=15,
    batch_size=16,
    return_timestams=True,
    torch_dtype=torch_dtype,
    device=device,
)

it says that:

TypeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 pipe = pipeline("automatic-speech-recognition",
2 model=model,
3 tokenizer=processor.tokenizer,
4 feature_extractor=processor.feature_extractor,
5 max_new_tokens=128,
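
The traceback is consistent with the misspelled keyword in the call: pipeline() rejects unknown arguments, and return_timestams should be return_timestamps. A corrected version of the same call:

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=15,
        batch_size=16,
        return_timestamps=True,   # was misspelled as "return_timestams"
        torch_dtype=torch_dtype,
        device=device,
    )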

matangbaka

Greetings, can this be used with WhisperX too? I tried it with the v2 model; it used to work, I think.

bmqww

Is there still NO practical application for real-time transcribing (+/- translating the text) that is readily available on Android?
I think I heard about one or two projects that wanted to do that, but still nothing concrete more than a full year after this incredible piece of technology appeared.
Am I missing something? Is Whisper incompatible with Android? Is there no way to apply Whisper to continuous live audio recording?
Has nobody managed to do it??

WillyFlowerz

Is it just me, or is this not a big jump in improvement? At least for me.
I wanted:
1. Speaker recognition / diarization
2. Higher accuracy in Mandarin

Hope they do the first ASAP, and the second will get better over time, I hope. Azure already has a speech-to-text service that includes speaker recognition and is quite good. I wonder if that could affect how they prioritise this important feature.

farahabdulahi

Helpful video! Now I can run code locally. Thanks

rccarsxdr

I like the idea of chatting with documents through speech

thunderwh

⚠️ I need a good text-to-speech for free. It doesn't help if you can talk to a model but it can't talk back. So, what to do? Any good, free text-to-speech?? 😮

DihelsonMendonca