OpenAI Whisper Speaker Diarization - Transcription with Speaker Names

A high-level overview of what's happening in OpenAI Whisper Speaker Diarization:

Using OpenAI's Whisper model to separate the audio into segments and generate transcripts.
Then generating a speaker embedding for each segment.
Then using agglomerative clustering on the embeddings to identify the speaker for each segment.

Speaker identification (speaker labelling) is very important for transcribing podcasts and other multi-speaker conversations. This code helps you do that; a condensed sketch of the pipeline follows.
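For readers who want the moving parts in one place, here is a condensed sketch of those three steps. It is a sketch, not the notebook verbatim: it assumes a mono WAV file at audio.wav, exactly two speakers, and the openai-whisper, pyannote.audio, and scikit-learn packages.

    import contextlib
    import wave

    import numpy as np
    import torch
    import whisper
    from pyannote.audio import Audio
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
    from pyannote.core import Segment
    from sklearn.cluster import AgglomerativeClustering

    PATH, NUM_SPEAKERS = "audio.wav", 2  # assumptions for this sketch

    # Step 1: Whisper splits the audio into timestamped segments and transcribes them.
    segments = whisper.load_model("base").transcribe(PATH)["segments"]

    # Whisper timestamps can overshoot the end of the file, so clamp to the true duration.
    with contextlib.closing(wave.open(PATH, "r")) as f:
        duration = f.getnframes() / f.getframerate()

    # Step 2: compute one speaker embedding per segment.
    embedding_model = PretrainedSpeakerEmbedding(
        "speechbrain/spkrec-ecapa-voxceleb",
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    )
    audio = Audio()
    embeddings = []
    for seg in segments:
        waveform, _ = audio.crop(PATH, Segment(seg["start"], min(duration, seg["end"])))
        embeddings.append(embedding_model(waveform[None]))
    embeddings = np.nan_to_num(np.vstack(embeddings))

    # Step 3: agglomerative clustering assigns each segment to a speaker.
    labels = AgglomerativeClustering(n_clusters=NUM_SPEAKERS).fit_predict(embeddings)
    for seg, label in zip(segments, labels):
        print(f"SPEAKER {label + 1}: {seg['text'].strip()}")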

Comments

Thanks so much for making this video and highlighting my code! Really cool to see it's useful to other people!

DwarkeshPatel

As always, delivering the goods! Thanks 1littlecoder!

estrangeiroemtodaparte

I was working on a model to do this exact thing as we speak. Thanks for the resource; this will save me lots of time.

stebe

Hi everyone, thanks 1littlecoder and Dwarkesh, this is fantastic. I managed to get it working, it is helping me immensely, and I am learning a lot. I am struggling with Google Colab, though: I always end up with 0 compute units, which causes all sorts of issues, and I am unable to complete the transcriptions (I have quite large files to process, several 1-hour coaching sessions). Does AWS have a better option? And how easy would it be to port this to an AWS Linux environment, if that is an option?

IWLTFT

Amazing videos!! Keep going! I have a request, though: could you please publish a video on customizing GPT-J-6B on Colab using the 8-bit version?

mjaym

It is interesting, although I think it would be much better to auto-detect how many speakers there are and then start the transcription.

rrrila
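On auto-detecting the speaker count: that is not in the original notebook, but a common heuristic is to cluster at several candidate counts and keep the one with the best silhouette score. A sketch, assuming scikit-learn and the per-segment embeddings array from the pipeline sketch above (estimate_num_speakers is a hypothetical helper):

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics import silhouette_score

    def estimate_num_speakers(embeddings, min_k=2, max_k=8):
        # Silhouette score needs at least 2 clusters and at most n_samples - 1.
        best_k, best_score = min_k, -1.0
        for k in range(min_k, min(max_k, len(embeddings) - 1) + 1):
            labels = AgglomerativeClustering(n_clusters=k).fit_predict(embeddings)
            score = silhouette_score(embeddings, labels)
            if score > best_score:
                best_k, best_score = k, score
        return best_k

It tends to work well when the speakers sound clearly different, and less well for similar voices or very short segments.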

Thank you so much! By any chance, do you think there could be a way to make it do all of that in real time, say during a call?

Any ideas on where I could start would be very helpful ❤❤

SustainaBIT
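On the real-time question: one rough starting direction (untested here, offered as a sketch only) is near-real-time chunking: record a few seconds of audio, run the same pipeline on the chunk, and repeat. It assumes the sounddevice and scipy packages and a hypothetical process_chunk wrapper around the steps above; note that clustering each chunk independently will not keep speaker ids consistent across chunks, which is the genuinely hard part of streaming diarization.

    import numpy as np
    import sounddevice as sd
    from scipy.io import wavfile

    FS, CHUNK_SECONDS = 16000, 5  # assumptions: 16 kHz mono, ~5 s latency budget

    while True:
        # Block until CHUNK_SECONDS of microphone audio has been captured.
        chunk = sd.rec(int(CHUNK_SECONDS * FS), samplerate=FS, channels=1)
        sd.wait()
        wavfile.write("chunk.wav", FS, (chunk[:, 0] * 32767).astype(np.int16))
        process_chunk("chunk.wav")  # hypothetical: transcribe + embed + cluster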

I'm running it locally in a Jupyter notebook, but I can't seem to find an offline model for PretrainedSpeakerEmbedding.

MixwellSidechains
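On running it offline: in recent pyannote.audio releases the class is spelled PretrainedSpeakerEmbedding (lower-case "t"), and the speechbrain weights are downloaded from Hugging Face once and then served from the local cache, so later runs can work without a network. A sketch of the load call:

    import torch
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding

    # First run downloads speechbrain/spkrec-ecapa-voxceleb into the local
    # Hugging Face cache; subsequent runs reuse the cached weights offline.
    embedding_model = PretrainedSpeakerEmbedding(
        "speechbrain/spkrec-ecapa-voxceleb",
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    )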

I almost did it manually:
1. Created an RTTM file using pyannote.
2. Sliced the full-length audio at the in/out points from the RTTM file.
3. Ran each slice through Whisper for transcription.

It was about 5 times slower.
I was thinking hard about how to do it the other way around: first generate the full transcript, then separate the segments.
Somehow I came across your video and I'm impressed; the AgglomerativeClustering at the end blew my mind.

Thanks for sharing your knowledge.

kmanjunath
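For comparison, the manual route described above looks roughly like the sketch below. It assumes pyannote.audio's speaker-diarization pipeline (which yields the same speaker turns an RTTM file records), pydub for slicing, and a placeholder Hugging Face token; running Whisper once per slice is what makes this approach several times slower.

    import whisper
    from pydub import AudioSegment
    from pyannote.audio import Pipeline

    # 1. Diarize: the pipeline yields (turn, track, speaker) tuples.
    diarization = Pipeline.from_pretrained(
        "pyannote/speaker-diarization", use_auth_token="HF_TOKEN"  # placeholder token
    )("audio.wav")

    audio = AudioSegment.from_wav("audio.wav")
    model = whisper.load_model("base")

    # 2 + 3. Slice each turn out of the full audio and transcribe it separately.
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        clip = audio[int(turn.start * 1000):int(turn.end * 1000)]  # pydub indexes in ms
        clip.export("clip.wav", format="wav")
        text = model.transcribe("clip.wav")["text"]
        print(f"{speaker}: {text.strip()}")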

Thank you for sharing your knowledge!
Everything works fine, but an error started appearing:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behavior is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires openai, which is not installed.
How can I fix this?

IgorGeraskin

I'm trying to upload a 5 MB WAV file and I'm getting "RangeError: Maximum call stack size exceeded". Does this only work for tiny file sizes?

klarinooo

I started to love it when you used Bruce Wayne's clip.

udcqlql

Hello sir, I have a small doubt: if we have more than 2 speakers, how should the num_speakers parameter vary?

zwxhboz
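To the num_speakers question: in this approach it is simply the n_clusters value handed to the clustering step, so it can be any count from 2 up to the number of segments. A sketch, reusing the embeddings array from the pipeline sketch above:

    from sklearn.cluster import AgglomerativeClustering

    num_speakers = 4  # set this to however many people are in the recording
    labels = AgglomerativeClustering(n_clusters=num_speakers).fit_predict(embeddings)
    # labels now holds one speaker id in 0..num_speakers-1 for every segment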

None of the ones I've played with cope particularly well with more complicated situations, for instance where one person interrupts another, or where there are three or more people. They can all cope with two very clearly different speakers, but then I figure I could do that with old-school techniques like simply averaging the frequency. It's weird, because the speech-to-text itself is enormously clever; it's just surprising that the AI can't distinguish voices well.

geoffphillips

The Colab notebook is not accessible. Can you share a new link?

gaurav

I'm not much of a dev myself, but it seems like it might be simple to add a total time spoken for each speaker. I would love to be able to analyze podcasts to understand how much time the host speaks relative to the guest. In fact, it would be very cool if someone built an app that removed one of the speakers from a conversation and created a separate audio file containing only what the remaining speaker(s) said.

JohnHumphrey
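Total speaking time is indeed a small addition once every segment carries a speaker label. A sketch, assuming Whisper's per-segment start/end times and the labels from the clustering step (speaking_time is a hypothetical helper):

    from collections import defaultdict

    def speaking_time(segments, labels):
        # Sum each speaker's airtime in seconds across their segments.
        totals = defaultdict(float)
        for seg, label in zip(segments, labels):
            totals[f"SPEAKER {label + 1}"] += seg["end"] - seg["start"]
        return dict(totals)

    # e.g. {'SPEAKER 1': 1864.3, 'SPEAKER 2': 402.7} for a host-heavy podcast

Extracting one speaker's audio into its own file would follow the same pattern: slice the original audio at each of that speaker's segments and concatenate the pieces.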

It doesn't work well (it detects the language as Malay, and it doesn't offer custom names for speakers). Does anyone have a better working solution?

frosti
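Two small tweaks address both complaints, offered as illustrations rather than as part of the notebook: Whisper's transcribe accepts a language hint, and the numeric cluster labels can be mapped to real names after the fact.

    import whisper

    # Pin the language so short or noisy clips aren't misdetected (e.g. as Malay).
    result = whisper.load_model("base").transcribe("audio.wav", language="en")

    # Map cluster ids from the clustering step to custom names (hypothetical mapping).
    names = {0: "Alice", 1: "Bob"}
    for seg, label in zip(result["segments"], labels):
        print(f"{names[label]}: {seg['text'].strip()}")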

What if a new speaker enters partway through? Does the number of speakers then become old + new, or stay at the old count?

kmanjunath

On the last cell I get an error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-6: ordinal not in range(128). Any ideas what's going wrong and how do I fix it?

jordandunn
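That UnicodeEncodeError usually means the transcript file is being opened with the platform's default ASCII codec, so the first accented character crashes the write. If the last cell writes the transcript to a file, passing an explicit UTF-8 encoding is the usual fix (a sketch; transcript.txt stands in for whatever file name the notebook actually uses):

    # Open the output with an explicit UTF-8 codec so non-ASCII characters
    # (accents, curly quotes, emoji) don't crash the write.
    with open("transcript.txt", "w", encoding="utf-8") as f:
        for seg, label in zip(segments, labels):
            f.write(f"SPEAKER {label + 1}: {seg['text'].strip()}\n")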

Has anyone tried this recently?
The code no longer works; it looks like some dependencies have been upgraded.

DestroyaDestroya