How to Use Hugging Face's Wav2Vec for Speech Recognition in Python

Hi guys! Welcome to another video. In this video I'll show you how to download and use a pretrained model named Wav2Vec to do speech recognition. Wav2Vec is a state-of-the-art model for speech recognition. It uses a training strategy similar to word2vec: it learns speech representations from unlabeled data and is then fine-tuned on labeled data. It also uses a Transformer architecture. With the Hugging Face library called transformers you can use or fine-tune a variety of models. Today we'll focus on Wav2Vec, since our goal is to have one of the best models available for speech recognition.
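The download-and-transcribe flow described above can be sketched roughly as follows. This is a minimal sketch, not the code from the video: the checkpoint name `facebook/wav2vec2-base-960h`, the use of `librosa` to load and resample the audio, and the helper name `transcribe` are all assumptions made for illustration.

```python
# Assumed checkpoint: an English Wav2Vec 2.0 model from the Hugging Face Hub.
MODEL_NAME = "facebook/wav2vec2-base-960h"
# Wav2Vec 2.0 checkpoints of this family expect 16 kHz mono audio.
SAMPLE_RATE = 16000


def transcribe(audio_path, model_name=MODEL_NAME):
    """Return a greedy (argmax) transcription of one audio file.

    Hypothetical helper. The heavy imports live inside the function so that
    defining it does not require the libraries to be loaded yet.
    """
    import librosa  # assumption: used here only to load and resample audio
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Downloads the weights on first use, then reuses the local cache.
    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)
    model.eval()

    # Load the file as mono float samples resampled to 16 kHz.
    speech, _ = librosa.load(audio_path, sr=SAMPLE_RATE)

    inputs = processor(speech, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Pick the most likely token per frame; the processor collapses CTC repeats.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("audio.wav")` on a short recording (the file name is a placeholder) should return the recognized text as a single string. The base checkpoint assumed here is English only, so other languages would need a different or fine-tuned model.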
Comments

Thank you for this excellent video! Can you point me to information about how to build a new model? I am a linguist working on very low resource languages and want to get my PhD students to learn how to do ASR on the languages they study. Thanks!!

NathanWHill

Is there a scorer? That is, something that assembles the logits into sentences and then scores whole sentences as opposed to individual tokens? That way it would choose the results that make the most sense. It's a pretty common architecture. DeepSpeech does that, for example, and I think Vosk does it under the hood. That's why it shows partials and then the final text.

dr.mikeybee
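For context on the scorer question above: without a scorer, CTC models such as Wav2Vec are usually decoded greedily, by taking the most likely token per frame, collapsing consecutive repeats, and dropping the blank token. The sketch below shows only that greedy step on a toy vocabulary. The function name, the vocabulary, and the frame sequence are made up for illustration and are not taken from the video.

```python
from itertools import groupby


def greedy_ctc_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blanks.

    Hypothetical helper: frame_ids holds one argmax token id per audio frame.
    """
    # Step 1: merge runs of identical ids (e.g. 1, 1 -> 1).
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: remove blanks, which only serve to separate repeated letters.
    return "".join(id_to_char[t] for t in collapsed if t != blank_id)


# Toy vocabulary and frame-level predictions (illustrative values only).
vocab = {0: "", 1: "h", 2: "e", 3: "l", 4: "o"}
frames = [1, 1, 0, 2, 2, 3, 0, 3, 3, 4, 0]
print(greedy_ctc_decode(frames, vocab))  # hello
```

A scorer of the kind described in the comment would replace the per-frame argmax with a beam search whose candidate sentences are rescored by a language model. The transformers library offers `Wav2Vec2ProcessorWithLM` for this purpose, though whether it fits a given setup is something to verify against its documentation.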

Hi, I am working on a transcription project in a specific domain with uncommon words. I want to take a base model like this one and reinforce its training with some domain-specific data and expressions. Is that doable?

manasomali

I had a file-not-found error with AudioSegment.from_file(data). I don't know why it happened, but I fixed it by using AudioSegment.from_wav(data) instead.

alejandrootiniano

How would you change the code to run it without an internet connection, given that the model is already downloaded on my local machine?
Thanks

harshj

Is it possible for this code to work without an internet connection?

harshj

Hi, it's not working on my Ubuntu 20.04.
It gets stuck at "you can start speaking now".
Can you help, please?

Jay-pjwm

Is it possible to get the timestamps of each word, that is, where it starts and ends?

titusfx

How can I train a new model for another language from scratch? And how can I fine-tune the pretrained model?

yohannesayana

On another subject, is there a Telegram group for the channel, or something similar? I think it would be nice to be able to discuss this topic with the codigo logo community.

manasomali