Build Speech Recognition for any Language with 🤗 Transformers - Finetune XLSR-Wav2Vec2 (Hindi)

This video tutorial walks step by step through the Colab notebook that Hugging Face put together to fine-tune XLSR-Wav2Vec2 for low-resource ASR (Automatic Speech Recognition) with 🤗 Transformers. The demo uses the Hindi dataset from Mozilla Common Voice.

1littlecoder

Comments

Thank you so much for this video, it makes learning and testing easy for many languages!!

ajitkumar

Thanks a lot for this.
Some questions, since I am moving from Kaldi to this:
1- How do I load a custom folder instead of Common Voice for fine-tuning? (I have 80 hours of custom data with audio and transcriptions.)
2- How do I load and test a custom audio file for evaluation?

yasirahmedpirkani
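
For the custom-folder question above, one possible starting point is a manifest that pairs each audio file with its transcript. The sketch below is not the notebook's method. It assumes a hypothetical layout in which every clip.wav sits next to a clip.txt holding its transcription, and it reuses the column names path and sentence that the Common Voice dataset exposes. The function name build_manifest and the file name train.csv are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def build_manifest(data_dir, manifest_path):
    """Pair each .wav with a same-named .txt transcript and write a CSV.

    Assumed (hypothetical) layout: data_dir/clip.wav + data_dir/clip.txt.
    """
    data_dir = Path(data_dir)
    rows = []
    for wav in sorted(data_dir.glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            continue  # skip audio that has no transcript
        sentence = txt.read_text(encoding="utf-8").strip()
        if sentence:
            rows.append({"path": str(wav), "sentence": sentence})
    with open(manifest_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "sentence"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Tiny demo with placeholder files instead of real recordings.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "clip1.wav").write_bytes(b"")  # placeholder audio
    (folder / "clip1.txt").write_text("namaste duniya", encoding="utf-8")
    (folder / "clip2.wav").write_bytes(b"")  # no transcript, so it is skipped
    demo_rows = build_manifest(folder, folder / "train.csv")

print(len(demo_rows))  # 1
```

A CSV like this can usually be read with load_dataset("csv", data_files=...) from the datasets library. The audio itself still has to be decoded and resampled to the rate the model expects, a step this sketch does not cover.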

I watched the whole video just to see your explanation of the last part, and you skipped it: the part where the original dataset is loaded again to get a test sample. At that point the dataset is downloaded again, which is not possible in my case because of limited storage. Can you please tell me what I should do to give a WAV file (from Google Drive) to the model and see its prediction?

rezamarefat

Which version of Colab did you use here? I tried this on the T4 GPU and also on the free TPU v2, and I am getting memory and resource-exhausted errors; the process is also too slow. Could you help me decide which version of Colab and which GPU resource I should use?

nibeelyunus

I tried to use it for the Uzbek language, but Uzbek is not among the supported languages, even though an Uzbek speech dataset appears on Mozilla Common Voice. Please advise.

zohirjonsharipov

Hi, I am trying to test the performance of wav2vec2 on LibriSpeech, which should be straightforward, but the model gives inconsistent results after fine-tuning. The training loss is decreasing, but the WER is always 1 despite multiple epochs. Also, when preparing data for English ASR, is it recommended to keep everything in upper or lower case? Does it have any impact on the fine-tuning results? I ask because the pretrained model gives its output in uppercase when applied to an audio file. Kindly give your inputs.

gauravgund
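
On the casing question above: word error rate is usually computed on exact word matches, so a reference in lowercase compared against an uppercase prediction counts every word as wrong. The sketch below is a plain-Python WER based on word-level edit distance, written for illustration; it is not the metric implementation used in the notebook, and a case mismatch is only one possible cause of a WER stuck at 1.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


reference = "the cat sat on the mat"
prediction = "THE CAT SAT ON THE MAT"

print(wer(reference, prediction))          # 1.0, every word differs by case
print(wer(reference.upper(), prediction))  # 0.0 once the casing matches
```

Normalizing the transcripts and the predictions to the same case before scoring removes this source of error.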

ModuleNotFoundError: No module named 'datasets.tasks'. How do I fix this error?

SadatHossain

Hi, I am trying to fine-tune the above model on my own dataset, but I am stuck: my data is local, in a JSON file, whereas the demo video loads its data through the datasets import.

nareshsandrugu

Hi, could you tell me how to train with custom data? I have the audio files and their corresponding transcripts.

kamek

Does it support the Urdu language (Pakistan 🇵🇰)?

EhsanIrshad

Hi, can we train the model to understand multiple languages in a single audio file, e.g. both Hindi and English?

AklankJain

Excuse me, can someone show me the final structure of the dataset?

armandorodarterodriguez

Thanks for this awesome tutorial. I have a question about training and testing on my own dataset. I have trained a Hugging Face model on my own dataset, but I am having problems testing it on that dataset: it produces wrong output. Why is that? Do you have any solution? Any lead would be appreciated.

chaitanyamalpure

I am interested in local languages, so how can I bring this to life?

thomasonen

Can we do it for any type of language?

thangellakumar

I'm getting a ModuleNotFoundError: No module named 'datasets.tasks' error at 7:57.

WeeeAffandi

Hello, how can I train my model to translate between Luo and Luganda?

thomasonen

Can this model be exported and used for on-device inference using TensorFlow mobile or PyTorch Mobile? I've been trying to deploy this on mobile devices. Any help would be highly appreciated.

stephennfernandes

Can we train this on something like 37,000 audio files, each 3 to 5 seconds long?

piyushjain

It's giving an error while downloading the dataset.

sushmabillakanti
