Train or Fine Tune VITS on (theoretically) Any Language | Train Multi-Speaker Model | Train YourTTS

VITS Multispeaker English Training and Fine Tuning Notebook:

VITS Alternate Language Training and Fine Tuning Notebook:

YourTTS Training and Fine Tuning notebook:

Updated the YourTTS and VITS multi-speaker English-language notebooks. The new notebook is for training a VITS model with languages other than English.

In this one I take a look at alternate-language training of a VITS model using Coqui TTS on Google Colab. I trained a Spanish-speaking model on mostly-blind sample data. I don't speak Spanish, so I can't evaluate it, but it started sounding pretty good for what it was.

Then I review some of the changes and differences in the multi-speaker VITS notebook and the YourTTS notebook.

Other videos:

RTFM:
Comments

Woah! It's really really useful for Spanish training. Thank you!

blakusp

In the YourTTS paper, they train for 140k steps and then fine-tune for 50k steps with SCL enabled. Not sure if you are doing this also.

Making multi-language models with YourTTS is probably the only thing you are missing, and you are guaranteed to have every person using Coqui on you, since the documentation is rather lacking, to say the least.

Even with lots of technical knowledge it was still a struggle setting this up before I found your notebooks.
Seriously, thank you for your efforts.

DestinyHax_YT

Hi, very nice video. Does anyone know if there is a version of YourTTS that works well in Spanish? The Coqui TTS model seems to accept only English, French, and Portuguese.

javierdiez

Hi,

If I am new to training models and what you showed is too complicated, where do I need to start in order to understand what you are describing in this video?

ŁukaszMadajczyk

Does anyone have an issue with one of the last steps (Run trainer)? It keeps giving me the error: TypeError: object of type 'NoneType' has no len(). I'm running a single speaker in the Czech language. I've set everything up for Czech (cs), but this step will not work no matter what I try.
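(Editor's note, a hedged guess: a "NoneType has no len()" at the Run trainer step often means the dataset loader produced no samples, e.g. because metadata.csv lines do not match the formatter's expected `id|text` layout. A quick stdlib check, with a function name of my own invention, might be:)

```python
def check_metadata(path, sep="|", min_fields=2):
    """Return line numbers in a metadata file whose field count is too low.

    The ljspeech formatter expects at least 'id|text' per line; lines that
    split into fewer fields are a common cause of empty/None sample lists.
    """
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if len(line.rstrip("\n").split(sep)) < min_fields:
                bad.append(lineno)
    return bad
```

Running it over your metadata.csv before launching the trainer shows whether any lines will be rejected.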

TheRonoxcz

Hello. I am trying to create a TTS with a Japanese voice, referring to your wonderful video. I've heard a lot about RVC, but I don't know much about VITS. Is it possible to make a TTS with a Japanese voice using the method shown in the video? (I don't even know what a pre-trained model means.) Thanks!

Jeaho

How can I solve this error when installing in Spanish?
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.12.0 requires numpy<1.24,>=1.22, but you have numpy 1.21.6 which is incompatible.
tensorflow 2.12.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.19.6 which is incompatible.
panel 0.14.4 requires bokeh<2.5.0,>=2.4.0, but you have bokeh 1.4.0 which is incompatible.
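(Editor's note: the first conflict is self-describing — tensorflow 2.12.0 wants numpy in the range [1.22, 1.24) but 1.21.6 is installed, so pinning into that range, e.g. `pip install "numpy>=1.22,<1.24"`, usually quiets it; whether the notebook still runs afterwards depends on what pinned 1.21.6 in the first place. A tiny stdlib sketch of the comparison the resolver is doing:)

```python
def parse(version):
    # "1.21.6" -> (1, 21, 6); enough for plain numeric version strings
    return tuple(int(part) for part in version.split("."))

def in_range(version, floor, ceiling):
    # Models a requirement of the form: floor <= version < ceiling
    return parse(floor) <= parse(version) < parse(ceiling)

print(in_range("1.21.6", "1.22", "1.24"))  # False -> the reported conflict
print(in_range("1.23.5", "1.22", "1.24"))  # True  -> a compatible pin
```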

jeilyamv

Could you please make a video for tamil text to speech with my own voice.

cready

How can I fix the following error?


@nano
ModuleNotFoundError Traceback (most recent call last)
in <cell line: 1>()
----> 1 from transformers import WhisperProcessor,
2 options = dict(language=whisper_lang, beam_size=5, best_of=5)
3 transcribe_options = dict(task="transcribe", **options)
4

ModuleNotFoundError: No module named 'transformers'

tiemposrevelados

What about the text tokenizer? Shouldn't it be separate for different languages?
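(Editor's note: broadly yes — character-based models need a symbol table covering the target language's alphabet, and phoneme-based ones need the right espeak language code. A minimal, purely illustrative character-tokenizer sketch — this is not Coqui's actual class — shows why an English-only symbol set silently fails on other alphabets:)

```python
class CharTokenizer:
    """Toy character-level tokenizer; symbols must cover the language."""

    def __init__(self, symbols):
        self.sym_to_id = {s: i for i, s in enumerate(symbols)}

    def encode(self, text):
        # Characters outside the table are dropped here; a real pipeline
        # should warn instead, since silent drops degrade pronunciation.
        return [self.sym_to_id[c] for c in text if c in self.sym_to_id]

# ASCII letters plus Spanish accented vowels, n-tilde, u-diaeresis, space
spanish = CharTokenizer(list("abcdefghijklmnopqrstuvwxyzáéíóúñü "))
print(spanish.encode("año"))  # [0, 31, 14] -- 'ñ' is in the table
```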

pranilpatil

Hello, thank you for your video. I need help... I used the alternate language training notebook and only edited the dataset formatter (ljspeech) and the phoneme language (Bulgarian). When I try to synthesize with the model I get an error. I did not run the processing options because my dataset is already processed, and I did not run TensorFlow.

tts --text "Първите обитатели на териториите са били хомо сапиенс." \
--model_path \
--config_path \
--out_path output.wav

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/bin/tts", line 11, in <module>
load_entry_point('TTS', 'console_scripts', 'tts')()
File "/Users/dennis/Desktop/AI/TTS/TTS/bin/synthesize.py", line 439, in main
reference_speaker_name=args.reference_speaker_idx,
File "/Users/dennis/Desktop/AI/TTS/TTS/utils/synthesizer.py", line 384, in tts
language_id=language_id,
File "/Users/dennis/Desktop/AI/TTS/TTS/tts/utils/synthesis.py", line 220, in synthesis
language_id=language_id,
File "/Users/dennis/Desktop/AI/TTS/TTS/tts/utils/synthesis.py", line 58, in run_model_torch
"language_ids": language_id,
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/dennis/Desktop/AI/TTS/TTS/tts/models/vits.py", line 1161, in inference
o = self.waveform_decoder((z * y_mask)[:, :, : self.max_inference_len], g=g)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/dennis/Desktop/AI/TTS/TTS/vocoder/models/hifigan_generator.py", line 250, in forward
o = o + self.cond_layer(g)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 310, in _conv_forward
self.padding, self.dilation, self.groups)
TypeError: conv1d() received an invalid combination of arguments - got (NoneType, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:
* (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)
* (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)
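(Editor's note, hedged: the conv1d call fails because the conditioning tensor `g` reaching the vocoder is None. That usually means the checkpoint was trained as multi-speaker but the CLI was called without a speaker, so passing `--speaker_idx` — or `--speaker_wav` for YourTTS — may help. A sketch of a complete invocation; the paths and speaker name below are hypothetical placeholders, not values from the original post:)

```shell
# Hypothetical paths and speaker name -- substitute your own run's files.
tts --text "Първите обитатели на териториите са били хомо сапиенс." \
    --model_path ./bg_run/best_model.pth \
    --config_path ./bg_run/config.json \
    --speaker_idx "speaker_00" \
    --out_path output.wav
```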

onenoone

Hi, do you think it is possible to use this tutorial for fine-tuning in Latin American Spanish? Thanks

paulaortegariera

Hello all, I just want to ask: if someone wants to train on a regional language such as Telugu, Bengali, or especially Hindi, where can they get the pretrained model weights?

prateekkumarsingh

Hello, my English is very poor, so I don't understand much. Where do we enter the audio files? I would be glad if you made a flashy video.

okru

Have you tried fine-tuning using the whole VCTK dataset plus a new speaker?

jazza

How would I go about increasing the kHz (higher than 16k)? Since it sounds, I would say, "too bad".
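(Editor's note: the sample rate is fixed at training time by the audio config, e.g. `sample_rate: 16000`. Resampling 16 kHz output upward only interpolates and cannot recover high-frequency detail, so the usual route is retraining or fine-tuning on 22.05 kHz or 44.1 kHz data. A trivial sketch of the relationship:)

```python
def n_samples(duration_s, sample_rate_hz):
    # A clip's sample count is duration times rate: a higher-rate model
    # must generate more samples per second, and must be trained on
    # audio recorded (or resampled) at that same rate.
    return int(duration_s * sample_rate_hz)

print(n_samples(1.0, 16000))  # 16000 samples per second of audio
print(n_samples(1.0, 22050))  # 22050 samples per second of audio
```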

julin

How can I solve this error?

3 from trainer import Trainer, TrainerArgs
4
----> 5 from import BaseDatasetConfig
6 from TTS.tts.configs.vits_config import VitsConfig
7 from TTS.tts.datasets import load_tts_samples

ModuleNotFoundError: No module named 'TTS.tts'

jeilyamv

Please make a practical video on Hindi-language voice cloning for multiple speakers.

Can you clarify one thing for me: do we have to repeat the whole process for multi-speaker, or what do we have to do? I'm not getting it correctly.

I hope you will help me understand.

I am glad for this video

Thanks dear🥰🥰

shailendrarathore

Hello dear nanonomad, please make a practical video on voice cloning a specific person in the Hindi language.

I'm getting some issues.

I've tried nearly 38 times but get no output.

Also, let me know how to fine-tune the first dirty output so that it sounds natural.

With regards to Mr. nanonomad.

Please help and give it a try 🙏🙏

shailendrarathore

Hi, we are trying to fine-tune with Hindi audio. Each epoch takes approximately 2.5 hours. Can you share the machine configuration used to create this, and the time it took to fine-tune the model? Thanks!

AvinashTulasi