Explaining Checkpoints and Restore/Continue possibilities with Coqui TTS.

Ever cancelled or stopped a training run too early, or wanted to switch the config of a running training process?
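
For reference, the two modes look roughly like this with the Coqui trainer's Python API (a minimal sketch: paths are placeholders, and `config`, `model`, and the sample lists are assumed to be set up as in a normal training recipe):

    from trainer import Trainer, TrainerArgs

    # Restore: load the weights (and global step) from a checkpoint, but start
    # a fresh run in a new output folder. As several comments below note, the
    # epoch counter restarts at 0 in this mode while the global step follows
    # the checkpoint.
    args = TrainerArgs(restore_path="/path/to/run/checkpoint_50000.pth")

    # Continue: resume an interrupted run in place, reusing its output folder
    # and config:
    # args = TrainerArgs(continue_path="/path/to/run/")

    trainer = Trainer(
        args, config, output_path="output/",
        model=model, train_samples=train_samples, eval_samples=eval_samples,
    )
    trainer.fit()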

If your continued training stops while synthesizing the test phrases, the following workaround might help (thanks to @tegeztheguy2719 for that tip):
---
Comments

From Spain, I thank you. I haven't achieved anything yet, since I only started recently, but your videos are helping me a lot to understand this project.

RafaelDc-pf

Very useful, thank you! In the case of fine-tuning a VITS model with a new dataset, can I use best_model.pth instead of a numbered checkpoint like you do?

caressembleadufake
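
Assuming best_model.pth is simply the checkpoint the trainer kept for the lowest evaluation loss, it should be usable like any numbered checkpoint (a sketch with a placeholder path):

    from trainer import TrainerArgs

    # Hypothetical: fine-tune starting from the best-loss checkpoint instead
    # of a numbered one.
    args = TrainerArgs(restore_path="/path/to/run/best_model.pth")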

Very helpful video. Thanks a lot for this invaluable content!

HenriqueLatorre

I'm using Colab for training, and since I'm nervous about the system crashing, I've set up checkpoints every 250 steps. I've noticed that the trainer always deletes all checkpoints except the latest one. Still learning here, and not sure whether I should be keeping all the checkpoints. Is there a setting I'm missing?

Colab is still faster than my quite highly specced laptop (about 2-2.5 times faster), so I'm sticking with remote processing.

To avoid losing files when Colab crashes, I'm saving everything to Google Drive, but I've noticed that deleted files pile up in the bin (about one 5 GB file every epoch, which soon adds up!). So: plenty of headroom and frequent emptying of the bin!

Thanks for the guide and for all the other videos you've made :)

jonathantill
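
On the checkpoint-retention question above: the trainer config exposes how many checkpoints to keep (a minimal sketch; field names as in recent Coqui TTS versions, so verify them against your installed release):

    from TTS.tts.configs.vits_config import VitsConfig

    config = VitsConfig(
        save_step=250,          # write a checkpoint every 250 global steps
        save_n_checkpoints=10,  # keep the 10 most recent checkpoints on disk
        save_best_after=10000,  # start tracking best_model.pth after 10k steps
    )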

Thanks Thorsten <3.

Any advice on freezing model layers before fine-tuning? I got solid results with that method on ResNet-50 when fine-tuning an image classifier. Is there a way to do something similar in Coqui?

nossonweissman
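
On the layer-freezing question above: Coqui models are ordinary PyTorch modules, so the generic recipe applies (a sketch; `model` is assumed to be a loaded Coqui model, and the name prefix is hypothetical, so inspect `model.named_parameters()` for the real ones):

    # Freeze everything except parameters under a chosen submodule prefix.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("text_encoder")

    # Hand only the still-trainable parameters to the optimizer.
    trainable_params = [p for p in model.parameters() if p.requires_grad]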

Unfortunately, continuing doesn't work for me at the moment. Whether I start fine-tuning via --restore_path or continue via --continue_path, the training fails at "Synthesizing test sentences" with the error

    File ..., line 92, in normalize_numbers
        text = re.sub(_comma_number_re, _remove_commas, text)
    File "/usr/lib/python3.8/re.py", line 210, in sub
        return _compile(pattern, flags).sub(repl, string, count)

because a None object is being passed instead of the corresponding sentence. The error has already been reported by others; I hope it will be fixed soon...

silversurfer
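
Until the upstream bug is fixed, one hypothetical local stopgap (not the project's official fix, and the module path may differ between TTS versions) is to guard the number-normalization cleaner against None before starting the run:

    from TTS.tts.utils.text import cleaners

    _orig_normalize_numbers = cleaners.normalize_numbers

    def _safe_normalize_numbers(text):
        # A None test sentence would otherwise crash re.sub() as in the
        # traceback above; substitute an empty string instead.
        if text is None:
            return ""
        return _orig_normalize_numbers(text)

    cleaners.normalize_numbers = _safe_normalize_numbers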

Restoring seems to work correctly, since the global step shows the right continuation number for the checkpoint, but what confuses me is that the epoch gets reset to 0 even though I was on epoch 192. Shouldn't the epoch show the correct value too?

lebluxtv

I've got a question: if you set the epochs to 1000, then the script ends after 1000 runs, right? So how is the checkpoint save point reached (10,000 steps in your/the default config)? You once said that you get an idea of the quality after ~100,000 steps. Do you set this value directly as epochs?

silversurfer
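
On epochs vs. steps: the checkpoint interval (save_step) and the "~100,000 steps" rule of thumb both count global optimization steps, not epochs, so the epochs value only caps the outer training loop. Rough arithmetic as a sketch (the numbers are made-up examples):

    samples = 10_000                                  # utterances in the training set
    batch_size = 32
    steps_per_epoch = -(-samples // batch_size)       # ceil(10000 / 32) = 313
    epochs_for_100k = -(-100_000 // steps_per_epoch)  # ~320 epochs for ~100k steps

    # With save_step=10000, a checkpoint is then written roughly every 32 epochs.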

Is there a guideline for the number of samples in a dataset? What would you suggest if I only have limited samples in my dataset?

ankitc

Hi, I restarted training from the last checkpoint; the number of global steps is updated correctly, but it restarts from epoch 0. How can I restart it from the last epoch saved in the checkpoint?

ernestoerra

Could you explain what best_model_{number}.pth is?
In my case it is a model far behind the latest one, and the latest sounds better than the best_model one.
Should I rename the {number} to a more recent one so that, when continuing, it picks up the losses from the later one?

Also, I wanted to ask about the advantages and disadvantages of phoneme-based versus character-based training.

Thanks a lot!

adressegrandbizarre
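
On the phoneme question above: the switch itself is a config flag. Roughly, phoneme-based training usually gives more reliable pronunciation for languages with irregular spelling but depends on an external phonemizer, while character-based training needs no extra tooling and must learn pronunciation from the data alone. A minimal config sketch (field names as in Coqui TTS configs):

    from TTS.tts.configs.vits_config import VitsConfig

    config = VitsConfig(
        use_phonemes=True,                    # False -> train on raw characters
        phoneme_language="en-us",             # language passed to the phonemizer
        phoneme_cache_path="phoneme_cache/",  # computed phonemes are cached here
    )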

Guude Thorsten :)
I'm currently trying to clone my voice, and on Coqui Studio I got a pretty good result within seconds from a single sentence.
My guess is that it briefly trains on top of an already existing checkpoint and then outputs the new model.
I don't know whether that's actually how it works; as I said, just a guess.
The hope here is to get a usable result in fewer than 100k or 400k epochs. (My Colab needs 1.5 hours for 200 epochs, so otherwise this would take incredibly long.)
I tried to reproduce that process in Colab, but I couldn't really get results because I didn't know how to include the new dataset.
As the model, I downloaded the default Coqui TTS model, saved it to my Drive together with my voice samples (.wav) and the .csv file, and set every path I could find to the new set (perhaps somewhat arbitrarily).
Could you tell me in which config file I can adjust the path for the dataset?
Greetings from the Ruhr area <3

jutube
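
On the dataset-path question in the comment above: in a Coqui TTS training recipe, the dataset location is set via a dataset config that is attached to the model config (a sketch; the formatter and Drive path are examples, and older releases use `name=` instead of `formatter=`). If you train from a config.json instead, the same fields live under its "datasets" entry.

    from TTS.tts.configs.shared_configs import BaseDatasetConfig

    dataset_config = BaseDatasetConfig(
        formatter="ljspeech",                       # matches a metadata.csv layout
        meta_file_train="metadata.csv",
        path="/content/drive/MyDrive/my_dataset/",  # hypothetical Drive path
    )
    config.datasets = [dataset_config]              # attach to the model config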