Training Any Language in AI Voice Cloning - Tortoise TTS

Links referenced in the video:

Hardware for my PC:

Alternative prebuilt PCs to mine:

Cheapest recommended PC:

Come join The Learning Journey!

If you found anything helpful, please consider supporting me and the content I am trying to produce!
Comments

You're amazing, dude! I've been following you for almost a year and I have learned a lot from your channel. Keep it up :)

bomar

Hi Jarod! I am glad to see your new video! Thank you!
In fact, what I most wanted to know is how you prepared the dataset for training. I asked about this under the previous video 😅. Well, I hope you will tell us about it soon :)

SAnsAN
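On the dataset question above: the video's exact preparation steps are not shown here, but a common final step is a transcript list in an LJSpeech-style layout, one `audio/file.wav|transcript` line per clip. This is a minimal sketch under that assumption; many DLAS-based Tortoise fine-tuning setups read this pipe-delimited format, but confirm what your own tool expects. The function name and the `audio` folder are hypothetical.

```python
from pathlib import Path


def write_metadata(pairs, out_path, audio_dir="audio"):
    """Write one 'audio_dir/file.wav|transcript' line per (filename, transcript) pair.

    The pipe-delimited, LJSpeech-style layout is an assumption; adjust it to
    whatever your training tool expects.
    """
    lines = []
    for filename, transcript in pairs:
        # Drop any '|' that would break the column split and collapse extra whitespace.
        text = " ".join(transcript.replace("|", " ").split())
        if not text:
            continue  # skip clips whose transcription came back empty
        lines.append(f"{audio_dir}/{filename}|{text}")
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Example (hypothetical file names):
# write_metadata([("clip_0001.wav", "Hola, ¿cómo estás?")], "train.txt")
```

Skipping empty transcripts matters when the text comes from an automatic transcriber, which sometimes returns nothing for silence or noise.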

Are you going to share your Japanese models at some point?
I am working on a script that uses LLMs to generate sentences, which I turn into infinite comprehensible input by scraping Google Images for the words and using ffmpeg to combine the audio and images into a video that, for every sentence, displays an image representing the words in that sentence.

Vantaz
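The image-plus-audio step in the comment above maps onto a common ffmpeg pattern: loop one still image for as long as the audio lasts. This is a minimal sketch assuming one image and one audio file per sentence; the function and file names are hypothetical, and it only builds the argument list so you can inspect it before handing it to `subprocess.run`.

```python
import shlex


def still_image_video_cmd(image_path, audio_path, out_path):
    """Build an ffmpeg argument list that shows one still image for the length of an audio clip."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image_path,      # repeat the single image as a video stream
        "-i", audio_path,
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",  # libx264 with yuv420p needs even width/height
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",               # pixel format most players accept
        "-shortest",                         # end the output when the audio stream ends
        out_path,
    ]


cmd = still_image_video_cmd("sentence_001.jpg", "sentence_001.wav", "sentence_001.mp4")
print(shlex.join(cmd))
# To render it: subprocess.run(cmd, check=True)
```

Rendering one short clip per sentence and joining them afterwards (for example with ffmpeg's concat demuxer) keeps each image aligned with its own sentence.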

Just need that 4070 Ti Super, then I'm going in.

kk

Oooh, this is great!! :D I want to try training a Spanish language voice! I'll watch this video asap! (I'm working now XD) Thank you very much for sharing it! :D

juanjesusligero

Hey Jarod! I've been watching all your videos and I think I might have a unique challenge. I'd like to remove a tremor from someone's voice. Since it's possible to voice clone in other languages, this doesn't seem impossible. How would you approach it?

mitchelljams

Yo Jarod, thanks for the guide! Could you please make another guide on using a tokenizer for English voices?

ChasingStars

Oh, so if it's a Latin-alphabet language, for example Swedish, Spanish, or German, could I just use the Whisper-transcribed Swedish text to train the model? How would that turn out?

adamrastrand

Will Tortoise be able to work with Cyrillic characters if I make a tokenizer with Cyrillic characters?

lunch
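Whether Cyrillic works depends on the characters actually making it into the tokenizer's vocabulary and surviving the text-cleaning step. A quick, hedged sanity check before training is to list the characters in your transcripts that a given symbol set does not cover; anything reported would likely end up as an unknown token. The vocabulary below is a hypothetical stand-in, not Tortoise's real one; in practice you would collect the symbols from your own tokenizer file.

```python
def uncovered_chars(lines, vocab):
    """Return the sorted characters that appear in the transcripts but not in the vocabulary."""
    seen = set()
    for line in lines:
        seen.update(line.lower())  # assumes the vocabulary is lowercase-only
    return sorted(seen - set(vocab))


# Hypothetical vocabulary: lowercase Russian letters, space, and basic punctuation.
vocab = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя .,!?-")
transcripts = ["Привет, мир!", "Это тест: ёлка."]
print(uncovered_chars(transcripts, vocab))  # [':']
```

An empty list means every character in the transcripts has a symbol; a long list usually means the vocabulary or the cleaner is discarding part of the alphabet.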

I want to hear the Charlie voice you trained there, ahahhaha

ahmetalpergultekin

Hi Jarod, nice channel you've got. Can you train a TTS tokenizer that can sing out the lyrics of any song? Have you got a video on that? Cheers

EfeSteve-ongd

Great video!
Converting to Latin is all you need, really?
Even if the language you want to train contains a lot of special characters that are part of the International Phonetic Alphabet (like "ɖ, Ƒ, ɣ, ọ, ʋ") and is tonal? For example, where a written "ʋ" stands for an actual voiced labiodental approximant?

WorldYuteChronicles
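One way to test the concern above before converting a dataset is to run each character through a naive ASCII transliteration and see which ones are dropped or merged. This sketch uses Unicode NFKD decomposition from Python's standard library; it is not necessarily the conversion used in the video, and a hand-written transliteration table for the target language would behave differently.

```python
import unicodedata


def ascii_fold(ch):
    """Naively fold one character to ASCII: decompose, drop combining marks, drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.encode("ascii", "ignore").decode("ascii")


def lossy_chars(text):
    """Map each character that does not survive ASCII folding unchanged to what it becomes."""
    return {ch: ascii_fold(ch) for ch in sorted(set(text)) if ascii_fold(ch) != ch}


print(lossy_chars("ɖ Ƒ ɣ ọ ʋ"))
# ọ folds to 'o'; ɖ, Ƒ, ɣ and ʋ have no decomposition, so they are dropped entirely
```

Characters that fold to an empty string would simply vanish from the transcripts, and tone or vowel-quality marks folded onto a base letter would no longer be distinguishable from it.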

Jarod, thank you for the great tutorial! I really appreciate it; your content is unique ✨️. By the way, I've got a question about tokenizers: in many Turkic languages, including Turkish, there are letters such as "s" and "ş" that both get tokenized the same way, into "s". Won't that confuse the model? Those two are different letters, written and pronounced differently, but since they are tokenized into one letter, I think there's a chance the model will mispronounce them because of the tokenizer. What do you think? 🤔

allan
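The collapse described in the question is what happens when a text cleaner strips diacritics: "ş" decomposes into "s" plus a combining cedilla, and dropping the mark leaves "s". A hedged sketch of the alternative is to exempt a language's own letters from the folding step so they can receive their own tokens. The `keep` set is an assumption you would fill in per language, and the tokenizer vocabulary would also need to contain those letters for this to help.

```python
import unicodedata


def strip_marks(text, keep=frozenset()):
    """Remove combining diacritics from every character except those listed in `keep`."""
    out = []
    for ch in unicodedata.normalize("NFC", text):  # compose first so 's' + cedilla becomes 'ş'
        if ch in keep:
            out.append(ch)  # preserve letters the language treats as distinct
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


turkish_letters = frozenset("çğışöüÇĞİŞÖÜ")
print(strip_marks("şiş kebap"))                        # sis kebap
print(strip_marks("şiş kebap", keep=turkish_letters))  # şiş kebap
```

With folding, "şiş" and "sis" become identical text, so the model cannot learn separate pronunciations for them; keeping the letters avoids that at the cost of a slightly larger character set.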

Amazing video! I have a few questions:

Approximately how large (in file size) were the 840 hours of audio you used?

Do you know where I could find a Spanish tortoise-tts model to fine-tune with the voice I want to train?

Or could I train my own Spanish model and then fine-tune it, doing it all inside the free version of Google Colab?

ElmorenohWTF
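For a rough sense of scale on the file-size question (an estimate, not the figure from the video): uncompressed PCM size is duration × sample rate × bytes per sample × channels. Assuming 16-bit mono audio at 22,050 Hz, a common TTS training rate, that is 840 h × 3,600 s/h × 22,050 samples/s × 2 bytes ≈ 133.4 GB. The actual format of the source files is unknown, and compressed audio is far smaller.

```python
def pcm_bytes(hours, sample_rate=22_050, bits_per_sample=16, channels=1):
    """Size in bytes of uncompressed PCM audio (WAV headers ignored)."""
    seconds = hours * 3600
    return int(seconds * sample_rate * (bits_per_sample // 8) * channels)


def compressed_bytes(hours, kbps):
    """Approximate size of constant-bitrate compressed audio such as MP3."""
    return int(hours * 3600 * kbps * 1000 / 8)


size = pcm_bytes(840)
print(f"{size:,} bytes, about {size / 1e9:.1f} GB")          # 16-bit mono WAV at 22.05 kHz
print(f"about {compressed_bytes(840, 128) / 1e9:.1f} GB")    # 128 kbps MP3
```

Doubling the sample rate or switching to stereo doubles the PCM figure, so 44.1 kHz stereo recordings of the same length would be roughly four times larger.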

Do you think this could have better pronunciation than XTTSv2? Making a German model is interesting; I attempted one on Tortoise a few months back, but it wasn't great, so I'm not sure if there has been a big change since.

SyntheticVoices

I don't understand where I went wrong. I'm training a Vietnamese model. I used about 1 hour of my voice for training and created a tokenizer with your Python file for the Vietnamese language ("vi"). Then I tested it with a sentence that was already in the audio samples. It produced audio in my voice; however, the speech was meaningless, not Vietnamese at all. Please tell me where I went wrong.

radioketnoi

Hi Jarod, thanks so much for this demo! I learn so much from your videos. Keep up the great work! I followed your tutorial here and managed to train a Spanish model using a multi-speaker dataset. The training job took about 12 hours to complete successfully. After the training job, I tried generating a voice from the fine-tuned model. However, due to the volume of my training data, the generation process failed with an OOM error. The error indicated that it ran out of memory in the compute_latent process. I have about 25 hours of voice data in my training folder. Do you have any suggestions on how to overcome this issue? I am using an A10 GPU with 24GB VRAM. Thanks in advance!

iweiteh
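A hedged workaround for the OOM described above: Tortoise builds its conditioning latents from the reference clips it is given, so memory use appears to grow with the amount of reference audio. Instead of computing latents from all 25 hours, one option is to copy a random subset of a few minutes into a separate voice folder and compute latents from that. This sketch assumes WAV clips; the function names, the folder path, and the 5-minute budget are hypothetical guesses, not documented limits.

```python
import random
import wave


def wav_seconds(path):
    """Duration of a PCM WAV file, read from its header."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def select_clips(clips, budget_seconds, seed=0):
    """Pick a random subset of (path, seconds) pairs whose total duration fits the budget."""
    rng = random.Random(seed)  # fixed seed so the same subset is chosen on every run
    shuffled = list(clips)
    rng.shuffle(shuffled)
    chosen, total = [], 0.0
    for path, seconds in shuffled:
        if total + seconds <= budget_seconds:
            chosen.append(path)
            total += seconds
    return chosen


# Example (hypothetical folder):
# import glob
# clips = [(p, wav_seconds(p)) for p in glob.glob("training/my_voice/audio/*.wav")]
# subset = select_clips(clips, budget_seconds=300)
```

Picking clips at random rather than taking the first files alphabetically tends to give a more varied sample of the speakers in a multi-speaker folder.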

Is that 800h dataset only one speaker? If I'm going to collect that much data, it would take me 100 years to transcribe it manually lol. I have no way to use a transcriber to do it automatically...

Oqalualaat

Hi, how do I fix this? During the training process, cmd always shows me the message "ai-voice-cloning>pause".

AJB_run

Hi, after training my model, I tried to load its .pth file into Okada's AI voice changer, but it says the .pth file is missing a "config" parameter or something. How do I fix that!?

caesq_r