My Top 5 Open-Source AI Text-to-Speech Models

Links referenced in the video:

Laptop that I use:

Hardware for my PC:

Alternative prebuilds to my PC:

Cheapest and PC recommended:

Come join The Learning Journey!

If you found anything helpful, please consider supporting me and the content I am trying to produce!
Comments

I was just wondering what the best TTS models really are. Thanks for this awesome video, mate 👍

Jorvanius

Zonos is SOTA for my use cases. Very organic and real feel

AIWarper

I hope it doesn't take too long for these to get better and faster. They are gonna be so useful for games and stuff.

animeswitch

You know what? Every YouTuber covering TTS seems to focus only on creating voices for audiobooks, as if that's the only purpose for TTS. Why doesn't anyone make a video showcasing TTS that sounds more human, like natural, emotional conversation for everyday use? Something to show it could mimic how we talk to friends or in public (totally informal), with emotion and natural flow, instead of just being optimized for audiobooks. I've never come across a video where someone creates multi-voice TTS designed specifically for intense, seamless group debates without any pauses.

codelucky

XTTSV2 is still one of the best IMO, at least if you're not looking to do narration stuff. Fast, and emotional if you feed it the right clip.

MeinCouch

So glad I found your channel. Been hooked on the great content. Thanks!

omarplay

Minimax's latest speech-02 is extremely useful.

snowpeace

nice. really interested in getting one running to narrate all my books for me, can't wait

chancepaladin

Nice, thanks for your summary 👍
Have you also checked the new Zonos TTS? It seems pretty good at zero-shot.

schakuun

Are you going to put Kokoro V1 in the audiobook maker? It would be amazing if you managed to make it available for us to use, especially with more language options.

Nerdolord

I would love to hear which models are your favorites for training on other languages, especially low-resource languages. Thanks for your content.

AndrasEliassen

Thanks a lot 👍 I was looking for a model to produce the voice for a news podcast 😊

experiencesenvolees

Thanks for the comparisons. LlaSA-3B just dropped their finetune instructions as well.

JeradBenge

1:52
4:16 whispered
4:00 hah!

1:44 GPT-SoVITS
8:35 GPT-SoVITS again, but a little worse; no "whisper" effect like XTTS

----

Apparently Kokoro is even better, but
no voice learning

----

15:33 some of these models do much better after fine-tuning
17:30 RVC pushes it even more

18:36 most people are fine-tuning on XTTS nowadays

----

18:55 he believes GPT-SoVITS is best
21 s of audio takes 4 s to generate

----

19:24 speed of generation for all models

StyleTTS is fastest
Tortoise is 2x real time, so 21 s of audio = about 10 s to generate
vs.
GPT-SoVITS: 21 s of audio -> 4 s

XTTS is similar to Tortoise, but with DeepSpeed it can go 5x (or was it 10x?)

----

21:00

----

21:30 Kokoro

improvementTime..

XTTS v2 is still the best quality. As far as I can tell, no one has beaten it among open-source models (well, open weights and fine-tuning).

devmentorlive

Even now, XTTS v2 shows great potential. However, I think what's important is how dynamic the voice output is, specifically the voice intonation. It wouldn't surprise me if you could add a sample of someone singing and then make them speak through that singing sample.
I tried installing the newest TTS, but it requires Docker, and my PC can't handle Docker, lol. There's gotta be a better way to install that app. I see you also use RVC. In theory, some would say RVC is the LoRA of voice cloning. Overall, GPT-SoVITS wins.

megamayo

Great vid!! F5-TTS has the ability to have multiple voices in the "multi chat". Are there any others that can do this?

MurderbyMaestro

Thanks for this video. Sadly, I did not manage to get "AudioBook Maker" running on Linux, for now.

dfowgzv

Holy crap, Zonos HAS to release on Windows. That is excellent.

YouVoxAI

StyleTTS sounds the least synthetic. Kokoro sounds amazing with the right voice out of the box but can sound way too robotic with some. Thanks for the video.

retarrrdy