Kokoro Local TTS + Custom Voices

Kokoro is a small, high-quality TTS model that runs easily both in Colab and locally.

For more tutorials on using LLMs and building agents, check out my Patreon

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻Github:

⏱️Time Stamps:
00:00 Intro
01:04 TTS Arena: Benchmarking TTS Models
02:06 Kokoro Model Card
03:05 Kokoro Onnx Github
03:50 Colab Demo
07:27 Blending Custom Voices
10:58 Kokoro Onnx Demo (Installing Locally)
Comments

Thanks. I've realized over the last few days that what we need (as a start) for many models is simple getting-started guides like this.

KrullMaestaren

hmm Tiny TTs is definitely an interesting name

andherium

XTTS v2 is still the best so far. I've been using XTTS v2 since last year, and I'm surprised there isn't another TTS that can compete with it. Not only does it have a lot of voices that sound really good, but they are also multilingual, so they sound good in most languages, including Spanish and others that most models don't support. Oh, and it's fast even on CPU only.

ElChapoDel

I've been waiting for this for so long. Being able to turn any PDF/text file into an audio book should have been possible so long ago.

kevin.malone
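
For the audiobook idea above, a first step is usually to cut the extracted text into pieces short enough to synthesize one at a time. The sketch below is only an illustration: it assumes the TTS model accepts a limited amount of text per call, and `split_into_chunks` and the `max_chars` value are hypothetical names and numbers, not part of Kokoro.

```python
import re


def split_into_chunks(text, max_chars=400):
    """Group sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    max_chars=400 is an arbitrary placeholder, not a Kokoro limit.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to whichever synthesis call you use, and the resulting audio segments concatenated in order.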

Would love to see a video on conversations with local agents.

mageshyt

Very cool idea! I made a branch of hexgrad's current repo that incorporates a weights option natively, and allows mixing an arbitrary number of voices. Pull request submitted.

In any case, thanks! I like Kokoro a lot and wanted any ability to slightly tweak the voices given the limited set available. With this I was able to dial a couple in just a little bit more to my liking, and it's super simple.

timm
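
As a rough illustration of what mixing an arbitrary number of voices with weights can look like, here is a minimal sketch. It assumes each voice is a fixed-size style vector and that blending is a normalized weighted average; it is not taken from the pull request mentioned above, and `blend_voices` is a hypothetical helper. Plain Python lists stand in for the tensors a real voice pack would use.

```python
def blend_voices(voices, weights):
    """Return the weighted average of equally sized voice vectors."""
    if not voices or len(voices) != len(weights):
        raise ValueError("need one weight per voice")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    size = len(voices[0])
    if any(len(v) != size for v in voices):
        raise ValueError("all voice vectors must have the same length")
    # Normalize the weights so the blend stays on the same scale as the inputs.
    norm = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(norm, voices)) for i in range(size)]


# Toy two-dimensional "voices"; real style vectors are much larger.
voice_a = [1.0, 0.0]
voice_b = [0.0, 1.0]
mostly_a = blend_voices([voice_a, voice_b], [3, 1])
```

Nudging the weights (for example from 3:1 to 2:1) is the kind of small adjustment the comment describes as dialing a voice in.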

What we need is a model that gives precise control over the emotion, intonation, cadence, pacing, volume, timing and pitch of the voices, not more monotone models.

jmg

One potentially cool application of blending would be to blend between voice styles like laughing, crying, angry, etc., based on what's being said (maybe with a small LLM) and other things.

CapsAdmin

I really liked this video! Any plans to also make a video about "training" your own embeddings for the model with your own data? Would love to see an easy tutorial for that 😉

bastothemax

This would be good for people who want to run something like Alexa locally at home. I know some people have been putting together systems for Home Assistant. While the OpenAI integration might sound slightly better, I'd consider this more than good enough to replace it and not have to send your data to OpenAI.

pin

interesting, the interpolation part shocked me, thanks

sajjaddehghani

Great overview! I was curious whether anyone has used this in a local voice chatbot and whether the processing is fast enough to use in real time.

mikew

Thanks.
You have given me another reason to buy a Mac mini M4 😉

khangvutien

Should I train this model for my local language, Assamese?

mehdiaslam

Would be great if you could just clone a voice from reference MP3s, like other TTS models allow.

ScriptGurus

Is there any way to get it to pronounce words correctly? It can't pronounce "live" in "I live in a house" any differently from "live" in "that is a live wire". I am sure this isn't the only problem, but it is common enough to make it a show stopper for articles and ebooks.

gibsononbooks

Would love to see you host the whole project locally and use it.

Zyphorix

Please help: how can we deploy and run this on Windows?

XITIJTHOOL

Is it possible to train your own model from scratch for a language other than US English?

helloworld

Hey there. Is your colab link still working? It's not for me. Thanks!!

ChatSites_io