Best Open Source Text-to-Speech AI Tutorial in 2024

Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.). It is a reproduction of work from the paper "Natural language guidance of high-fidelity text-to-speech with synthetic annotations" by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively.

Unlike other TTS models, Parler-TTS is a fully open-source release. All of the datasets, pre-processing, training code and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models.
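
As a rough illustration of the "style of a given speaker" idea, the sketch below builds a plain-English voice description and passes it to the model alongside the text to be spoken. It assumes the `parler_tts` package from the project's repository, plus `transformers`, `torch` and `soundfile`, are installed. The checkpoint name `parler-tts/parler_tts_mini_v0.1` is an assumption that may need updating, and `VoiceStyle` and `synthesize` are hypothetical helper names introduced here, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class VoiceStyle:
    """Hypothetical helper holding the speaker attributes named in the description."""
    gender: str = "female"
    pitch: str = "slightly low-pitched"
    pace: str = "at a moderate pace"

    def to_description(self) -> str:
        # The model is conditioned on a free-form English description like this one.
        return (
            f"A {self.gender} speaker with a {self.pitch} voice "
            f"speaks {self.pace}, with clear audio quality."
        )


def synthesize(text, style, out_path="parler_tts_out.wav",
               checkpoint="parler-tts/parler_tts_mini_v0.1"):
    """Generate speech for `text` in the described style and write a WAV file."""
    # Heavy imports stay local so VoiceStyle can be used without them installed.
    import torch
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    # The checkpoint name is an assumption; substitute the one you actually use.
    model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The description steers the voice; the prompt is the text that gets spoken.
    description_ids = tokenizer(
        style.to_description(), return_tensors="pt"
    ).input_ids.to(device)
    prompt_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=description_ids, prompt_input_ids=prompt_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

Calling `synthesize("Hey, how are you doing today?", VoiceStyle(gender="male", pace="very fast"))` should download the weights on first use and write `parler_tts_out.wav`. Changing only the description changes the voice without touching the spoken text.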

Comments

Indians are the best at coding. Fact. You sounded so good I subscribed.

antonpictures

Don't be so harsh on yourself. Your voice is much better than the AI voice you demo'd in the beginning. MUCH better.

Kleidos

If you had been wearing earbuds or headphones, you would have realized that the AI-generated audio was playing almost entirely on the left channel!

siddhubhai

You look smart after getting your hair cut. It's been a week since I last saw you.

__________________________

Thank you for introducing this model, I'm going to use it for my product. Just a suggestion: there are very few tutorials on YouTube that take a model and show how to implement it in a project. Such tutorials would give your channel a lot of power and would also be very helpful for beginners. I would love to see more of that kind...

sagarangadi

This video is right on time. I am working on a local chatbot with speech output.

pareak

Fine-tuning this model on my own voice will be interesting.

puneet

Please make a tutorial on how to produce a llama.cpp version of it so that we can use it for inference in an Android app.

ogahsunday

Very useful to know about this option. I failed miserably trying to figure out why the voices from Bark are different every time, until I realized that this is by design. I'm not happy with CoquiTTS either, especially when it comes to non-English speakers, and Tortoise has its issue right there in its name. There is some hype about AllTalk TTS, but that is just CoquiTTS at its core. Did I miss a major option?

testales

People are using the term "artificial intelligence" so vaguely nowadays. Can you make a video that explains what AI actually is, and what the difference is between a basic algorithm like Google's or YouTube's and AI?

dhruvmehta

I loved it, I just subscribed.
Could you please drop a tutorial on fine-tuning this for a regional language like Telugu, Thai or Vietnamese?

KALYAN

Is there any OSS API server for this model, sir?

BiMoba

I was genuinely fooled for the first few seconds. I was just thinking that maybe you know how to impress the global audience with your new accent.

vivekkarumudi

Lol, for a second I thought there was some issue with my laptop 🙂

KumR

Hey bro, are there any open-source models to enhance audio like in Adobe Firefly?

intfloat

This is a cool tool. Could you do a video on how to train it for a foreign language like French?

MaraScottAI

Is this model trained for multilingual generation?

gmag

I feel that CosyVoice and SenseVoice from Alibaba's FunAudioLLM are much better than this. They are open source and really good models.

harshsethia

It's a shame that voice cloning is not enabled by default. I am guessing it's a legal issue. I imagine it's easy to do, though. Just as they convert the voice description into vector space to adjust the output, you could do the same with an audio input.

SloanMosley

"ED IN BRUH" (not eye-din-burg university)🙂

iroehkv