How to Clone Any Voice With AI 🔊 | Tutorial | Tortoise-TTS

Deepfake speech is created by using a text-to-speech model to generate speech from text. Once such a model is trained, it can generate speech in any voice it is given a sample of. These models are usually split into three parts: a voice encoder, a synthesizer, and a vocoder. The voice encoder learns to create a latent, fixed-dimensional embedding (a vector) that captures the characteristic features of a particular human voice. The synthesizer learns to create a mel-spectrogram from a text transcript, conditioned on that embedding so the result matches the target voice. The vocoder then generates an audio waveform from the mel-spectrogram.
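To make the data flow between the three parts concrete, here is a minimal sketch in plain Python. It is not Tortoise-TTS code and contains no trained models: the functions `encode_voice`, `synthesize`, `vocode`, and `clone_voice`, as well as the constants `EMBED_DIM`, `N_MELS`, and `HOP`, are hypothetical stand-ins I made up for illustration. The only point is the shapes: audio of any length goes in, a fixed-length embedding comes out, the synthesizer turns text plus embedding into spectrogram frames, and the vocoder turns frames into samples.

```python
import math
from typing import List

EMBED_DIM = 4  # hypothetical; real speaker embeddings are much larger
N_MELS = 8     # hypothetical number of mel bands per spectrogram frame
HOP = 16       # hypothetical number of audio samples produced per frame


def encode_voice(reference_audio: List[float]) -> List[float]:
    """Stand-in voice encoder: audio of any length -> fixed-length embedding."""
    chunk = max(1, len(reference_audio) // EMBED_DIM)
    return [
        sum(abs(s) for s in reference_audio[i * chunk:(i + 1) * chunk]) / chunk
        for i in range(EMBED_DIM)
    ]


def synthesize(text: str, embedding: List[float]) -> List[List[float]]:
    """Stand-in synthesizer: text + embedding -> frames of N_MELS values.

    A real synthesizer is a trained network; this toy emits one frame per
    character and shifts every value by the mean of the embedding.
    """
    bias = sum(embedding) / len(embedding)
    return [
        [bias + (1.0 if ord(ch) % N_MELS == band else 0.0) for band in range(N_MELS)]
        for ch in text
    ]


def vocode(mel: List[List[float]]) -> List[float]:
    """Stand-in vocoder: spectrogram frames -> waveform, HOP samples per frame."""
    wave: List[float] = []
    for frame in mel:
        # Pick the loudest band and emit a sine whose frequency depends on it.
        band = max(range(N_MELS), key=lambda b: frame[b])
        for n in range(HOP):
            wave.append(math.sin(2 * math.pi * (band + 1) * n / HOP))
    return wave


def clone_voice(text: str, reference_audio: List[float]) -> List[float]:
    """Chain the three stages: encoder -> synthesizer -> vocoder."""
    embedding = encode_voice(reference_audio)
    mel = synthesize(text, embedding)
    return vocode(mel)
```

Calling `clone_voice("hello", reference_audio)` with any list of float samples returns a list of `5 * HOP` samples, one frame per character. In a real system each stage is a separately trained neural network, and the output would of course sound like speech rather than a sequence of sine tones.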
In this video, I introduce you to the theoretical background of text-to-speech synthesis and show you how you can create speech yourself with any voice you have access to.
My Medium Article for This Video:
00:00:00 Intro
00:01:25 Single-Speaker vs. Multi-Speaker
00:02:14 Multi-Speaker Approach
00:02:31 Speaker Encoder
00:03:55 Synthesizer
00:04:25 Mel Spectrogram
00:05:31 Vocoder
00:06:26 Model Summary
00:07:29 Hands-On Voice Cloning
00:09:36 Speech Generation
00:15:03 Outro
I'm happy about any feedback I can get. :) So feel free to share it with me in the comment section, thanks. :)