Alexander Hendorf: Speech Synthesis with Tacotron2 and PyTorch | PyData Amsterdam 2019

Computer-generated speech has existed for a while, with parameters painfully engineered by hand. Deep learning models can be efficient at learning the inherent features of data - how well does this work out for audio?

There are different DL models, such as WaveNet, SampleRNN and Tacotron2. After a quick overview, I'm going to focus on Tacotron2: how it works, its benefits, and how to implement it with PyTorch.
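
As a rough sketch of the PyTorch side, the snippet below loads NVIDIA's pretrained Tacotron2 and WaveGlow checkpoints from torch.hub and synthesises a sentence. This is a minimal illustration assuming those published checkpoints and a CUDA-capable GPU; the example sentence and output filename are placeholders, not taken from the talk.

    import torch
    from scipy.io.wavfile import write

    # Pretrained checkpoints published by NVIDIA on torch.hub:
    # Tacotron2 maps characters to a mel spectrogram; WaveGlow is the
    # vocoder that turns the mel spectrogram into a waveform.
    tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub',
                               'nvidia_tacotron2').to('cuda').eval()
    waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub',
                              'nvidia_waveglow').to('cuda').eval()
    utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub',
                           'nvidia_tts_utils')

    # Convert raw text to padded character-ID tensors.
    sequences, lengths = utils.prepare_input_sequence(
        ["Hello, PyData Amsterdam!"])

    with torch.no_grad():
        # No hand-engineered vocoder features: Tacotron2 predicts the
        # mel spectrogram directly from the character sequence.
        mel, _, _ = tacotron2.infer(sequences, lengths)
        audio = waveglow.infer(mel)

    # 22050 Hz is the sampling rate of the LJSpeech-trained checkpoints.
    write("demo.wav", 22050, audio[0].cpu().numpy())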

With Tacotron2 we make no assumptions about which features should be passed to the vocoder. All that is required are audio snippets and corresponding text. Non-English audio datasets are hard to get, so I had to generate my own. This talk will also cover how I created my own dataset semi-automatically and efficiently with tools like auditok and methods such as speaker diarisation, as sketched below.
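
To illustrate the snippet-extraction step, here is a minimal sketch assuming auditok's high-level split API; the input file, output directory and all thresholds are placeholder values that would need tuning to the actual recordings.

    import os
    import auditok

    os.makedirs("snippets", exist_ok=True)

    # Split a long recording into utterance-sized snippets by detecting
    # audio activity between silences. "lecture.wav" is a hypothetical input.
    regions = auditok.split(
        "lecture.wav",
        min_dur=1.0,          # discard snippets shorter than 1 s
        max_dur=10.0,         # force a split after 10 s
        max_silence=0.3,      # tolerate up to 0.3 s of silence in a snippet
        energy_threshold=55,  # tune to the recording's noise floor
    )

    for i, region in enumerate(regions):
        # Each snippet still needs a transcript, and snippets from other
        # speakers have to be filtered out (e.g. via speaker diarisation).
        region.save(f"snippets/{i:04d}.wav")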

The talk will feature synthesised speech audio demos. I will also cover some failures and reason about them.

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.
