ML for Audio Study Group - Text to Speech Deep Dive

Vaibhav (VB) is a consultant turned student researcher at the University of Stuttgart, Germany. His current research is in the fields of Performance Prediction for NLP models and Speech Synthesis. He is also an active volunteer with EuroPython and Python DE.

Vatsal left the world of mathematics in 2017 to dive into Speech Synthesis soon after he came across the WaveNet paper. His research has focused on Normalising Flows, a particular kind of Deep Generative Model. At Amazon, he researched the deep-learning-based vocoding module that is used in production, as well as disentanglement in deep generative models for zero-shot speech generation (text-to-speech and voice conversion), publishing 4 papers and 5 patents and developing multiple product proof-of-concepts. Beyond speech, Vatsal has also spent some time on a team of researchers focused on Bayesian Models/Sparse Gaussian Processes.

00:00 Intro
02:15 Text to Speech Intro
15:30 Tacotron 2
25:50 Code examples and finding models
31:40 Journey of Speech Synthesis
44:03 Questions