Deep Learning for Text-to-Speech Synthesis, using the Merlin toolkit

A tutorial given at Interspeech 2017

Simon King, Oliver Watts, Srikanth Ronanki, Felipe Espic
Centre for Speech Technology Research, University of Edinburgh, UK

Zhizheng Wu
Apple Inc, USA

We gratefully acknowledge the support of ISCA and the Interspeech 2017 organisers in putting on this tutorial in Stockholm.


This tutorial combines the theory and practical application of Deep Neural Networks (DNNs) for Text-to-Speech (TTS). It illustrates how DNNs, in a variety of model architectures, are rapidly advancing performance in all areas of TTS, including waveform generation and text processing. We link the theory to implementation using the open-source Merlin toolkit.

Slides

You might also be interested in the Speech Synthesis course.

Simon King
Oliver Watts
Srikanth Ronanki
Felipe Espic
Zhizheng Wu
Felipe Espic
Simon King
Zhizheng Wu

Links

Video: a walk through the demo
