Introduction

An overview of the complete process of voice building, and some tips for success.

In this practical exercise, you’re going to build a unit selection voice for a text-to-speech synthesiser from your own voice. You will create a working voice that can be loaded into Festival and used to generate intelligible speech.

This exercise focuses mainly on the waveform generator stage of the synthesis pipeline, although we may make some minor changes to the front end by adding a few words to the pronunciation dictionary.

Before starting, be a proper engineer and

  • keep a logbook to record every single step

You’ll find this invaluable if you need to repeat any steps, and your notes will also be useful for writing up a lab report at the end.

To build your synthetic voice, you will follow step-by-step instructions and use a variety of existing tools. If you’re doing this exercise on your own, you now need to download and install Festival and HTK. This exercise was originally developed for Mac OS X but also works on Linux. Don’t attempt to do this exercise on Windows, unless you are a masochist.
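
Before going further, it is worth confirming that both packages can actually be found from the command line. The Python sketch below shows one way to check this. The tool names it looks for (`festival`, `HCopy`, `HVite`, `HERest`) are assumptions about a typical installation, not a list taken from this exercise, so adjust them to match your setup.

```python
import shutil

# Hypothetical selection of executables to look for. This assumes Festival
# provides a `festival` binary and that the HTK tools (HCopy, HVite, HERest)
# were installed in a directory on your PATH; edit the list for your setup.
REQUIRED_TOOLS = ["festival", "HCopy", "HVite", "HERest"]


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that `which` cannot locate on the PATH."""
    return [name for name in tools if which(name) is None]


def report(tools=REQUIRED_TOOLS):
    """Print one status line per tool and return the list of missing ones."""
    missing = missing_tools(tools)
    for name in tools:
        print(f"{name:<10} {'MISSING' if name in missing else 'found'}")
    return missing


if __name__ == "__main__":
    report()
```

Run it from a terminal. Any tool reported as MISSING is either not installed or installed in a directory that is not on your PATH.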

Here are the main stages in this exercise:

  1. Select a recording script.
  2. Make the recordings in the studio.
  3. Prepare the workspace.
  4. Prepare the recordings in the required format, and sanity check.
  5. Label the speech.
  6. Pitchmark the speech.
  7. Build the voice.
  8. Evaluate the voice.
  9. Write up.

Each stage will be described, and I’ll also provide links to relevant material on this website, such as blog posts describing some part of the theory, or discussion forums where I answer students’ questions.
