Step-by-step

It's possible to run each step in the text-to-speech pipeline manually, and inspect what Festival does at each point.

We are going to examine the speech synthesis process in detail: the sequence of processes Festival uses to perform text-to-speech (TTS), and how these processes relate to what you have learnt in the Speech Processing course. Make notes in a lab book as you work! Remember that Festival is just one example of a TTS system – there are generally many other ways of implementing each step.

Festival stores information such as words, phonemes, F0 targets and syntax in structures called Relations. These correspond approximately to the levels of linguistic representation mentioned in lectures. As each stage of the process is performed, more information is added to the utterance structure, and you can examine this inside Festival: each step in the pipeline either adds a new Relation or modifies an existing one. Note, however, that not all TTS voice configurations generate and use all types of Relations (e.g., the awb voice is definitely missing some that the kal voice has).

Your task in this part of the exercise is to explore the synthesis process and discover what Festival does in each step of processing, when converting text to speech. If you notice errors in the prediction of phrases, pronunciation, the processing of numbers or anything else, make a note of it, as this will be useful for the next part of the exercise.

Hints

Three hints for this practical exercise:

  1. Read the instructions through completely before you start.
  2. Use Festival’s tab-completion and command-line history (which is kept even if you quit and restart Festival) to save typing and avoid mistakes.
  3. If things go wrong (either with Festival, or with you), quit Festival and restart it.

Festival help

Festival can make your life easier in a number of ways.

Command history

You can access commands you have previously typed using the arrow keys. Press the up arrow a number of times to see the previous commands you entered, then use the left and right arrow keys to edit them. Press ENTER to run the edited command.

TAB completion

If you start to type a command name into Festival and then press TAB, Festival will either complete the command for you or give you a list of possible completions. For example, to get a list of all the commands that work on an utterance, type:

festival> (utt.

and then press TAB once or twice.

Getting help

Most commands in Festival have built-in help. Type the name of the command (including the initial open bracket) and then press ⟨ALT⟩-h (hold the ALT key down and press h), or alternatively press (and release) ESC and then press h. Festival will print some help for that command, including a list of the arguments it expects.

Make an assignment working directory

Before you start, make a folder to work in, so you can keep your data organised. On the PPLS AT lab computers/remote desktop, you can make a directory called ~/Documents/sp/assignment1; we will assume this is your working directory in the rest of these instructions. You can do this on the command line (i.e., in a terminal) with the following commands:

cd ~
mkdir -p ~/Documents/sp/assignment1

Change into this directory, e.g.:

cd ~/Documents/sp/assignment1

You can use the pwd command to see which directory you are currently in and the ls command to see which files are in the current directory.

Starting Festival with a specific voice configuration

Previously, you started Festival in its default mode, but for this assignment we want to use a specific voice in a specific configuration.

In this exercise we are going to use a unit selection voice called cstr_edi_awb_arctic_multisyn.

We’re going to use a simple configuration file to tell Festival to load the correct voice and to run a few extra commands. Copy it into your working directory (the path below is the file's location on the AT lab servers):

cp /Volumes/Network/courses/sp/assignment1/config.scm .
chmod ugo+r config.scm

If you followed this gist to install Festival on your own computer, you have probably already downloaded the config.scm file (by default into the “assignment1” directory next to the “tools” directory where you installed Festival; see lines 82-85 of the gist). In that case, you can start Festival from there, or move the config.scm file to the directory you want to work in.

Remember that if you come back later, you only need to cd to your working directory (e.g., ~/Documents/sp/assignment1 if you are following the remote desktop instructions). You don’t need to copy the file again, as long as you start Festival from a directory that contains config.scm. Every time you start Festival during the rest of this exercise, do it like this:

festival config.scm
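A configuration file like config.scm is just a sequence of Scheme commands that Festival runs after startup. The course's actual file is not reproduced here; the sketch below is a hypothetical minimal version that only selects the voice, assuming the voice is installed so that Festival defines the function voice_cstr_edi_awb_arctic_multisyn.

```scheme
;; Hypothetical minimal config.scm (NOT the course's actual file).
;; Assumes the cstr_edi_awb_arctic_multisyn voice is installed, so that
;; Festival provides a voice_cstr_edi_awb_arctic_multisyn function.
(voice_cstr_edi_awb_arctic_multisyn)   ; select (and load) the unit selection voice
(format t "config.scm loaded\n")       ; print a visible confirmation at startup
```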

Compared to the earlier exercises using a diphone voice, Festival will take longer to start when loading this unit selection voice. Why?

In the following examples, festival> at the beginning of the line just recreates the Festival command-line prompt; you only need to type in the part in parentheses.

Once Festival is running, check that it speaks:

festival> (SayText "hello world")

You should hear a reasonably good quality Scottish male voice. If not, you probably forgot to start Festival with the config.scm file.
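SayText is shorthand for building an utterance of type Text, running the whole pipeline on it with utt.synth, and playing the result with utt.play. The sketch below does the same thing explicitly (type each line at the festival> prompt); the variable name hello and the output file name hello.wav are just illustrative choices.

```scheme
;; Roughly what SayText does, written out step by step.
(set! hello (Utterance Text "hello world"))   ; build an utterance of type Text
(utt.synth hello)                             ; run the whole TTS pipeline in one go
(utt.play hello)                              ; play the synthesised waveform
(utt.save.wave hello "hello.wav" 'riff)       ; optionally keep a copy as a WAV (RIFF) file
```

Keeping the utterance in a variable, rather than using SayText, means you can inspect it afterwards.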

Synthesising an utterance step-by-step

Read this section carefully before trying any of the commands.

So far we have only synthesised utterances from start to finish in one go, using SayText. Now we are going to do it step-by-step.

First you need to create a new utterance object. The following command creates a new utterance object with the text supplied and sets the variable myutt to point to it.

festival> (set! myutt (Utterance Text "Put your own text here."))

Now you can manually run each step of the text-to-speech pipeline – don’t skip any steps (what would happen if you did?). Use a single short utterance of your own when performing this part – make it interesting (e.g., containing some text that needs normalising, a word that is unlikely to be in the dictionary, and so on).

festival> (Initialize myutt)
festival> (Text myutt)
festival> (Token_POS myutt)
festival> (Token myutt)
festival> (POS myutt)
festival> (Phrasify myutt)
festival> (Word myutt)
festival> (Pauses myutt)
festival> (PostLex myutt)
festival> (Wave_Synth myutt)
festival> (utt.play myutt)

If you get an error, you will have to start again by creating a new utterance with the set! command. If you get confused, quit Festival and start from the beginning again.

Note that running the synthesis pipeline step-by-step is just to help you understand what is happening. You might need it to diagnose some mistakes later on, but most of the time, you will just use SayText.
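Once you have run the steps by hand, a loop can save typing when you want a quick overview. The sketch below runs the same steps as the manual pipeline above, in the same order, and prints the list of Relations after each one, so you can see which step adds what. The variable name steputt and the example text are illustrative; Festival will also echo the return value of mapcar at the end, which you can ignore.

```scheme
;; Run the manual pipeline steps in order, printing the Relations after each.
(set! steputt (Utterance Text "Dr Smith paid $45 on 3 May."))
(mapcar
 (lambda (step)
   (step steputt)                         ; run one pipeline step on the utterance
   (print (utt.relationnames steputt)))   ; show which Relations are now present
 (list Initialize Text Token_POS Token POS Phrasify Word Pauses PostLex Wave_Synth))
```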

Commands for examining utterances

You should pause to examine the contents of myutt between each step.

To determine which relations are now present:

festival> (utt.relationnames myutt)

and to examine a particular Relation (if it exists):

festival> (utt.relation.print myutt 'Phrase)
festival> (utt.relation.print myutt 'Word)
festival> (utt.relation.print myutt 'Segment)

and so on for any other Relations that exist.
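You can also pull individual values out of a Relation rather than printing all of it. The sketch below loops over the items in the Word relation and prints each word's name together with its pos feature (the POS tag added by the POS step). It builds and fully synthesises its own utterance, posutt (an illustrative name), so it runs on its own; you can substitute myutt once you have run it at least as far as the POS step. If a feature has not been set, Festival returns 0 for it.

```scheme
;; Print the name and POS tag of every item in the Word relation.
(set! posutt (Utterance Text "The cat sat on the mat."))
(utt.synth posutt)                                        ; run the full pipeline first
(mapcar
 (lambda (w)
   (format t "%s  %s\n" (item.name w) (item.feat w "pos")))  ; word, then its POS tag
 (utt.relation.items posutt 'Word))
```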

You can also use the following command to see the overall structure of the utterance:

festival> (print_sylstructure myutt)

This will show you how the different relations tie together. It will show you Words, Syllables as lists of segments, and the presence of stress.
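The links that print_sylstructure displays can also be followed by hand, using Festival feature paths. In the sketch below, each segment follows the SylStructure relation up to its parent syllable (to read the syllable's stress feature) and then up again to the word it belongs to. The utterance name sylutt and the text are illustrative. Segments that are not part of SylStructure, such as pauses, have no parent there, so those features come back as 0.

```scheme
;; For each segment, walk up SylStructure: segment -> syllable -> word.
(set! sylutt (Utterance Text "Festival speaks."))
(utt.synth sylutt)
(mapcar
 (lambda (seg)
   (format t "%s  stress=%s  word=%s\n"
           (item.name seg)                                        ; the phone itself
           (item.feat seg "R:SylStructure.parent.stress")         ; its syllable's stress
           (item.feat seg "R:SylStructure.parent.parent.name")))  ; the word it is in
 (utt.relation.items sylutt 'Segment))
```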

Concentrate on discovering which commands create or modify Relations in the utterance structure, and what information is stored in those Relations. Note: the Initialize command will not reveal anything interesting, and it may be difficult to see what the Pauses and PostLex commands do.
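One way to see what a step like Pauses does is to compare a Relation just before and just after it. The sketch below runs the pipeline as far as Word on an illustrative utterance (pauseutt), counts the items in the Segment relation, runs Pauses, and counts again. If the count goes up, Pauses has inserted segments; either way, printing the Segment relation afterwards lets you look for pause segments at the phrase boundaries. The same before-and-after comparison works for PostLex.

```scheme
;; Run the pipeline up to (and including) Word.
(set! pauseutt (Utterance Text "Hello, world. How are you?"))
(Initialize pauseutt)
(Text pauseutt)
(Token_POS pauseutt)
(Token pauseutt)
(POS pauseutt)
(Phrasify pauseutt)
(Word pauseutt)
;; Count Segment items before and after the Pauses step.
(set! segs-before (length (utt.relation.items pauseutt 'Segment)))
(Pauses pauseutt)
(set! segs-after (length (utt.relation.items pauseutt 'Segment)))
(print (list 'before segs-before 'after segs-after))
;; Inspect the Segment relation to see what (if anything) was inserted.
(utt.relation.print pauseutt 'Segment)
```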

What you should now be able to do

  • start Festival and load a configuration file (which is just a sequence of Scheme commands to run after startup)
  • make full use of keyboard shortcuts including: TAB completion, ctrl-A, ctrl-E, ctrl-P, ctrl-N, ctrl-R, up/down cursor keys to navigate the command history, left/right cursor keys to edit a command.
  • run the pipeline step-by-step
  • describe which Relations are added, or modified, by each step
  • understand that Relations are composed of Items
  • understand that Items are data structures containing an unordered set of key-value pairs
  • have an initial understanding of what some (but not all) of the keys and values mean (e.g., POS tags in the Word relation)
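To check your understanding that an Item is a set of key-value pairs, you can look at the features of a single Item directly. The sketch below takes the first item of the Word relation and prints its name, one feature looked up by key, and then all of its features. It assumes your Festival build provides item.features (which returns every feature of an Item as an association list); the utterance name featutt is illustrative.

```scheme
;; Inspect the key-value pairs (features) stored on one Item.
(set! featutt (Utterance Text "Items hold features."))
(utt.synth featutt)
(set! firstword (utt.relation.first featutt 'Word))   ; first Item in the Word relation
(print (item.name firstword))                         ; the item's name
(print (item.feat firstword "pos"))                   ; a single feature, looked up by key
(print (item.features firstword))                     ; all features (assumes item.features exists)
```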