The recording script

Because unit selection relies so heavily on the contents of the database, we need to think carefully about exactly what speech we should record.

We need to select a script for recording. The standard method for this involves greedily selecting sentences, one by one, from a large text corpus (e.g., novels or newspapers) in order to maximise phonetic (and possibly prosodic) coverage. In the first part of this exercise, we will simply use the existing CMU ARCTIC script.

The full ARCTIC script would result in about one hour of speech if you record the whole thing, but I suggest you start by recording only the ‘A’ set of 593 prompts and build a voice from those. You can go back and record the ‘B’ set later if you wish, or you could spend that time recording more of your own material, or on other improvements or evaluation.

Because recording takes time (around 5 hours in the studio per hour of speech material obtained), you should get started on recording the ARCTIC sentences immediately, and design your additional sentences in parallel.

Tip

You can get started before your recordings are ready, by downloading one of the ARCTIC corpora (I recommend ‘slt’). These are actually complete Festival voices, so you should copy only the waveforms into the ‘wav’ directory of your ‘ss’ directory (and discard everything else), then proceed with this exercise. You should work in a separate copy of the ‘ss’ directory for each voice you build.

If the download is slow, then you can copy the waveforms from our local copy of ARCTIC like this:

bash$ rsync -avu /Volumes/Network/courses/ss/corpora/ARCTIC/cmu_us_slt_arctic/wav/ ~/Documents/ss/wav/
  • The utts.data file

    This file is the main index of the unit selection database. Festival uses it to discover which files it can select units from.

  • Adding your own material

    Whilst the ARCTIC script gives general diphone coverage, it's not ideal for synthesising all types of sentence. You can try to improve your voice's naturalness for one particular domain by adding more material to the database.

  • Automatic text selection

    Not all students will attempt this, but how about implementing your own greedy text-selection algorithm?
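
As a starting point for thinking about greedy text selection, here is a minimal sketch in Python. It is not the method used to build the ARCTIC script, just one simple instance of the idea described above: repeatedly pick the sentence that adds the most not-yet-covered diphones. It assumes you already have a phone sequence for every candidate sentence (for example from a text-to-speech front end); the function names, the sentence IDs, the phone symbols and the ‘sil’ padding are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phone pairs in one sentence.

    The sentence is padded with a silence symbol at both ends
    (an assumption made for this sketch).
    """
    padded = ["sil"] + phones + ["sil"]
    return set(zip(padded, padded[1:]))


def greedy_select(corpus: Dict[str, List[str]], max_sentences: int) -> List[str]:
    """Greedily choose sentence IDs that maximise diphone type coverage.

    corpus maps a sentence ID to its phone sequence. At each step the
    sentence that contributes the most uncovered diphone types is chosen.
    Selection stops at max_sentences or when nothing new can be gained.
    """
    remaining = {sid: diphones(phones) for sid, phones in corpus.items()}
    covered: Set[Diphone] = set()
    selected: List[str] = []

    while remaining and len(selected) < max_sentences:
        # Iterating over sorted IDs makes tie-breaking deterministic.
        best_id = max(sorted(remaining), key=lambda sid: len(remaining[sid] - covered))
        if not remaining[best_id] - covered:
            break  # no remaining sentence adds any new diphone
        selected.append(best_id)
        covered |= remaining.pop(best_id)

    return selected


# Toy corpus with made-up phone symbols; s2 duplicates s1 and is never chosen.
toy_corpus = {
    "s1": ["h", "@", "l", "ou"],
    "s2": ["h", "@", "l", "ou"],
    "s3": ["g", "u", "d"],
}
print(greedy_select(toy_corpus, max_sentences=10))
```

Counting raw gain like this tends to favour long sentences. One variation you might try is dividing the gain by sentence length, or weighting rare diphones more heavily; whether that helps is something to evaluate on your own corpus.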