Utterance structures

The target cost in Festival is computed using linguistic information, so we need to provide that information for all the candidate units in the database. This information is stored in utterance structures.

The utterance structure for each of the files in your database is where Festival stores all the linguistic information needed by the target cost, including the phonetic string, a tree structure that connects those phones with their parent syllables and words, and so on. We also add the phonetic timestamps obtained by forced alignment to these structures.
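To make the idea of a tree linking phones to syllables and words more concrete, here is a minimal Python sketch. It is only an illustration and not Festival's actual representation or file format: the class names, the file identifier, the phone symbols and the timestamps are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Phone:
    name: str      # phone label (hypothetical symbols, not a real phone set)
    start: float   # start time in seconds, as forced alignment would supply
    end: float     # end time in seconds


@dataclass
class Syllable:
    stress: int                                   # 1 = stressed, 0 = unstressed
    phones: List[Phone] = field(default_factory=list)


@dataclass
class Word:
    text: str
    syllables: List[Syllable] = field(default_factory=list)


@dataclass
class Utterance:
    file_id: str                                  # hypothetical database file name
    words: List[Word] = field(default_factory=list)

    def phones(self) -> List[Phone]:
        """Flatten the tree into the phonetic string, in time order."""
        return [p for w in self.words for s in w.syllables for p in s.phones]

    def phone_contexts(self) -> List[Tuple[str, int, str]]:
        """Pair each phone with its parent syllable's stress and parent word."""
        return [
            (p.name, s.stress, w.text)
            for w in self.words
            for s in w.syllables
            for p in s.phones
        ]


def build_example() -> Utterance:
    """Build a one-word utterance with made-up phones and timestamps."""
    first = Syllable(stress=0, phones=[Phone("h", 0.00, 0.08),
                                       Phone("@", 0.08, 0.15)])
    second = Syllable(stress=1, phones=[Phone("l", 0.15, 0.22),
                                        Phone("ou", 0.22, 0.40)])
    return Utterance("example_0001", [Word("hello", [first, second])])


if __name__ == "__main__":
    for context in build_example().phone_contexts():
        print(context)
```

The point of the sketch is that each phone can be looked up together with its linguistic context (here, syllable stress and parent word), which is the kind of information a target cost can compare between a target and a candidate unit.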

First, build the utterance structures:

bash$ mkdir utt
bash$ festival $MBDIR/scm/build_unitsel.scm my_lexicon.scm
festival>(build_utts "utts.data" 'unilex-rpx)

Then run an analysis that checks the distribution of phone durations and labels any outliers:

bash$ mkdir dur
bash$ phone_lengths dur lab/*.lab
bash$ festival $MBDIR/scm/build_unitsel.scm
festival>(add_duration_info_utts "utts.data" "dur/durations")
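
As a rough illustration of what labelling duration outliers can mean, the following Python sketch flags any phone whose duration lies far from the mean duration of that phone type. This is an assumption made for illustration only: it is not necessarily the method used by phone_lengths or add_duration_info_utts, and the function name, the input format and the threshold of 2.5 standard deviations are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def flag_duration_outliers(
    samples: List[Tuple[str, float]], threshold: float = 2.5
) -> List[int]:
    """Return the indices of samples whose duration is an outlier.

    samples   -- (phone label, duration in seconds) pairs
    threshold -- number of standard deviations from the per-phone mean
                 beyond which a duration counts as an outlier (arbitrary)
    """
    # Group durations by phone label so each phone type has its own statistics.
    by_phone: Dict[str, List[float]] = defaultdict(list)
    for phone, duration in samples:
        by_phone[phone].append(duration)

    # Per-phone mean and (population) standard deviation.
    stats = {p: (mean(d), pstdev(d)) for p, d in by_phone.items()}

    outliers = []
    for index, (phone, duration) in enumerate(samples):
        mu, sigma = stats[phone]
        if sigma == 0.0:
            continue  # all durations identical: nothing can be an outlier
        if abs(duration - mu) / sigma > threshold:
            outliers.append(index)
    return outliers


if __name__ == "__main__":
    # Made-up durations: nine typical tokens of one phone and one very long one.
    data = [("a", d) for d in
            [0.10, 0.11, 0.09, 0.10, 0.10, 0.11, 0.09, 0.10, 0.10, 0.50]]
    print(flag_duration_outliers(data))
```

Units with implausible durations are often the result of alignment errors, so marking them gives the system a way to treat those candidates with suspicion.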