Improvements and variations

It would take too long to tune every aspect of the system, but we can still identify some problems and see how to fix them. It's also easy to vary the contents of the database to discover the effect on the synthetic speech.

So far, we have built only a single voice. It would be good to have something to compare it against.

The methodology to use in this part is to create multiple versions of your voice, and then to compare them informally or in a listening test.

The way to have multiple voices is to make a complete copy of your ss folder for each variant (you should move the original recordings elsewhere first, to save space). You should also use symbolic links so that all the variants share the same wav, mfcc, and lpc folders. Not only does this save even more space, it is also good engineering practice to avoid unnecessary copies of data.
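As a rough illustration of the copy-plus-symlink setup, here is a minimal Python sketch. It assumes a POSIX file system, that wav, mfcc, and lpc sit at the top level of the ss folder, and that nothing else needs to be shared; the function name make_variant and the example paths are hypothetical and not part of any Festival tool.

```python
import shutil
from pathlib import Path

# Folders that every variant shares (taken from the text above).
SHARED = ("wav", "mfcc", "lpc")


def make_variant(original: Path, variant: Path, shared=SHARED) -> None:
    """Copy `original` to `variant`, symlinking the shared folders instead of copying them.

    Hypothetical helper for illustration only.
    """
    original = original.resolve()

    def skip_shared(directory, names):
        # Skip the shared folders only at the top level of the original folder.
        if Path(directory).resolve() == original:
            return [name for name in names if name in shared]
        return []

    # Copy everything except the shared folders.
    shutil.copytree(original, variant, ignore=skip_shared, symlinks=True)

    # Point each shared folder in the variant back at the original data.
    for name in shared:
        target = original / name
        if target.is_dir():
            (variant / name).symlink_to(target, target_is_directory=True)


# Example call (hypothetical paths):
# make_variant(Path("ss"), Path("ss_half_database"))
```

Each variant then has its own copies of everything you might change, while the large data folders exist only once on disk.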

The first thing to try is to revisit each of the stages in building the voice, and see whether there is anything you can improve. For example, you could adjust the pitchmarking parameters to more closely fit your voice. Then, try some or all of the following variations:

  • Find and fix a labelling error

    To see, in principle, how we could improve the labels for the whole voice, we will just identify and then fix a single label alignment error.

  • Vary the contents of the database

    Make some simple variations on your voice, by excluding parts of the database.

  • Introduce deliberate errors

    By deliberately varying some aspects of the system, you can discover how much effect they have on the overall quality of the voice.

  • Target cost weight

    Adjust the relative weighting of the target cost and the join cost.

  • Join sub-cost weighting

    Vary the relative weightings of the join sub-cost components (F0, power, spectrum).

  • Pruning

    Festival's Multisyn unit selection engine prunes the candidate lists, and performs more pruning during the search.
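
To get a feel for what the two weighting variations in the list above do, the following toy Python sketch may help. It is not Festival's implementation: it simply assumes that the join cost is a weighted sum of the F0, power, and spectrum sub-costs, and that the cost of a candidate sequence is the weighted target costs plus the join costs. All names and numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class JoinWeights:
    """Hypothetical weights for the three join sub-costs."""
    f0: float = 1.0
    power: float = 1.0
    spectrum: float = 1.0


def join_cost(sub_costs, weights):
    """Weighted sum of one join's (f0, power, spectrum) sub-costs."""
    f0, power, spectrum = sub_costs
    return weights.f0 * f0 + weights.power * power + weights.spectrum * spectrum


def sequence_cost(target_costs, join_sub_costs, target_weight, weights):
    """Total cost of a candidate sequence under the assumed cost model."""
    target_total = target_weight * sum(target_costs)
    join_total = sum(join_cost(sub, weights) for sub in join_sub_costs)
    return target_total + join_total


# Two made-up two-unit candidate sequences, each with a single join.
# A: good match to the target, poor join. B: poor match, smooth join.
seq_a = ([0.1, 0.1], [(0.5, 0.5, 0.5)])
seq_b = ([0.6, 0.6], [(0.1, 0.1, 0.1)])

weights = JoinWeights()
for target_weight in (1.0, 2.0):
    cost_a = sequence_cost(*seq_a, target_weight, weights)
    cost_b = sequence_cost(*seq_b, target_weight, weights)
    winner = "A" if cost_a < cost_b else "B"
    print(f"target weight {target_weight}: A={cost_a:.2f} B={cost_b:.2f} -> {winner}")
```

With these numbers, the sequence with the smoother join wins at a target weight of 1.0, and the sequence that matches the target better wins at 2.0. Changing the entries of JoinWeights shifts the balance between the three sub-costs in the same way.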