What you should already know

Before continuing, you should check that you have the right background by watching this video.

Interactive toy demo

A short video demonstration of unit selection. You can find the actual interactive demo on this website. Have a play with it yourself!

Search

With multiple candidates available for each target position, a search must be performed.

Target cost and join cost

To choose between the many possible sequences of candidate units, we need to quantify how good each possible sequence will sound.

Target and candidate units

We use the linguistic specification from the front end to define a target unit sequence. Then, we find all potential candidate units in the database.

Key concepts

Linguistic context affects the acoustic realisation of speech sounds. But several different linguistic contexts can lead to almost the same sound. Unit selection takes advantage of this “interchangeability”.

The speed of sound

At the Parque de las Ciencias in Granada, Spain there is this long tube, open at the end nearest you and closed at the far end. We can calculate the length of this tube just from the audio recording, because we know the speed of sound. Here’s the waveform of part of the recording, showing […]

Wave propagation on the surface of water

At the Alhambra (Granada, Spain) I saw this nice example of waves from a point source propagating in all directions at a fixed speed.

Autocorrelation for estimating F0

Most methods for estimating F0 start from autocorrelation. The idea is pretty simple: we are just looking for a repeating pattern in the waveform, which corresponds to the periodic vocal fold activity. For some waveforms, it might be possible to do that directly in the time domain, but in general that doesn’t work very well. […]

The Gaussian probability density function: understanding the equation

The equation for the Gaussian probability density function looks a little scary at first, but this video should help you understand what each of the terms is doing, and how they fit together. After watching the video download the spreadsheet which shows the calculations and plots from this video (tip: the Apple Numbers.app version includes images […]

Token passing

Token passing is a really nice way to understand (and even to implement) Viterbi search for Hidden Markov Models. Here we see token passing in action, and you can look at the spreadsheet to see the calculations. To keep things simple, we are ignoring transition probabilities in this example. It would be simple to add them […]

TD-PSOLA …the hard way

Time-Domain Pitch Synchronous Overlap and Add (TD-PSOLA) can modify the fundamental frequency and duration of speech signals, without affecting the segment identity – that is, without changing the formants. Normally, it’s an automatic algorithm, but here we do it the hard way – by hand! If you want to follow-along, you will need Audacity and these materials (a […]