Speaker-dependent system

The main task here is gathering the data. After that, just run the provided scripts.

Collect and label the data (optional)
Supervised machine learning needs labelled data. The task of collecting and labelling this data is often overlooked in textbooks. Performing this step yourself is OPTIONAL, but you still need to understand the process.
Parameterise the data (optional)
Our HMMs do not work directly with waveforms, but rather with features extracted from those waveforms. Performing this step yourself is OPTIONAL, but you still need to understand the process.
Train the acoustic models
We will used supervised machine learning (including the Baum-Welch algorithm) to train models on labelled data.
Language model
Even for isolated digits, we need a language model to compute the term P(W).
Evaluation
By comparing the recogniser's output with the hand-labelled test data, we can compute the Word Error Rate (WER).