Speech Recognition: FFT Coefficients
This topic contains 2 replies, has 2 voices, and was last updated by Simon 3 weeks ago.


November 1, 2017 at 19:54 #8193
In the speech recognition slide pack, I don’t understand slides 10 and 11.
On slide 10, what do the coefficients that multiply the waves mean?
Slide 11 says: “We can vary the number of coefficients by varying the duration of the waveform being analysed”. This doesn’t make sense to me: shouldn’t there be a fixed number of coefficients in the vector for any part of the waveform? Don’t you mean that you can vary the number of vectors?

November 1, 2017 at 20:42 #8195
Slide 10: these are the Fourier coefficients. Think of them as weights that multiply each sine wave: each one tells us how much of that frequency is present in the signal (its squared magnitude is the energy at that frequency). If we plot them against frequency, we get the spectrum of the signal.
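To make “weights that multiply each sine wave” concrete, here is a minimal sketch in plain Python. It uses a naive O(N²) discrete Fourier transform rather than a real FFT (an FFT is just a fast way of computing the same coefficients). It builds a toy signal from two sine waves with known weights and then recovers those weights from the magnitudes of the Fourier coefficients. The 64-sample length and the two frequencies are arbitrary illustrative choices, not values from the slides.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def magnitude_spectrum(x):
    """Magnitude of each Fourier coefficient for bins 0..N//2 (real input),
    scaled so that a sine wave of amplitude A shows up as A in its bin."""
    N = len(x)
    X = dft(x)
    spectrum = []
    for k in range(N // 2 + 1):
        # DC (k = 0) and, for even N, Nyquist (k = N/2) appear once; every other
        # bin has a mirror image at N - k, hence the factor of 2.
        scale = 1 / N if k == 0 or 2 * k == N else 2 / N
        spectrum.append(abs(X[k]) * scale)
    return spectrum

# Toy signal: 64 samples = weight 1.0 on a sine with 3 cycles per frame
# plus weight 0.5 on a sine with 10 cycles per frame.
N = 64
signal = [1.0 * math.sin(2 * math.pi * 3 * n / N) + 0.5 * math.sin(2 * math.pi * 10 * n / N)
          for n in range(N)]
spectrum = magnitude_spectrum(signal)

print([k for k, m in enumerate(spectrum) if m > 0.01])  # → [3, 10]
print(round(spectrum[3], 6), round(spectrum[10], 6))     # → 1.0 0.5
```

Plotting `spectrum` against the bin index gives exactly the kind of spectrum picture described above: two peaks, with heights equal to the weights on the two sine waves.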
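Slide 11: the number of Fourier coefficients depends on the duration of the signal being analysed (for the discrete Fourier transform there is one coefficient per sample, so a longer stretch of waveform gives more coefficients). But remember that we don’t analyse the whole signal at once: we divide it into short frames and analyse each frame in turn. The frames all have the same, fixed duration (e.g., 25 ms), so every frame yields the same number of coefficients.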
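The link between duration and number of coefficients can be checked directly. This sketch, again in plain Python with a naive DFT, analyses segments of different durations. The 16 kHz sampling rate and the 440 Hz test tone are illustrative assumptions, not values from the slides. It also reports the spacing between frequency bins, which shows what a longer analysis window buys: more coefficients means finer frequency resolution.

```python
import cmath
import math

def dft(x):
    """Naive DFT: returns one complex coefficient per input sample."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def analyse(duration_ms, sample_rate=16000):
    """Analyse a segment of the given duration and report its DFT size.

    sample_rate=16000 is an assumed, typical sampling rate for ASR.
    """
    num_samples = sample_rate * duration_ms // 1000
    # Any waveform will do: the number of coefficients depends only on its length.
    segment = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(num_samples)]
    coeffs = dft(segment)
    unique = len(coeffs) // 2 + 1              # real input: bins 0..N/2 carry all the information
    resolution_hz = sample_rate / len(coeffs)  # spacing between adjacent frequency bins
    return len(coeffs), unique, resolution_hz

for ms in (10, 25, 50):
    n, unique, res = analyse(ms)
    print(f"{ms} ms -> {n} coefficients ({unique} unique), bins every {res:g} Hz")
# → 10 ms -> 160 coefficients (81 unique), bins every 100 Hz
# → 25 ms -> 400 coefficients (201 unique), bins every 40 Hz
# → 50 ms -> 800 coefficients (401 unique), bins every 20 Hz
```

Because every frame in a speech recogniser has the same duration, each one produces the same number of coefficients. Changing the frame duration is what changes that number.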
The number of frames (i.e., the number of vectors) is approximately equal to the total duration of the speech signal (e.g., an utterance) divided by the frame shift (e.g., 10 ms).
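A minimal framing sketch follows, assuming a 16 kHz sampling rate and the 25 ms frame / 10 ms shift from the reply. The helper name `frame_signal` is hypothetical. Any incomplete frame at the end is simply dropped (toolkits often pad instead), so the exact count, floor((T - L) / S) + 1 for T samples, frame length L and shift S, comes out slightly below duration ÷ shift.

```python
SAMPLE_RATE = 16000  # assumption: 16 kHz audio
FRAME_MS = 25        # frame duration, the reply's example
SHIFT_MS = 10        # frame shift, the reply's example

def frame_signal(samples, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS, shift_ms=SHIFT_MS):
    """Split a signal into overlapping fixed-length frames (hypothetical helper).

    Frames that would run past the end of the signal are dropped, not padded.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    shift = int(sample_rate * shift_ms / 1000)      # 160 samples at 16 kHz
    if len(samples) < frame_len:
        return []
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, shift)]

for seconds in (1.0, 2.5):
    utterance = [0.0] * int(seconds * SAMPLE_RATE)  # silent placeholder audio
    frames = frame_signal(utterance)
    print(f"{seconds} s -> {len(frames)} frames of {len(frames[0])} samples "
          f"(duration / shift = {seconds * 1000 / SHIFT_MS:.0f})")
# → 1.0 s -> 98 frames of 400 samples (duration / shift = 100)
# → 2.5 s -> 248 frames of 400 samples (duration / shift = 250)
```

Every frame has the same 400 samples, so every vector has the same number of coefficients. Only the number of vectors grows with the length of the utterance.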

November 1, 2017 at 20:44 #8196
In these slides, I am temporarily assuming that the Fourier coefficients (i.e., the magnitude spectrum) would be a good representation for Automatic Speech Recognition. Whilst we could use them, we can do better by performing some feature engineering, which is covered a little later on.
