Forums › Automatic speech recognition › Dynamic Time Warping (DTW) › Complexity of using Euclidean distance
This topic contains 5 replies, has 3 voices, and was last updated by Simon 1 month ago.


December 16, 2017 at 18:32 #8708
True or false: “A probability distribution is superior to using Euclidean distance because it is less computationally expensive.”
From Machine Learning, I’m pretty sure this is true, because we must compute the Euclidean distance to all the observations (in question), which has a much higher computational complexity during testing. But in one of the quiz questions about ONE advantage of a probability function over Euclidean distance, this statement is one of the options, and the correct answer has to be “the probability function accounts for variance (Euclidean does not)”.

December 18, 2017 at 01:18 #8733
Well, when using a probability distribution, we also need to compute the probability (or likelihood) of our observation having been generated by many possible density functions (e.g., one per class, or one per HMM state).
The overall complexity of computing a (log) probability and computing a Euclidean distance is essentially the same. We can see this in the formula for a Gaussian, where the key operation is the same (squared distance from the mean).
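To make that concrete, here is a small sketch (my own illustration, not from the course materials) showing that evaluating a diagonal-covariance Gaussian log density involves the same squared-distance loop as computing a squared Euclidean distance; with unit variance the two differ only by a constant and a factor of −1/2:

```python
import math

def euclidean_sq(x, mu):
    # Squared Euclidean distance: sum over dimensions of (x_d - mu_d)^2
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def gaussian_log_density(x, mu, var):
    # Log density of a diagonal-covariance Gaussian.
    # The dominant cost is the same squared-distance loop as above.
    log_norm = -0.5 * sum(math.log(2 * math.pi * v) for v in var)
    quad = -0.5 * sum((a - b) ** 2 / v for a, b, v in zip(x, mu, var))
    return log_norm + quad

x   = [1.0, 2.0, 0.5]
mu  = [0.8, 2.2, 0.4]
var = [1.0, 1.0, 1.0]  # unit variance

dist    = euclidean_sq(x, mu)
log_den = gaussian_log_density(x, mu, var)

# With unit variance, log density = constant - 0.5 * squared distance
constant = -0.5 * len(x) * math.log(2 * math.pi)
print(dist, log_den, constant - 0.5 * dist)
```

So per candidate model, the two computations cost essentially the same.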

December 18, 2017 at 10:07 #8775
Agreed. But reading the question
A probability distribution is generally superior to using a Euclidean distance measure. This is because the probability distribution
i. is less computationally expensive
…
iii. accounts for variance
I think it is fair to assume that the two are used ON THE SAME DATA SET. Since iii is true (I know that from another question), it is also fair to assume that the data set contains multiple training examples per frame/word (and for the purpose of running-time analysis, I would also assume that the data set is fairly LARGE). If there were a single training example per word/frame, yes, I’d agree that they have about the same cost, but then there would be no variance and generally not much difference between the two methods. In the case of multiple training examples, using a Gaussian PDF is faster, because it only has to compute 1 value per candidate word/frame, whereas Euclidean has to compute the distance to all of the training examples in the database (cf. the k-nearest-neighbours method, which is known to have high complexity at test time).
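Here is a toy sketch of that test-time cost argument (my own illustration, with made-up data): a nearest-exemplar classifier measures the distance to every stored training example, so its cost grows with the training-set size, while a Gaussian-per-class model evaluates one density per class, regardless of how many examples were used to estimate it:

```python
def euclidean_sq(x, y):
    # Squared Euclidean distance between two vectors
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest_exemplar(x, examples):
    # examples: list of (label, vector); test cost grows with len(examples)
    return min(examples, key=lambda e: euclidean_sq(x, e[1]))[0]

def gaussian_classifier(x, class_means):
    # class_means: dict label -> mean vector (unit variance assumed, so
    # highest likelihood = smallest squared distance to the mean);
    # test cost grows only with the number of classes
    return min(class_means, key=lambda c: euclidean_sq(x, class_means[c]))

examples = [("a", [0.0, 0.0]), ("a", [0.1, 0.2]),
            ("b", [1.0, 1.0]), ("b", [0.9, 1.1])]
means = {"a": [0.05, 0.1], "b": [0.95, 1.05]}

print(nearest_exemplar([0.2, 0.1], examples))   # compares against 4 stored examples
print(gaussian_classifier([0.2, 0.1], means))   # compares against 2 means only
```

With thousands of training examples the first loop dominates, which is exactly the k-NN cost problem mentioned above.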

December 18, 2017 at 19:39 #8798
I think the Euclidean distance measure used here computes the distance from the mean of the examples, rather than the distance from every single example? I’m getting this from my notes on video 2 in the probability foundations module, ‘From distance measures to probability distributions’.

December 18, 2017 at 20:44 #8799
Thanks Maria! Hmm, that would mean it is like a Gaussian without its variance? I don’t see why that would be useful – that would just be a bad model, but a model nonetheless (and I think the whole point of using Euclidean distance is to go model-free, without making parametric assumptions). I know there is a classification method that uses reference examples instead of a parametric model; it is based on computing the Euclidean distance to every single example, and it is very inefficient. I know this is not the case in DTW, as it uses only 1 reference example.
So I think the question is just ill-phrased; it should say “why is using an HMM with Gaussian PDFs a better approach than DTW with Euclidean distance” – that would cover all the other things that are implied by the two approaches, such as using different amounts of training data. In that case, I would agree that they are equivalent in terms of computational cost (during testing – for the HMM, training would be more computationally expensive).

December 19, 2017 at 01:55 #8831
The Euclidean distance metric is effectively the same as a Gaussian with a constant variance: the negative log likelihood of a fixed-variance Gaussian is just a constant plus a scaled squared Euclidean distance from the mean, so both rank candidates in exactly the same order.
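A quick sketch of that equivalence (my own illustration, with made-up candidate means): ranking candidates by squared Euclidean distance to their means gives the same winner as ranking them by the likelihood of a Gaussian with those means and a shared, constant variance, because the log likelihood is a monotone decreasing function of the squared distance:

```python
import math

def euclidean_sq(x, mu):
    # Squared Euclidean distance from x to the mean mu
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def gaussian_log_lik(x, mu, var=1.0):
    # Log likelihood of a spherical Gaussian with shared variance `var`:
    # a constant minus the squared distance scaled by 1/(2*var)
    d = len(x)
    return -0.5 * d * math.log(2 * math.pi * var) - euclidean_sq(x, mu) / (2 * var)

x = [0.3, 0.7]
candidates = {"yes": [0.2, 0.6], "no": [1.5, 1.5]}

by_distance   = min(candidates, key=lambda c: euclidean_sq(x, candidates[c]))
by_likelihood = max(candidates, key=lambda c: gaussian_log_lik(x, candidates[c]))
print(by_distance, by_likelihood)  # same winner either way
```

Once the variance is allowed to differ per dimension or per class, the two stop agreeing, which is exactly the “accounts for variance” advantage from the quiz.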
