PhD thesis of Mike Matton
Distance measures for template based speech and pattern recognition
Mike Matton
- Advisors: Ronald Cools and Dirk Van Compernolle
- Defense: November 2009
- KULeuven digital library copy: https://lirias.kuleuven.be/handle/123456789/246699
The main goal of this research is finding appropriate distance measures for template based speech recognition and pattern classification. In template based speech recognition, new input speech is compared directly with examples available in a database. The comparison is carried out with the dynamic time warping (DTW) algorithm. This is in contrast with mainstream speech recognition, where statistical hidden Markov models are typically used to model the acoustic observations.
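To make the template comparison concrete, the following is a minimal sketch of textbook DTW with a Euclidean local distance, used for nearest-template classification. It is an illustration only, not the implementation used in the thesis: the function names (`euclidean`, `dtw_distance`, `classify`), the step pattern (insertion, deletion, match with equal cost) and the toy one-dimensional frames are all assumptions.

```python
import math


def euclidean(x, y):
    """Plain Euclidean distance between two feature vectors (frames)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_distance(query, template, local_distance=euclidean):
    """Accumulated cost of the best monotonic alignment of two frame sequences.

    Assumed step pattern: each cell may be reached from the left, from below,
    or diagonally, all with the same weight.
    """
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template with the smallest DTW distance.

    `templates` is a list of (label, frames) pairs, a hypothetical stand-in
    for the example database.
    """
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]


# Toy database with one-dimensional frames (hypothetical data).
database = [
    ("a", [[0.0], [1.0], [2.0]]),
    ("b", [[5.0], [5.0], [5.0]]),
]
label = classify([[0.0], [1.0], [1.0], [2.0]], database)
```

The `local_distance` argument is the place where a different frame-level distance measure can be substituted for the Euclidean one.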
In particular, we investigate the introduction of a class dependent scaled distance measure. We study the number of parameters and investigate methods to partition the data into classes. Next, we investigate and compare different methods to train the weights of the distance measure, based on the maximum likelihood criterion, but mainly based on discriminative training schemes.
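As an illustration of what a class dependent scaled distance could look like, the sketch below weights each feature dimension with a value selected by the class of the template frame. The diagonal (per-dimension) form of the weights, the class labels, and the weight values are assumptions made for this example; the abstract does not specify the exact parameterisation or how the classes are obtained.

```python
import math


def scaled_distance(x, template_frame, template_class, weights):
    """Weighted Euclidean distance with per-class, per-dimension weights.

    `weights` maps a class label to a list of non-negative scale factors,
    one per feature dimension (an assumed diagonal parameterisation).
    """
    w = weights[template_class]
    return math.sqrt(
        sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, template_frame))
    )


# Hypothetical weights for two hypothetical classes.
weights = {
    "vowel": [1.0, 4.0],      # second dimension counts more for this class
    "fricative": [1.0, 0.0],  # second dimension is ignored for this class
}

d_vowel = scaled_distance([0.0, 0.0], [3.0, 2.0], "vowel", weights)
d_fricative = scaled_distance([0.0, 0.0], [3.0, 2.0], "fricative", weights)
```

With all weights set to 1 this reduces to the ordinary Euclidean distance, so such a measure can serve as the frame-level distance inside DTW. How the weights are trained (maximum likelihood or discriminatively) is not shown here.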
Experiments with these scaled distance measures on template based speech recognition and pattern classification show a consistent relative improvement when compared with a simple Euclidean distance measure, which is traditionally used with DTW.