Dynamic-Time-Warping

Dynamic time warp (ing ) denotes an algorithm, the sequence of values ​​of different lengths successive maps.

Application

A prominent example is the speech recognition: Here are to be detected from a spoken text by comparing it with stored voice patterns individual words. One problem is that the words are pronounced differently often. Above all vowels are often longer or shorter spoken, as is the case in the stored voice pattern: the word "lift" for example, even with a short "e" and another time as " heeeben " are pronounced. For a successful pattern match the word should therefore be stretched or shrunk to fit, but not uniformly, but especially on the vowels that were spoken longer or shorter. (This applies to a lesser extent also true for consonants, also, certain sounds are completely swallowed. ) The Dynamic time- warping algorithm provides this adaptive time normalization.

Further areas of applicability of the dynamic time warping are eg gesture recognition in image processing or measurements correlated noise in physics.

Algorithm

The algorithm is, for example, in speech recognition applications. A spoken speech signal to be matched with a lot of existing templates (English templates) in order to recognize the spoken word and can for example run on a mobile phone a corresponding action. Voice signals are usually stored as spectral and cepstral tuples, etc are supplemented with additional information such as vocal intensity as feature vectors.

With the help of a weighting function for the individual parameters of each Wertetupels a difference measure can be placed between any two values ​​of the two signals ( cost function ), for example, a normalized Euclidean distance or Mahalanobis distance.

The algorithm now searches for the " cheapest " way from the beginning to the end of both signals over the stretched matrix of pairwise present cost of all points of the two signals. This is done efficiently using dynamic programming. The actual path, so that warping is obtained by the so-called backtracing after the first pass of the algorithm. For the mere cost determination (ie, the template selection) but extends the simple pass without backtracing. The backtracing thus allows this an accurate picture of each point of the shorter signal at one or more points of the longer signal, and thus represents the approximate time distortion

Due to algorithmic problems in the extraction of speech signal parameters ( Oktavfehler, voice activation errors, etc. ) of the optimum path through the signal difference matrix does not necessarily correspond to the actual time distortion.

250370
de