Mel-frequency cepstrum

Mel-frequency cepstral coefficients (MFCCs) are used in automatic speech recognition. They provide a compact representation of the frequency spectrum. The "mel" in the name refers to the mel scale, a scale of perceived pitch.
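As an illustration of what a perceptual pitch scale looks like in practice, the following sketch converts between hertz and mels. It assumes one widely used formula, m = 2595 * log10(1 + f / 700); other variants of the mel scale exist, and the function names are chosen here for illustration only.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in Hz correspond to shrinking steps in mels at higher frequencies.
for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
    print(f, round(hz_to_mel(f), 1))
```

Under this formula the scale is close to linear at low frequencies and roughly logarithmic at high frequencies, which is the sense in which it follows perceived rather than physical pitch.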

MFCCs are also used in music analysis, in particular for identifying pieces of music so that metadata can be assigned to them.

The linear model of speech production is the basis for MFCCs: a periodic excitation signal (the vocal cords) is shaped by a linear filter (mouth, tongue, nasal cavities, ...). For speech recognition it is mainly the filter, or rather its impulse response, that matters, because the analysis is concerned with what was said and not with the pitch at which it was said. Computing the MFCCs is a convenient way to separate the excitation signal from the impulse response of the filter.

Mathematically, the speech signal is the convolution of the filter's impulse response with the excitation signal. In the frequency domain this convolution becomes a multiplication, and when the cepstrum is computed, the logarithm turns that multiplication into an addition. The two components of the speech signal, the excitation (source) and the filter, are then easy to separate.
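The additivity can be checked numerically. The sketch below is a toy example under stated assumptions, not a model of real speech: the excitation is a hypothetical decaying impulse train, the filter is a hypothetical decaying exponential, the convolution is circular, and the cepstrum is the real cepstrum (inverse DFT of the log magnitude spectrum).

```python
import numpy as np

n = 256
period = 32

# Hypothetical excitation: an impulse train with a slight decay. The decay keeps
# the spectrum free of exact zeros, so the logarithm stays finite.
excitation = np.zeros(n)
excitation[::period] = 0.9 ** np.arange(n // period)

# Hypothetical filter impulse response: a decaying exponential.
impulse_response = 0.9 ** np.arange(n)

# Circular convolution in time, computed as a product of the two spectra.
speech = np.real(np.fft.ifft(np.fft.fft(excitation) * np.fft.fft(impulse_response)))


def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    return np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x)))))


c_speech = real_cepstrum(speech)
c_excitation = real_cepstrum(excitation)
c_filter = real_cepstrum(impulse_response)

# The logarithm turned the product of spectra into a sum of cepstra.
print(np.allclose(c_speech, c_excitation + c_filter))

# In this toy signal the excitation only contributes at multiples of its period,
# so the low part of the speech cepstrum matches the filter's cepstrum.
print(np.allclose(c_speech[1:period], c_filter[1:period]))
```

In this constructed case, keeping only the low part of the cepstrum recovers the filter contribution and discards the excitation. With real speech the separation is approximate rather than exact.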

MFCCs are computed by the following steps:

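- Discrete Fourier transform of each short, windowed block of the signal
- Discrete cosine transform of the logarithmic mel-band energies, to decorrelate them
- Principal component analysis, as an alternative to the discrete cosine transform for the decorrelation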
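A minimal sketch of such a computation, using only NumPy, might look as follows. It assumes the conventional pipeline of windowing, discrete Fourier transform, triangular mel filter bank, logarithm and discrete cosine transform; principal component analysis is not shown. All function names and parameter values (frame length, hop size, number of filters and coefficients) are illustrative assumptions, not values taken from the text.

```python
import numpy as np


def hz_to_mel(f):
    # Assumed mel formula; other variants exist.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=26, n_coeffs=13):
    # 1. Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Discrete Fourier transform of each frame, then the power spectrum.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # 3. Pool the spectrum into mel bands.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # 4. Logarithm (small offset avoids log(0) on silent frames).
    log_energies = np.log(energies + 1e-10)
    # 5. Discrete cosine transform to decorrelate the log mel energies.
    return dct_ii(log_energies, n_coeffs)


# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
coeffs = mfcc(np.sin(2 * np.pi * 440.0 * t), sample_rate)
print(coeffs.shape)
```

The result has one row per frame and one column per coefficient. Libraries for audio analysis typically add further options such as pre-emphasis, liftering and DCT normalization, which are left out here to keep the steps visible.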
