Scale-Covariant Features for Automatic Speech Recognition

From CNBH Acoustic Scale Wiki

MFCC values for vowels produced with different vocal tract lengths, plotted in the three-dimensional space defined by three of the MFCCs. The clusters for the different vowels are not well separated.
Auditory features for ASR, plotted in the three-dimensional space defined by the weights of the three Gaussians. The clusters for the different vowels are well separated.
Vowel spectra from a man and a child, and their MFCC reconstructions. The reconstructions are good, but the individual MFCC values are not scale-shift covariant.
Gaussians fitted to the spectral distribution from an auditory filterbank in response to the vowel /i/ from different-sized speakers. The positions of the Gaussians shift with speaker size, but their amplitudes remain the same.
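
The contrast drawn in the captions above can be illustrated with a toy calculation. The sketch below is not the method used to produce the figures; it is a minimal illustration under stated assumptions. It assumes that a change in vocal tract length appears as a pure shift of the spectrum along a log-frequency axis, that the spectrum is exactly a sum of three Gaussians with fixed spacing and width, and that a plain cosine transform of the spectrum can stand in for the MFCC computation (a real MFCC front end also applies a mel filterbank and a logarithm). All names, centres, widths and amplitudes are hypothetical.

```python
import numpy as np

def gaussian_basis(x, centres, width):
    # One column per Gaussian, evaluated on the (assumed) log-frequency axis x.
    return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)

def toy_spectrum(x, shift, amplitudes, centres, width):
    # Sum of three Gaussians, all displaced by the same shift.
    return gaussian_basis(x, centres + shift, width) @ amplitudes

def fit_shift_and_weights(x, spectrum, centres, width, candidate_shifts):
    # Grid search over a common shift; for each candidate shift the
    # Gaussian weights are found by linear least squares.
    best = None
    for s in candidate_shifts:
        basis = gaussian_basis(x, centres + s, width)
        w, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
        err = np.sum((basis @ w - spectrum) ** 2)
        if best is None or err < best[0]:
            best = (err, s, w)
    return best[1], best[2]

def dct2(v, n_coeffs):
    # Unnormalised DCT-II, used here as a simplified stand-in for cepstral coefficients.
    n = len(v)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return (np.cos(np.pi * k * (m + 0.5) / n) * v[None, :]).sum(axis=1)

# Hypothetical toy parameters (arbitrary units on the log-frequency axis).
x = np.linspace(0.0, 10.0, 400)
centres = np.array([3.0, 5.0, 6.5])
width = 0.4
amplitudes = np.array([1.0, 0.6, 0.3])
candidate_shifts = np.linspace(-1.0, 1.0, 41)

# Same vowel from two speaker sizes: identical shape, different position.
adult = toy_spectrum(x, 0.0, amplitudes, centres, width)
child = toy_spectrum(x, 0.5, amplitudes, centres, width)

shift_adult, w_adult = fit_shift_and_weights(x, adult, centres, width, candidate_shifts)
shift_child, w_child = fit_shift_and_weights(x, child, centres, width, candidate_shifts)

c_adult = dct2(adult, 4)
c_child = dct2(child, 4)

print("fitted shifts:", shift_adult, shift_child)
print("Gaussian weights:", w_adult, w_child)
print("cosine-transform coefficients:", c_adult, c_child)
```

In this toy setting the fitted shift absorbs the difference between the two speakers and the Gaussian weights come out the same for both, whereas the cosine-transform coefficients differ between the two spectra. Real speech departs from these assumptions, so the sketch should be read only as an intuition for why shift-tolerant weights can cluster by vowel where fixed-basis coefficients do not.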