
Combining Auditory Inspirations and Hierarchical Feature Extraction for Robust Speech Recognition

Martin Heckmann, Xavier Domont, Frank Joublin, Christian Goerick, "Combining Auditory Inspirations and Hierarchical Feature Extraction for Robust Speech Recognition", Proceedings of NAG-DAGA: International Conference on Acoustics, 2009.

Abstract

We present speech features inspired by the processing in the auditory periphery and by the receptive fields found in the auditory cortex. They are organized hierarchically and jointly evaluate variations in the spectro-temporal domain, which is why we term them Hierarchical Spectro-Temporal (HIST) features. To compute them, we apply a Gammatone filterbank to transform the signal into the spectral domain. A preprocessing step based on local competition mechanisms then enhances the formants in the spectrogram. A set of filters learned via Independent Component Analysis (ICA) captures local variations in the spectrogram and constitutes the first layer of the hierarchy. In the second layer, these local variations are combined to form larger receptive fields learned via Non-Negative Sparse Coding. The dimensionality of the resulting features is reduced via Principal Component Analysis (PCA), and the reduced features are fed into a Hidden Markov Model (HMM). We evaluated the performance of these features on a continuous digit recognition task in a variety of noise conditions, similar to the Aurora task. Our results show a significant performance improvement in noise, especially in combination with RASTA features.
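As an illustration of the first stage only (the Gammatone filterbank), the following is a minimal sketch of a single Gammatone channel in plain Python. The abstract does not give filter parameters, so everything numeric here is an assumption: the 4th-order filter, the Glasberg and Moore ERB approximation with the common 1.019 bandwidth factor, the 16 kHz sample rate, and the 25 ms impulse response length. The function names are hypothetical, and this is not the authors' implementation.

```python
import math


def erb_bandwidth(center_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def gammatone_impulse_response(center_hz, sample_rate=16000,
                               duration_s=0.025, order=4):
    """Sampled Gammatone impulse response, normalized to unit peak.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*center_hz*t)
    with b = 1.019 * ERB(center_hz). All defaults are assumed values.
    """
    b = 1.019 * erb_bandwidth(center_hz)
    n_samples = int(round(duration_s * sample_rate))
    response = []
    for k in range(n_samples):
        t = k / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        response.append(envelope * math.cos(2.0 * math.pi * center_hz * t))
    peak = max(abs(v) for v in response)
    return [v / peak for v in response]


def filter_signal(signal, impulse_response):
    """Causal FIR filtering by direct convolution (slow but dependency-free)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(n + 1, len(impulse_response))):
            acc += impulse_response[k] * signal[n - k]
        out.append(acc)
    return out


def channel_energy(signal, center_hz, sample_rate=16000):
    """Mean squared output of one Gammatone channel for the given signal."""
    ir = gammatone_impulse_response(center_hz, sample_rate)
    filtered = filter_signal(signal, ir)
    return sum(v * v for v in filtered) / len(filtered)


if __name__ == "__main__":
    # A 50 ms, 1 kHz test tone: the channel centered at 1 kHz should
    # respond much more strongly than the channel centered at 3 kHz.
    rate = 16000
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / rate) for n in range(800)]
    print("energy at 1 kHz channel:", channel_energy(tone, 1000.0, rate))
    print("energy at 3 kHz channel:", channel_energy(tone, 3000.0, rate))
```

Evaluating `channel_energy` over a set of center frequencies for successive short frames would give a spectrogram-like representation on which the later stages described in the abstract could operate. The spacing and number of channels are not stated here and would have to be chosen separately.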


