Applying Geometric Source Separation for Improved Pitch Extraction in Human-Robot Interaction

Martin Heckmann, Claudius Gläser, Frank Joublin, Kazuhiro Nakadai, "Applying Geometric Source Separation for Improved Pitch Extraction in Human-Robot Interaction", Proc. INTERSPEECH, 2010.

Abstract

We present a system for robust pitch extraction in noisy and echoic environments. It consists of a multi-channel signal enhancement stage, a pitch extraction algorithm inspired by processing in the mammalian auditory system, and pitch tracking based on a Bayesian filter. For the multi-channel signal enhancement, we deploy an 8-channel Geometric Source Separation (GSS). During pitch extraction, we first apply a Gammatone filter bank and then calculate a histogram of zero-crossing distances from the band-pass signals. While the histogram is calculated, spurious side peaks at harmonics and sub-harmonics of the true fundamental frequency are inhibited. The grid-based Bayesian tracker operating on the resulting histogram combines Bayesian filtering in a forward step with Bayesian smoothing in a backward step over a 100 ms time window. We evaluate the system in a realistic human-robot interaction scenario with several male and female speakers. The evaluation measures how much the pitch tracking results obtained from the signals recorded on the robot degrade relative to those obtained from a simultaneously recorded clean headset signal. We also compare the system with two well-established pitch extraction frameworks, i.e., get_f0, included in the WaveSurfer Toolkit, and Praat. Overall, the results demonstrate that pitch tracking with small errors is possible in all cases tested and that the proposed system performs better than the two benchmark algorithms.
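The pitch extraction step (a Gammatone filter bank followed by a histogram of zero-crossing distances) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The filter parameters (4th-order Gammatone filters with Glasberg and Moore ERB bandwidths, 40 ms impulse responses), the 80 to 400 Hz search range, and the rule that a crossing and its j-th successor add weight 1/j are assumptions; that weighting is only a crude, hypothetical stand-in for the paper's inhibition of harmonic and sub-harmonic side peaks. All function names are hypothetical.

```python
import math


def gammatone_ir(fc, fs, duration=0.04, order=4):
    """Peak-normalized, sampled 4th-order Gammatone impulse response.

    Bandwidth b = 1.019 * ERB(fc), using the Glasberg & Moore approximation
    ERB(fc) ~= 24.7 + 0.108 * fc (Hz).
    """
    b = 1.019 * (24.7 + 0.108 * fc)
    ir = []
    for n in range(int(duration * fs)):
        t = n / fs
        ir.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                  * math.cos(2 * math.pi * fc * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def fir_filter(x, h):
    """Direct-form FIR convolution, output truncated to len(x)."""
    return [sum(h[k] * x[i - k] for k in range(min(len(h), i + 1)))
            for i in range(len(x))]


def upward_zero_crossings(y, start):
    """Indices i >= start where y goes from negative to non-negative."""
    return [i for i in range(max(start, 1), len(y)) if y[i - 1] < 0 <= y[i]]


def zero_crossing_histogram(x, fs, center_freqs, f0_min=80.0, f0_max=400.0):
    """Pool distances (in samples) between upward zero crossings over all bands.

    A crossing and its j-th successor add weight 1/j at their distance; this
    weighting is a hypothetical stand-in for the paper's inhibition of
    sub-harmonic side peaks, not the published rule.
    """
    min_lag = int(fs / f0_max)
    max_lag = int(math.ceil(fs / f0_min))
    hist = [0.0] * (max_lag + 1)
    for fc in center_freqs:
        h = gammatone_ir(fc, fs)
        y = fir_filter(x, h)
        # Skip the onset transient: before len(h) samples the filter is not in steady state.
        zc = upward_zero_crossings(y, start=len(h))
        for i in range(len(zc)):
            j = 1
            while i + j < len(zc) and zc[i + j] - zc[i] <= max_lag:
                lag = zc[i + j] - zc[i]
                if lag >= min_lag:
                    hist[lag] += 1.0 / j
                j += 1
    return hist


def estimate_f0(hist, fs):
    """f0 (Hz) at the highest histogram bin, or 0.0 if nothing was counted."""
    if max(hist) <= 0.0:
        return 0.0
    best_lag = max(range(len(hist)), key=lambda k: hist[k])
    return fs / best_lag


# Toy input: 150 ms of a 200 Hz tone with its 2nd and 3rd harmonics at 8 kHz.
fs = 8000
x = [sum(math.sin(2 * math.pi * k * 200.0 * n / fs) for k in (1, 2, 3))
     for n in range(int(0.15 * fs))]
hist = zero_crossing_histogram(x, fs, center_freqs=[200.0, 400.0, 600.0, 800.0])
print(estimate_f0(hist, fs))  # 200.0
```

Pooling distances across bands works because a band dominated by the k-th harmonic crosses zero about k times per fundamental period, so the distance to its k-th successor equals the period in every band. Without some form of inhibition, multiples of the period (sub-harmonics) collect similar support, which is the problem the paper's inhibition step addresses.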

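A grid-based tracker that combines forward Bayesian filtering with backward smoothing corresponds to the forward-backward recursion over a discrete set of f0 candidates. The minimal sketch below assumes Gaussian transitions between neighboring f0 bins (sigma = 1.5 bins), treats each frame's histogram as an unnormalized likelihood, and processes a block of 10 frames (for example a 10 ms hop, giving a 100 ms window). These values and the function names are illustrative assumptions, not the paper's settings.

```python
import math


def transition_matrix(n_states, sigma=1.5):
    """Row-stochastic matrix: f0 bin i moves to bin j with Gaussian probability."""
    rows = []
    for i in range(n_states):
        row = [math.exp(-0.5 * ((j - i) / sigma) ** 2) for j in range(n_states)]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows


def normalize(v):
    """Scale a non-negative vector to sum to 1 (assumes a positive sum)."""
    total = sum(v)
    return [a / total for a in v]


def forward_backward(obs, trans):
    """Grid-based Bayesian filtering (forward) and smoothing (backward).

    obs[t][s] is a non-negative likelihood of f0 bin s in frame t.
    Returns the smoothed posterior over bins for every frame in the block.
    """
    n_frames, n_states = len(obs), len(obs[0])
    # Forward: predict with the transition model, then weight by the likelihood.
    alpha = [normalize(obs[0])]  # uniform prior over bins at the first frame
    for t in range(1, n_frames):
        pred = [sum(alpha[-1][i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)]
        alpha.append(normalize([p * o for p, o in zip(pred, obs[t])]))
    # Backward: propagate the evidence of later frames back in time.
    beta = [[1.0] * n_states for _ in range(n_frames)]
    for t in range(n_frames - 2, -1, -1):
        weighted = [obs[t + 1][j] * beta[t + 1][j] for j in range(n_states)]
        beta[t] = normalize([sum(trans[i][j] * weighted[j] for j in range(n_states))
                             for i in range(n_states)])
    # Smoothed posterior is proportional to the product of both passes.
    return [normalize([a * b for a, b in zip(alpha[t], beta[t])])
            for t in range(n_frames)]


# Toy block: 10 frames, 50 f0 bins, true track at bin 20, spurious peak at bin 40 in frame 5.
n_states, n_frames = 50, 10
obs = []
for t in range(n_frames):
    mu = 40 if t == 5 else 20
    obs.append([0.05 + math.exp(-0.5 * (s - mu) ** 2) for s in range(n_states)])
post = forward_backward(obs, transition_matrix(n_states))
raw = [max(range(n_states), key=lambda s: o[s]) for o in obs]
smoothed = [max(range(n_states), key=lambda s: p[s]) for p in post]
print(raw)       # [20, 20, 20, 20, 20, 40, 20, 20, 20, 20]
print(smoothed)  # [20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
```

In the toy input, frame 5 carries a spurious peak at bin 40, as an octave error might. The per-frame maximum follows it, while the smoothed posterior rejects it because both the preceding and the following frames, linked by the transition model, support bin 20.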

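The evaluation compares pitch tracks obtained from the robot's microphones with those obtained from the headset. The abstract does not specify the error measure, so the sketch below assumes one common choice for illustration: the gross pitch error rate (the fraction of frames voiced in both tracks whose f0 deviates by more than 20%) together with the mean relative deviation of the remaining frames. The function name and the 20% tolerance are assumptions.

```python
def pitch_error_rates(ref_f0, test_f0, tolerance=0.2):
    """Compare a test f0 track against a reference track frame by frame.

    Tracks are lists of f0 values in Hz; 0 marks an unvoiced frame.
    Returns (gross_error_rate, mean_fine_error), computed over frames that
    are voiced in both tracks. A frame is a gross error when its relative
    deviation exceeds `tolerance`; the fine error averages the relative
    deviation over the remaining frames.
    """
    pairs = [(r, t) for r, t in zip(ref_f0, test_f0) if r > 0 and t > 0]
    if not pairs:
        return 0.0, 0.0
    deviations = [abs(t - r) / r for r, t in pairs]
    fine = [d for d in deviations if d <= tolerance]
    gross_rate = (len(deviations) - len(fine)) / len(deviations)
    fine_mean = sum(fine) / len(fine) if fine else 0.0
    return gross_rate, fine_mean


headset = [200.0, 200.0, 0.0, 150.0, 150.0]  # reference track (Hz, 0 = unvoiced)
robot = [205.0, 100.0, 180.0, 150.0, 0.0]    # hypothetical track from the robot's microphones
print(pitch_error_rates(headset, robot))
```

Here three frames are voiced in both tracks; the octave error at 100 Hz counts as a gross error (rate 1/3), and the other two frames deviate by 2.5% and 0%, giving a mean fine error of 1.25%.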
