This paper proposes a new approach for emotion recognition based on a hybrid of hidden Markov models (HMMs) and artificial neural network (ANN), using both utterance and segment level information from speech. To combine the advantage on capability to dynamic time warping of HMMs and pattern recognition of ANN, the utterance is viewed as a series of voiced segments, and feature vectors extracted from the segments are normalized into fixed coefficients using orthogonal polynomials methods, and then, distortions are calculated as an input of ANN. Meanwhile, the utterance as a whole is modeled by HMMs, and likelihood probabilities derived from the HMMs are normalized to be another input of ANN. Adopting Beihang University Database of Emotional Speech (BHUDES) and Berlin database of emotional speech, comparison between isolated HMMs and hybrid of HMMs/ANN proves that the approach introduced in this paper is more effective, and the average recognition rate of five emotion states has reached 81.7%.
Download Full PDF Version (Non-Commercial Use)