Published in

2009 3rd International Conference on Signals, Circuits and Systems (SCS)

DOI: 10.1109/icscs.2009.5412691

Emotional speech characterization based on multi-features fusion for face-to-face interaction

Proceedings article published in 2009 by Ammar Mahdhaoui, Fabien Ringeval, Mohamed Chetouani
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Speech contains non-verbal elements known as paralanguage, including voice quality, emotion and speaking style, as well as prosodic features such as rhythm, intonation and stress. The study of nonverbal communication has focused on face-to-face interaction, since the behaviors of the communicators play a major role during social interaction and carry information between the speakers. In this paper, we describe a computational framework for combining different features for emotional speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the duration of the individual speech segments. This strategy is applied to a real-life application: the detection of motherese in authentic, longitudinal parent-infant interaction at home. The results suggest that combining short- and long-term information provides a robust and efficient time-scale analysis. A similar fusion methodology is also investigated using a phoneme-specific characterization process. This strategy is motivated by the fact that speech varies across emotional states at the phoneme level. A time-scale based on both vowels and consonants is proposed, and it provides a relevant, discriminant feature space for acted emotion recognition.
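
The fusion step described above can be illustrated with a minimal Python sketch. It assumes a linear duration-weighted average of the per-segment posteriors; the abstract does not specify the exact combination rule, and the function name, data layout and example values below are illustrative assumptions, not the authors' formulation.

import numpy as np

def fuse_segment_posteriors(posteriors, durations):
    """Combine per-segment class posteriors into one utterance-level decision.

    posteriors -- array of shape (n_segments, n_classes) holding the local
                  a posteriori class probabilities estimated for each segment
    durations  -- array of shape (n_segments,) with segment durations (seconds)
    Returns (winning class index, fused posterior vector).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    durations = np.asarray(durations, dtype=float)
    weights = durations / durations.sum()                # weight proportional to segment duration
    fused = (weights[:, None] * posteriors).sum(axis=0)  # duration-weighted average of posteriors
    return int(np.argmax(fused)), fused

# Hypothetical example: two segments, two classes (e.g. motherese vs. other speech)
p = [[0.6, 0.4],   # short segment, weak evidence for class 0
     [0.2, 0.8]]   # long segment, strong evidence for class 1
d = [0.5, 2.0]
label, fused = fuse_segment_posteriors(p, d)
print(label, fused)  # class 1 wins because the longer segment carries more weight

In this sketch the longer segment dominates the decision, which matches the abstract's idea of weighting factors directly related to the duration of the individual speech segments.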