2010 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2010.5495018
In this paper, we report how adding acoustic low-level descriptors (LLD) to two baseline feature sets, Mel frequency cepstral coefficients (MFCC) and the Teager energy operator critical-band based autocorrelation envelope (TEO-CB-Auto-Env), affects classification accuracy in the analysis of speech from a clinical dataset. The added LLD comprise prosodic features (pitch, formants, energy, jitter and shimmer) and spectral features (spectral flux, centroid, entropy and roll-off), together with their delta (Δ) and delta-delta (Δ-Δ) coefficients. The LLD that increased accuracy when added to these baseline features were finally modeled together using Gaussian mixture models (GMM) and tested. A clinical dataset of speech from 139 adolescents, 68 of whom (49 girls and 19 boys) were diagnosed as clinically depressed, was used in the classification experiments. For male subjects, the combination (TEO-CB-Auto-Env + Δ + Δ-Δ) + F0 + (LogE + Δ + Δ-Δ) + (Shimmer + Δ) + Spectral Flux + Spectral Roll-off gave the highest classification rate, 77.82%, while for female subjects, TEO-CB-Auto-Env gave an accuracy of 74.74%.
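To make the feature types and the modeling step named above more concrete, the following is a minimal Python sketch. It is not the authors' implementation, and every function name here is hypothetical. It uses common textbook definitions that are assumptions on our part: regression-based Δ coefficients over ±2 frames (applied twice for Δ-Δ), the discrete Teager energy operator, spectral centroid, roll-off (assumed 85% power threshold), entropy and flux (assumed squared L2 difference) computed on one frame's magnitude spectrum, and a two-class decision that compares summed frame log-likelihoods under diagonal-covariance GMMs. GMM training by EM is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating the feature types named in the abstract;
# exact definitions (window sizes, thresholds, normalisations) are assumptions.


def deltas(frames: Sequence[Sequence[float]], N: int = 2) -> List[List[float]]:
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2*sum_n n^2).
    Edge frames are repeated. Apply twice to get delta-delta coefficients."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for k in range(len(frames[t])):
            acc = sum(n * (frames[min(t + n, T - 1)][k] - frames[max(t - n, 0)][k])
                      for n in range(1, N + 1))
            row.append(acc / denom)
        out.append(row)
    return out


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def spectral_centroid(mag: Sequence[float], freqs: Sequence[float]) -> float:
    """Magnitude-weighted mean frequency of one frame."""
    total = sum(mag)
    return sum(f * m for f, m in zip(freqs, mag)) / total if total > 0 else 0.0


def spectral_rolloff(mag: Sequence[float], freqs: Sequence[float],
                     fraction: float = 0.85) -> float:
    """Lowest frequency below which `fraction` of the frame's power lies
    (0.85 is an assumed default; other values are also used)."""
    power = [m * m for m in mag]
    threshold, cum = fraction * sum(power), 0.0
    for f, p in zip(freqs, power):
        cum += p
        if cum >= threshold:
            return f
    return freqs[-1]


def spectral_entropy(mag: Sequence[float]) -> float:
    """Shannon entropy (bits) of the frame's normalised power spectrum."""
    power = [m * m for m in mag]
    total = sum(power)
    if total == 0:
        return 0.0
    return -sum((p / total) * math.log2(p / total) for p in power if p > 0)


def spectral_flux(prev_mag: Sequence[float], mag: Sequence[float]) -> float:
    """Squared L2 difference between consecutive magnitude spectra (assumed variant)."""
    return sum((a - b) ** 2 for a, b in zip(mag, prev_mag))


# A diagonal-covariance GMM as (weights, means, variances); EM training not shown.
GMM = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def gmm_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """log p(x) under a diagonal GMM, using log-sum-exp for numerical stability."""
    weights, means, variances = gmm
    logs = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for xi, m, v in zip(x, mu, var):
            lp -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(lp)
    top = max(logs)
    return top + math.log(sum(math.exp(s - top) for s in logs))


def classify(frames: Sequence[Sequence[float]], gmm_depressed: GMM,
             gmm_control: GMM) -> str:
    """Label an utterance by the class model with the larger summed frame log-likelihood."""
    ll_dep = sum(gmm_log_likelihood(f, gmm_depressed) for f in frames)
    ll_ctl = sum(gmm_log_likelihood(f, gmm_control) for f in frames)
    return "depressed" if ll_dep > ll_ctl else "control"
```

In a pipeline of the kind the abstract describes, one would compute per-frame baseline features, append the candidate LLD and their Δ/Δ-Δ columns with `deltas`, fit one GMM per class (per gender, as the reported results are split that way), and score held-out utterances with `classify`. Summing frame log-likelihoods treats frames as independent, which is the usual simplification in GMM-based speech classification.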