Published in: Interspeech 2011 (2011)

DOI: 10.21437/interspeech.2011-804

Drink and Speak: On the Automatic Classification of Alcohol Intoxication by Acoustic, Prosodic and Text-Based Features.

Proceedings article published in 2011 by Tobias Bocklet, Korbinian Riedhammer, Elmar Nöth
This paper is available in a repository.

Preprint: policy unknown
Postprint: policy unknown
Published version: policy unknown

Abstract

This paper focuses on the automatic detection of a person's blood alcohol level based on automatic speech processing approaches. We compare five different feature types with different ways of modeling. Experiments are based on the ALC corpus of the IS2011 Speaker State Challenge. The classification task is restricted to the detection of a blood alcohol level above 0.5 per mille. Three feature sets are based on spectral observations: MFCCs, PLPs, and TRAPS. These are modeled by GMMs. Classification is done either by a Gaussian classifier or by SVMs; in the latter case, classification is based on GMM supervectors, i.e., concatenations of GMM mean vectors. A prosodic system extracts a 292-dimensional feature vector based on a voiced-unvoiced decision. A transcription-based system makes use of text transcriptions, with features related to phoneme durations and textual structure. We compare the stand-alone performances of these systems and combine them on score level by logistic regression. The best stand-alone performance is achieved by the transcription-based system, which outperforms the baseline by 4.8% on the development set. Combination on score level gave a huge boost when the spectral-based systems were added (73.6%), a relative improvement of 12.7% over the baseline. On the test set we achieved a UA of 68.6%, a significant improvement of 4.1% over the baseline system.
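
The abstract mentions GMM-based supervectors classified with SVMs and score-level fusion by logistic regression. The sketch below is a minimal, hypothetical illustration of that kind of pipeline, not the authors' implementation: a universal background model (UBM) is MAP-adapted to each utterance, the adapted means are concatenated into a supervector, a linear SVM produces per-utterance scores, and scores from several systems are fused with logistic regression. All function names and parameters (e.g. n_components, the relevance factor) are illustrative assumptions; feature extraction (MFCC/PLP/TRAPS, prosodic, text-based) is assumed to happen elsewhere.

```python
# Hypothetical sketch of a GMM-supervector system with SVM classification and
# score-level fusion by logistic regression, as outlined in the abstract.
# Each utterance is assumed to be a (frames x dims) array of spectral
# features; labels are 1 = blood alcohol level above threshold, 0 = below.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression


def train_ubm(pooled_frames, n_components=64, seed=0):
    """Fit a diagonal-covariance GMM (the UBM) on pooled training frames."""
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    ubm.fit(pooled_frames)
    return ubm


def supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the UBM means to one utterance and concatenate them."""
    resp = ubm.predict_proba(frames)                # (T, K) responsibilities
    n_k = resp.sum(axis=0)                          # soft frame counts
    x_bar = (resp.T @ frames) / np.maximum(n_k[:, None], 1e-10)
    alpha = (n_k / (n_k + relevance))[:, None]      # relevance-MAP weight
    adapted = alpha * x_bar + (1.0 - alpha) * ubm.means_
    return adapted.ravel()                          # (K * dims,) supervector


def spectral_system_scores(train_utts, y_train, eval_utts):
    """One spectral system: supervectors + linear SVM, returning raw scores."""
    ubm = train_ubm(np.vstack(train_utts))
    X_train = np.array([supervector(ubm, u) for u in train_utts])
    X_eval = np.array([supervector(ubm, u) for u in eval_utts])
    svm = SVC(kernel="linear").fit(X_train, y_train)
    return svm.decision_function(X_eval)            # per-utterance scores


def fit_score_fusion(dev_scores_per_system, y_dev):
    """Score-level fusion: learn logistic-regression weights on dev scores."""
    S = np.column_stack(dev_scores_per_system)      # (n_utts, n_systems)
    return LogisticRegression().fit(S, y_dev)


def apply_fusion(fusion, test_scores_per_system):
    S = np.column_stack(test_scores_per_system)
    return fusion.predict_proba(S)[:, 1]            # fused intoxication score
```

In the paper's setup, the prosodic and transcription-based systems would contribute their own score columns to the fusion alongside the spectral ones; in this sketch they would simply be additional entries in dev_scores_per_system.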