Published in

Interspeech 2011, 2011

DOI: 10.21437/interspeech.2011-48

Speaker recognition using temporal contours in linguistic units: the case of formant and formant-bandwidth trajectories

Proceedings article published in 2015 by Joaquin Gonzalez-Rodriguez, Assoc Int Speech Commun
This paper was not found in any repository; the policy of its publisher is unknown or unclear.

Full text: Unavailable

Preprint: policy unknown
Postprint: policy unknown
Published version: policy unknown

Abstract

Proceedings of Interspeech 2011, Florence (Italy). We describe a new approach to automatic speaker recognition based on explicit modeling of temporal contours in linguistic units (TCLU). Inspired by successful work in forensic speaker identification, we extend the approach to design a fully automatic system with high potential for combination with spectral systems. Using SRI's Decipher phone, word and syllabic labels, we have tested up to 468 unit-based subsystems from 6 groups of lexically determined units, namely phones, diphones, triphones, center phone in triphones, syllables and words, with subsystems combined at the score level. Evaluated on the NIST SRE04 English-only 1s1s task, their hierarchical fusion gives an EER of 4.20% (minDCF=0.018) from automatic formant tracking of conversational telephone speech. The approach combines extremely well with a Joint Factor Analysis (JFA) system (EER from 4.25% for JFA alone to 2.47%, minDCF from 0.020 to 0.012), and extensions such as more robust prosodic or spectral features are likely to improve it further.
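
The abstract reports results as equal error rates (EER) after combining subsystems at the score level. As a rough illustration of those two ideas, the sketch below computes an EER from target and non-target trial scores and fuses subsystem scores with a plain weighted average. It is not the paper's method: the abstract does not specify how the hierarchical fusion is built, so the weighted average is a stand-in, and the function names `equal_error_rate` and `fuse_scores` and the toy scores are assumptions made for this example.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER: the point where miss and false-alarm rates meet."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: fraction of target trials scoring below the threshold.
    p_miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above the threshold.
    p_fa = np.array([(nontarget >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[idx] + p_fa[idx])


def fuse_scores(subsystem_scores, weights=None):
    """Weighted-average fusion (illustrative stand-in for the paper's fusion).

    subsystem_scores has shape (n_subsystems, n_trials); the result has
    one fused score per trial. With weights=None, subsystems count equally.
    """
    scores = np.asarray(subsystem_scores, dtype=float)
    return np.average(scores, axis=0, weights=weights)


if __name__ == "__main__":
    # Toy scores from two hypothetical subsystems, four trials each.
    target = fuse_scores([[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]])
    nontarget = fuse_scores([[-2.0, -1.0, 0.5, 1.5], [-2.0, -1.0, 0.5, 1.5]])
    print(f"EER = {equal_error_rate(target, nontarget):.2%}")
```

In practice, fusion weights are usually trained on held-out data (for example with logistic regression) rather than fixed by hand, and scores from different subsystems are normalised before being combined.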