Speaker recognition using temporal contours in linguistic units: the case of formant and formant-bandwidth trajectories

Gonzalez-Rodriguez, Joaquin; International Speech Communication Association

Published in: Interspeech 2011

DOI: 10.21437/interspeech.2011-48


Proceedings article published in 2015 by Joaquin Gonzalez-Rodriguez, Assoc Int Speech Commun


Abstract

Proceedings of Interspeech 2011, Florence (Italy). We describe a new approach to automatic speaker recognition based on explicit modeling of temporal contours in linguistic units (TCLU). Inspired by successful work in forensic speaker identification, we extend the approach to design a fully automatic system with high potential for combination with spectral systems. Using SRI's Decipher phone, word and syllabic labels, we have tested up to 468 unit-based subsystems from 6 groups of lexically determined units, namely phones, diphones, triphones, center phones in triphones, syllables and words, with the subsystems combined at the score level. Evaluated on NIST SRE04 English-only 1s1s, their hierarchical fusion gives an EER of 4.20% (minDCF=0.018) from automatic formant tracking of conversational telephone speech. The approach combines extremely well with a Joint Factor Analysis (JFA) system (EER from 4.25% for JFA alone to 2.47%, minDCF from 0.020 to 0.012), and extensions such as more robust prosodic or spectral features are likely to improve it further.
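The abstract reports results as equal error rates (EER) after combining subsystems at the score level. As a rough illustration only, the sketch below fuses per-subsystem scores with a plain weighted mean and estimates the EER by sweeping a decision threshold. The abstract does not specify how the hierarchical fusion is computed, so the weighted mean is a stand-in, and the names `fuse_scores` and `equal_error_rate` are hypothetical.

```python
import numpy as np


def fuse_scores(subsystem_scores, weights=None):
    """Weighted mean of subsystem scores.

    subsystem_scores has shape (n_subsystems, n_trials).
    Assumption: a simple linear combination, not the paper's fusion.
    """
    scores = np.asarray(subsystem_scores, dtype=float)
    if weights is None:
        # Default to equal weights across subsystems.
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.asarray(weights, dtype=float) @ scores


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over the observed scores."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tar, non]))
    # Miss: a target trial scoring below the threshold is rejected.
    p_miss = np.array([(tar < t).mean() for t in thresholds])
    # False alarm: a non-target trial at or above the threshold is accepted.
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[idx] + p_fa[idx])


if __name__ == "__main__":
    # Toy scores for two subsystems over four trials (made-up numbers).
    fused = fuse_scores([[1.0, 3.0, 0.0, 2.0],
                         [1.0, 3.0, 0.0, 2.0]])
    # Assume the first two trials are target trials, the last two non-target.
    print(equal_error_rate(fused[:2], fused[2:]))
```

Evaluations of this kind typically use calibrated scores and trained fusion weights, so the equal weights here only serve to show the data flow from subsystem scores to a single error figure.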
