Published in

2009 IEEE International Conference on Acoustics, Speech and Signal Processing

DOI: 10.1109/icassp.2009.4960496

Data-driven voice source waveform modelling

Proceedings article published in 2009 by Mark R. P. Thomas, Jon Gudnason, Patrick A. Naylor
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

This paper presents a data-driven approach to the modelling of voice source waveforms. The voice source is a signal that is estimated by inverse-filtering speech signals with an estimate of the vocal tract filter. It is used in speech analysis, synthesis, recognition and coding to decompose a speech signal into its source and vocal tract filter components. Existing approaches parameterize the voice source signal with physically- or mathematically-motivated models. Though the models are well-defined, estimation of their parameters is not well understood and few are capable of reproducing the large variety of voice source waveforms. Here we present a data-driven approach to classify types of voice source waveforms based upon their mel frequency cepstrum coefficients with Gaussian mixture modelling. A set of "prototype" waveform classes is derived from a weighted average of voice source cycles from real data. An unknown speech signal is then decomposed into its prototype components and resynthesized. Results indicate that with sixteen voice source classes, low resynthesis errors can be achieved.
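For illustration, a minimal sketch of the kind of pipeline the abstract describes might look as follows, assuming the voice source signal has already been inverse-filtered and segmented into individual cycles (e.g. at glottal closure instants). The libraries used here (librosa, scikit-learn, SciPy), the function names, and the parameter values are illustrative choices, not the authors' implementation.

# Minimal sketch: MFCC + GMM prototype modelling of voice-source cycles.
# Assumes `cycles` is a list of 1-D NumPy arrays, one per inverse-filtered
# voice-source cycle. All names and constants below are illustrative.
import numpy as np
from scipy.signal import resample
from sklearn.mixture import GaussianMixture
import librosa

N_CLASSES = 16      # number of prototype classes (sixteen, as in the abstract)
CYCLE_LEN = 256     # common length to which each cycle is time-normalised
N_MFCC = 13         # number of mel-frequency cepstrum coefficients per cycle

def cycle_features(cycles, sr):
    """Return one MFCC vector per voice-source cycle."""
    feats = []
    for c in cycles:
        # librosa returns (n_mfcc, n_frames); average over frames for one vector
        m = librosa.feature.mfcc(y=c.astype(float), sr=sr, n_mfcc=N_MFCC,
                                 n_fft=min(512, len(c)), hop_length=len(c))
        feats.append(m.mean(axis=1))
    return np.vstack(feats)

def train_prototypes(cycles, sr):
    """Fit a GMM on cycle MFCCs and build prototype waveforms as
    responsibility-weighted averages of time-normalised cycles."""
    X = cycle_features(cycles, sr)
    gmm = GaussianMixture(n_components=N_CLASSES, covariance_type='diag',
                          random_state=0).fit(X)
    resp = gmm.predict_proba(X)                        # (n_cycles, N_CLASSES)
    norm = np.vstack([resample(c, CYCLE_LEN) for c in cycles])
    # Weighted average of time-normalised cycles for each mixture component
    protos = (resp.T @ norm) / resp.sum(axis=0)[:, None]
    return gmm, protos

def resynthesize(cycles, sr, gmm, protos):
    """Replace each cycle with its best-matching prototype,
    resampled back to the original cycle length."""
    X = cycle_features(cycles, sr)
    labels = gmm.predict(X)
    return [resample(protos[k], len(c)) for k, c in zip(labels, cycles)]

In this sketch, the prototype for each class is the responsibility-weighted average of time-normalised cycles, mirroring the weighted-average construction of "prototype" waveforms described in the abstract; resynthesis simply substitutes each cycle with its closest prototype under the fitted GMM.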