Data-driven voice source waveform modelling

Thomas, Mark R. P.; Gudnason, Jon; Naylor, Patrick A.

Published in

2009 IEEE International Conference on Acoustics, Speech and Signal Processing

DOI: 10.1109/icassp.2009.4960496

Proceedings article published in 2009 by Mark R. P. Thomas, Jon Gudnason, Patrick A. Naylor

Abstract

This paper presents a data-driven approach to the modelling of voice source waveforms. The voice source is a signal that is estimated by inverse-filtering speech signals with an estimate of the vocal tract filter. It is used in speech analysis, synthesis, recognition and coding to decompose a speech signal into its source and vocal tract filter components. Existing approaches parameterize the voice source signal with physically- or mathematically-motivated models. Though the models are well-defined, estimation of their parameters is not well understood and few are capable of reproducing the large variety of voice source waveforms. Here we present a data-driven approach to classify types of voice source waveforms based upon their mel frequency cepstrum coefficients with Gaussian mixture modelling. A set of “prototype” waveform classes is derived from a weighted average of voice source cycles from real data. An unknown speech signal is then decomposed into its prototype components and resynthesized. Results indicate that with sixteen voice source classes, low resynthesis errors can be achieved.
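The prototype and resynthesis steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each voice source cycle has already been extracted and length-normalized, and that per-cycle class weights (for example, posterior probabilities from a Gaussian mixture model fitted to mel-frequency cepstral coefficients) are already available. The function names `prototype_waveforms`, `resynthesize` and `relative_error`, and the use of the same weights for resynthesis, are assumptions made for illustration.

```python
def prototype_waveforms(cycles, weights):
    """Weighted average of voice source cycles, one prototype per class.

    cycles:  list of N cycles, each a list of L samples (assumed
             length-normalized beforehand).
    weights: list of N rows, each holding K non-negative class weights
             (assumed here to be GMM posterior probabilities).
    Returns a list of K prototypes, each a list of L samples.
    """
    n_classes = len(weights[0])
    length = len(cycles[0])
    prototypes = []
    for k in range(n_classes):
        total = sum(w[k] for w in weights)
        if total == 0:
            # No cycle contributes to this class: return a silent prototype.
            prototypes.append([0.0] * length)
            continue
        prototypes.append([
            sum(w[k] * c[i] for c, w in zip(cycles, weights)) / total
            for i in range(length)
        ])
    return prototypes


def resynthesize(weights_row, prototypes):
    """Rebuild one cycle as a weighted sum of the class prototypes."""
    length = len(prototypes[0])
    return [
        sum(w * p[i] for w, p in zip(weights_row, prototypes))
        for i in range(length)
    ]


def relative_error(cycle, estimate):
    """Squared error of the estimate, normalized by the cycle energy."""
    num = sum((a - b) ** 2 for a, b in zip(cycle, estimate))
    den = sum(a ** 2 for a in cycle)
    return num / den
```

With hard (one-hot) weights, each prototype reduces to the plain mean of the cycles assigned to its class, and resynthesis simply selects that prototype.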
