New Results on Single-Channel Speech Separation Using Sinusoidal Modeling

Mowlaee, Pejman; Christensen, Mads Græsbøll; Jensen, Søren Holdt

Published in

Institute of Electrical and Electronics Engineers, IEEE Transactions on Audio, Speech and Language Processing, 5(19), p. 1265-1277, 2011

DOI: 10.1109/tasl.2010.2089520

Journal article published in 2011 by Pejman Mowlaee, Mads Græsbøll Christensen, Søren Holdt Jensen

Abstract

We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mixture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator aims to find sinusoidal parameters, in the form of codevectors from vector quantization (VQ) codebooks pre-trained for the speakers, that, when combined, best fit the observed mixed signal. The selected codevectors are then used to reconstruct the recovered signals for the speakers in the mixture. Compared to the log-max mixture estimator used in binary masks and the Wiener filtering approach, the proposed method is observed to achieve an acceptable perceptual speech quality with less cross-talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational complexity of the separation by replacing the high-dimensional short-time Fourier transform (STFT) feature vectors with sinusoidal feature vectors. We report separation results for the proposed method and compare them with those of other benchmark methods. The improvements of the proposed method over the other methods are confirmed using perceptual evaluation of speech quality (PESQ) as an objective measure and a MUSHRA listening test as a subjective evaluation, for both speaker-dependent and gender-dependent scenarios.
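The codebook search described in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator: the function names, the toy random codebooks, and the combination rule (powers of the two sources add, as for uncorrelated signals) are all assumptions made for illustration. The sketch only shows the general idea of picking one codevector per speaker so that the combined pair best fits an observed mixture frame.

```python
import math
import random


def combine(a, b):
    # Assumed mixing model (not taken from the paper): per-component powers add.
    return [math.sqrt(x * x + y * y) for x, y in zip(a, b)]


def best_codevector_pair(mixture, codebook_a, codebook_b):
    """Exhaustively search both codebooks for the pair that best fits `mixture`.

    Returns (index in codebook_a, index in codebook_b, squared error).
    """
    best = (None, None, math.inf)
    for i, a in enumerate(codebook_a):
        for j, b in enumerate(codebook_b):
            # Squared error between the combined pair and the observed frame.
            err = sum((c - m) ** 2 for c, m in zip(combine(a, b), mixture))
            if err < best[2]:
                best = (i, j, err)
    return best


# Toy codebooks: 8 hypothetical codevectors per speaker, 5 amplitudes each.
rng = random.Random(0)
codebook_a = [[rng.uniform(0.1, 1.0) for _ in range(5)] for _ in range(8)]
codebook_b = [[rng.uniform(0.1, 1.0) for _ in range(5)] for _ in range(8)]

# Synthetic mixture frame built from one known codevector per speaker.
mixture = combine(codebook_a[3], codebook_b[6])

i, j, err = best_codevector_pair(mixture, codebook_a, codebook_b)
estimate_a, estimate_b = codebook_a[i], codebook_b[j]
```

The selected codevectors `estimate_a` and `estimate_b` stand in for the per-speaker parameters from which the separated signals would be reconstructed. The exhaustive search costs one error evaluation per codevector pair, so lower-dimensional feature vectors reduce the cost of each evaluation.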
