Pesquisa | Portal Regional da BVS (teste)

Noise robust speech rate estimation using signal-to-noise ratio dependent sub-band selection and peak detection strategy.

Yarra, Chiranjeevi; Nagesh, Supriya; Deshmukh, Om D; Kumar Ghosh, Prasanta.

J Acoust Soc Am ; 146(3): 1615, 2019 09.

Artigo em Inglês | MEDLINE | ID: mdl-31590492

RESUMO

Speech (syllable) rate estimation typically involves computing a feature contour based on sub-band energies having strong local maxima/peaks at syllable nuclei, which are detected with the help of voicing decisions (VDs). While such a two-stage scheme works well in clean conditions, the estimated speech rate becomes less accurate in noisy condition particularly due to erroneous VDs and non-informative sub-bands mainly at low signal-to-noise ratios (SNR). This work proposes a technique to use VDs in the peak detection strategy in an SNR dependent manner. It also proposes a data-driven sub-band pruning technique to improve syllabic peaks of the feature contour in the presence of noise. Further, this paper generalizes both the peak detection and the sub-band pruning technique for unknown noise and/or unknown SNR conditions. Experiments are performed in clean and 20, 10, and 0 dB SNR conditions separately using Switchboard, TIMIT, and CTIMIT corpora under five additive noises: white, car, high-frequency-channel, cockpit, and babble. Experiments are also carried out in test conditions at unseen SNRs of -5 and 5 dB with four unseen additive noises: factory, sub-way, street, and exhibition. The proposed method outperforms the best of the existing techniques in clean and noisy conditions for three corpora.

Assuntos

Interface para o Reconhecimento da Fala/normas , Razão Sinal-Ruído , Acústica da Fala , Voz

A frame selective dynamic programming approach for noise robust pitch estimation.

Yarra, Chiranjeevi; Deshmukh, Om D; Ghosh, Prasanta Kumar.

J Acoust Soc Am ; 143(4): 2289, 2018 04.

Artigo em Inglês | MEDLINE | ID: mdl-29716244

RESUMO

The principles of the existing pitch estimation techniques are often different and complementary in nature. In this work, a frame selective dynamic programming (FSDP) method is proposed which exploits the complementary characteristics of two existing methods, namely, sub-harmonic to harmonic ratio (SHR) and sawtooth-wave inspired pitch estimator (SWIPE). Using variants of SHR and SWIPE, the proposed FSDP method classifies all the voiced frames into two classes-the first class consists of the frames where a confidence score maximization criterion is used for pitch estimation, while for the second class, a dynamic programming (DP) based approach is proposed. Experiments are performed on speech signals separately from KEELE, CSLU, and PaulBaghsaw corpora under clean and additive white Gaussian noise at 20, 10, 5, and 0 dB SNR conditions using four baseline schemes including SHR, SWIPE, and two DP based techniques. The pitch estimation performance of FSDP, when averaged over all SNRs, is found to be better than those of the baseline schemes suggesting the benefit of applying smoothness constraint using DP in selected frames in the proposed FSDP scheme. The VuV classification error from FSDP is also found to be lower than that from all four baseline schemes in almost all SNR conditions on three corpora.

Speech enhancement using the modified phase-opponency model.

Deshmukh, Om D; Espy-Wilson, Carol Y; Carney, Laurel H.

J Acoust Soc Am ; 121(6): 3886-98, 2007 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-17552735

RESUMO

In this paper we present a model called the Modified Phase-Opponency (MPO) model for single-channel speech enhancement when the speech is corrupted by additive noise. The MPO model is based on the auditory PO model, proposed for detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery by using a cross-auditory-nerve-fiber coincidence detection for extracting temporal cues. The MPO model alters the components of the PO model such that the basic functionality of the PO model is maintained but the properties of the model can be analyzed and modified independently. The MPO-based speech enhancement scheme does not need to estimate the noise characteristics nor does it assume that the noise satisfies any statistical model. The MPO technique leads to the lowest value of the LPC-based objective measures and the highest value of the perceptual evaluation of speech quality measure compared to other methods when the speech signals are corrupted by fluctuating noise. Combining the MPO speech enhancement technique with our aperiodicity, periodicity, and pitch detector further improves its performance.

Assuntos

Audiometria da Fala , Inteligibilidade da Fala , Percepção da Fala , Vias Auditivas/fisiologia , Sinais (Psicologia) , Bases de Dados Factuais , Humanos , Matemática , Modelos Biológicos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA