1.
J Acoust Soc Am ; 146(3): 1615, 2019 09.
Article in English | MEDLINE | ID: mdl-31590492

ABSTRACT

Speech (syllable) rate estimation typically involves computing a feature contour based on sub-band energies that has strong local maxima (peaks) at syllable nuclei; these peaks are detected with the help of voicing decisions (VDs). While such a two-stage scheme works well in clean conditions, the estimated speech rate becomes less accurate in noisy conditions, particularly because of erroneous VDs and non-informative sub-bands, mainly at low signal-to-noise ratios (SNRs). This work proposes a technique to use VDs in the peak detection strategy in an SNR-dependent manner. It also proposes a data-driven sub-band pruning technique to improve syllabic peaks of the feature contour in the presence of noise. Further, this paper generalizes both the peak detection and the sub-band pruning techniques for unknown noise and/or unknown SNR conditions. Experiments are performed in clean and 20, 10, and 0 dB SNR conditions separately using the Switchboard, TIMIT, and CTIMIT corpora under five additive noises: white, car, high-frequency-channel, cockpit, and babble. Experiments are also carried out in test conditions at unseen SNRs of -5 and 5 dB with four unseen additive noises: factory, subway, street, and exhibition. The proposed method outperforms the best of the existing techniques in clean and noisy conditions on the three corpora.
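
The two-stage scheme described in this abstract (a feature contour whose peaks are then gated by voicing decisions) can be illustrated with a minimal Python sketch. This is not the authors' method: the function names, the fixed `min_height` and `min_gap` thresholds, and the 10 ms frame shift are all assumptions made for illustration, and the SNR-dependent use of VDs and the sub-band pruning proposed in the paper are not modeled.

```python
def find_syllable_peaks(contour, voiced, min_height=0.3, min_gap=3):
    """Return indices of local maxima of `contour` that lie in voiced frames.

    contour    : list of floats, one feature value per frame (hypothetical input)
    voiced     : list of bools, one voicing decision per frame
    min_height : assumed minimum peak height (illustrative threshold)
    min_gap    : assumed minimum spacing between peaks, in frames
    """
    peaks = []
    for i in range(1, len(contour) - 1):
        is_local_max = contour[i - 1] < contour[i] >= contour[i + 1]
        if not (is_local_max and voiced[i] and contour[i] >= min_height):
            continue
        if peaks and i - peaks[-1] < min_gap:
            # Two peaks too close together: keep only the taller one.
            if contour[i] > contour[peaks[-1]]:
                peaks[-1] = i
            continue
        peaks.append(i)
    return peaks


def speech_rate(contour, voiced, frame_shift_s=0.01, **kwargs):
    """Estimate syllables per second as (number of peaks) / (duration)."""
    peaks = find_syllable_peaks(contour, voiced, **kwargs)
    duration_s = len(contour) * frame_shift_s
    return len(peaks) / duration_s


if __name__ == "__main__":
    contour = [0.0, 0.2, 0.9, 0.3, 0.1, 0.8, 0.2, 0.0, 0.5, 0.1]
    voiced = [False, True, True, True, False, True, True, False, False, False]
    print(find_syllable_peaks(contour, voiced))  # peak at frame 8 is unvoiced
    print(speech_rate(contour, voiced))
```

In this toy input the local maximum at frame 8 is rejected only because its voicing decision is negative, which shows why erroneous VDs in noise translate directly into rate errors.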


Subject(s)
Speech Recognition Software/standards , Signal-To-Noise Ratio , Speech Acoustics , Voice
2.
J Acoust Soc Am ; 144(5): EL471, 2018 11.
Article in English | MEDLINE | ID: mdl-30522277

ABSTRACT

Second-language learners of British English (BE) are typically trained on four intonation classes: Glide-up, Glide-down, Dive, and Take-off. Automatic four-way intonation classification could be useful for evaluating a learner's pronunciation. However, such automatic classification is challenging without the manually annotated tones that are typically used in intonation analysis and classification tasks. In this work, a three-dimensional feature sequence is proposed that represents temporal patterns in the utterance-level f0 contour using a perceptually motivated pitch transformation. Hidden Markov model-based classification experiments, conducted on training material for teaching BE intonation, demonstrate the benefit of the proposed approach over the baseline scheme considered.
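
The abstract does not say which pitch transformation or which three feature dimensions are used. Purely as an illustration of the general idea, the Python sketch below assumes a semitone scale relative to the utterance median f0 (one common perceptually motivated choice) and stacks the transformed value with its first and second differences. The function names and the choice of features are hypothetical and should not be read as the paper's feature set.

```python
import math
from statistics import median


def hz_to_semitones(f0_hz, ref_hz):
    """Convert a frequency in Hz to semitones relative to `ref_hz`."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def f0_feature_sequence(f0_hz):
    """Build an illustrative 3-D feature sequence from an f0 contour.

    f0_hz : list of floats, one value per frame; 0 marks an unvoiced frame.
    Returns a list of (semitone value, first difference, second difference)
    tuples over the voiced frames only. The differences are set to 0.0
    where there is not enough history to compute them.
    """
    voiced = [f for f in f0_hz if f > 0]
    ref = median(voiced)  # assumed reference: utterance median f0
    st = [hz_to_semitones(f, ref) for f in voiced]
    features = []
    for t in range(len(st)):
        d1 = st[t] - st[t - 1] if t >= 1 else 0.0
        d2 = st[t] - 2.0 * st[t - 1] + st[t - 2] if t >= 2 else 0.0
        features.append((st[t], d1, d2))
    return features


if __name__ == "__main__":
    for row in f0_feature_sequence([100.0, 0.0, 200.0, 400.0]):
        print(row)
```

A sequence of this shape could be passed to a sequence classifier such as a hidden Markov model, one model per intonation class, although that step is not shown here.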


Subject(s)
Pitch Perception/physiology , Speech Perception/physiology , Speech/physiology , Algorithms , England , Female , Humans , Language , Male , Phonetics , Pitch Discrimination/physiology , Speech Acoustics , Time Factors
3.
J Acoust Soc Am ; 143(4): 2289, 2018 04.
Article in English | MEDLINE | ID: mdl-29716244

ABSTRACT

The principles of existing pitch estimation techniques are often different and complementary in nature. In this work, a frame selective dynamic programming (FSDP) method is proposed which exploits the complementary characteristics of two existing methods, namely, sub-harmonic to harmonic ratio (SHR) and the sawtooth-wave inspired pitch estimator (SWIPE). Using variants of SHR and SWIPE, the proposed FSDP method classifies all the voiced frames into two classes: the first class consists of the frames where a confidence score maximization criterion is used for pitch estimation, while for the second class, a dynamic programming (DP) based approach is proposed. Experiments are performed on speech signals separately from the KEELE, CSLU, and PaulBagshaw corpora under clean and additive white Gaussian noise conditions at 20, 10, 5, and 0 dB SNR, using four baseline schemes including SHR, SWIPE, and two DP-based techniques. The pitch estimation performance of FSDP, when averaged over all SNRs, is found to be better than that of the baseline schemes, suggesting the benefit of applying a smoothness constraint using DP in selected frames in the proposed FSDP scheme. The voiced/unvoiced (VuV) classification error from FSDP is also found to be lower than that from all four baseline schemes in almost all SNR conditions on the three corpora.
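
To make the idea of a DP smoothness constraint concrete, the Python sketch below runs a generic Viterbi-style search over per-frame pitch candidates. It is not the FSDP method: the candidate format, the octave-distance transition cost, and the `jump_penalty` weight are assumptions chosen for illustration, and the frame selection step (deciding which frames use score maximization and which use DP) is not modeled.

```python
import math


def dp_pitch_track(candidates, jump_penalty=1.0):
    """Pick one pitch candidate per frame with a smoothness constraint.

    candidates   : list over frames; each frame is a list of
                   (f0_hz, confidence_score) tuples (hypothetical format)
    jump_penalty : assumed cost per octave of frame-to-frame pitch change
    Returns the list of selected f0 values, one per frame.
    """
    # best[t][j] = best cumulative score of any path ending at candidate j
    best = [[score for _, score in candidates[0]]]
    back = []  # back[t-1][j] = index of the best predecessor of candidate j
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for f0, score in candidates[t]:
            options = [
                best[-1][k] - jump_penalty * abs(math.log2(f0 / prev_f0))
                for k, (prev_f0, _) in enumerate(candidates[t - 1])
            ]
            k_best = max(range(len(options)), key=options.__getitem__)
            row.append(options[k_best] + score)
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final candidate.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]


if __name__ == "__main__":
    frames = [
        [(100.0, 1.0), (200.0, 0.5)],
        [(100.0, 0.6), (200.0, 0.7)],  # per-frame maximum would jump an octave
        [(100.0, 1.0), (200.0, 0.4)],
    ]
    print(dp_pitch_track(frames))
```

Picking the highest-scoring candidate in each frame independently would produce an octave jump in the middle frame of this example; the transition cost makes the smooth path score higher overall.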
