1.
J Acoust Soc Am ; 126(3): 1415-26, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19739755

ABSTRACT

Intelligibility of ideal binary masked noisy speech was measured on a group of normal-hearing individuals across mixture signal-to-noise ratio (SNR) levels, masker types, and local criteria for forming the binary mask. The binary mask is computed from time-frequency decompositions of target and masker signals using two different schemes: an ideal binary mask computed by thresholding the local SNR within time-frequency units and a target binary mask computed by comparing the local target energy against the long-term average speech spectrum. By depicting intelligibility scores as a function of the difference between mixture SNR and local SNR threshold, alignment of the performance curves is obtained for a large range of mixture SNR levels. Large intelligibility benefits are obtained for both sparse and dense binary masks. When an ideal mask is dense with many ones, the effect of changing mixture SNR level while fixing the mask is significant, whereas for sparser masks the effect is small or insignificant.


Subject(s)
Noise; Perceptual Masking; Speech Perception; Speech; Acoustic Stimulation; Adult; Analysis of Variance; Automobiles; Humans; Middle Aged; Noise, Occupational; Noise, Transportation; Psychoacoustics; Sound Spectrography; Task Performance and Analysis
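
The two mask definitions in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the channels-by-frames energy layout, and the sign convention of the relative criterion (threshold minus mixture SNR) are all hypothetical choices made for the example.

```python
import numpy as np

# Tiny constant so that silent units do not produce log10(0).
_EPS = np.finfo(float).tiny


def ideal_binary_mask(target_energy, masker_energy, lc_db=0.0):
    """Return 1 where the local SNR in dB exceeds lc_db, else 0.

    Both inputs are energy arrays of shape (channels, frames).
    """
    local_snr_db = 10.0 * np.log10((target_energy + _EPS) / (masker_energy + _EPS))
    return (local_snr_db > lc_db).astype(np.uint8)


def target_binary_mask(target_energy, ltass_energy, lc_db=0.0):
    """Return 1 where target energy exceeds the long-term average speech
    spectrum of its channel by more than lc_db, else 0.

    ltass_energy has one value per channel (assumed layout).
    """
    reference = np.asarray(ltass_energy, dtype=float)[:, None]
    ratio_db = 10.0 * np.log10((target_energy + _EPS) / (reference + _EPS))
    return (ratio_db > lc_db).astype(np.uint8)


def relative_criterion(lc_db, mixture_snr_db):
    """Difference between the local threshold and the mixture SNR
    (sign convention assumed for this sketch)."""
    return lc_db - mixture_snr_db
```

Raising the mixture SNR by d dB raises every local SNR by d dB, so raising the threshold by the same amount leaves the ideal mask unchanged. That is one way to read why plotting against the difference aligns the curves.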
2.
J Acoust Soc Am ; 125(4): 2336-47, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19354408

ABSTRACT

Ideal binary time-frequency masking is a signal separation technique that retains mixture energy in time-frequency units where local signal-to-noise ratio exceeds a certain threshold and rejects mixture energy in other time-frequency units. Two experiments were designed to assess the effects of ideal binary masking on speech intelligibility of both normal-hearing (NH) and hearing-impaired (HI) listeners in different kinds of background interference. The results from Experiment 1 demonstrate that ideal binary masking leads to substantial reductions in speech-reception threshold for both NH and HI listeners, and the reduction is greater in a cafeteria background than in a speech-shaped noise. Furthermore, listeners with hearing loss benefit more than listeners with normal hearing, particularly for cafeteria noise, and ideal masking nearly equalizes the speech intelligibility performances of NH and HI listeners in noisy backgrounds. The results from Experiment 2 suggest that ideal binary masking in the low-frequency range yields larger intelligibility improvements than in the high-frequency range, especially for listeners with hearing loss. The findings from the two experiments have major implications for understanding speech perception in noise, computational auditory scene analysis, speech enhancement, and hearing aid design.


Subject(s)
Hearing Disorders/physiopathology; Noise; Perceptual Masking; Speech Intelligibility; Speech Perception; Adult; Aged; Aged, 80 and over; Analysis of Variance; Hearing Disorders/psychology; Humans; Middle Aged; Sound Spectrography; Speech
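
The retain-or-reject rule described in the abstract above, and the band-limited variant of Experiment 2, might look like the following sketch. The function name `ibm_separate`, the `band` parameter, and the choice to pass units outside the processed band through unchanged are assumptions for illustration; the abstract does not say how the unprocessed band was handled.

```python
import numpy as np


def ibm_separate(target_tf, masker_tf, lc_db=0.0, band=None):
    """Apply an ideal binary mask to the mixture of two premixed signals.

    target_tf, masker_tf: time-frequency arrays of shape (channels, frames).
    band: optional (lo, hi) channel-index range to process; channels
          outside it are passed through unchanged (an assumption).
    Returns the masked mixture and the boolean keep-mask.
    """
    mixture_tf = target_tf + masker_tf
    target_energy = np.abs(target_tf) ** 2
    masker_energy = np.abs(masker_tf) ** 2

    # Local SNR > lc_db is equivalent to this comparison and avoids log(0).
    keep = target_energy > masker_energy * 10.0 ** (lc_db / 10.0)

    if band is not None:
        lo, hi = band
        in_band = np.zeros(keep.shape[0], dtype=bool)
        in_band[lo:hi] = True
        keep = np.where(in_band[:, None], keep, True)

    return mixture_tf * keep, keep
```

Calling `ibm_separate(target_tf, masker_tf, band=(0, k))` restricts masking to the lowest `k` channels, which is one hypothetical way to mimic a low-frequency-only condition.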
3.
J Acoust Soc Am ; 124(4): 2303-7, 2008 Oct.
Article in English | MEDLINE | ID: mdl-19062868

ABSTRACT

For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. It is observed that listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed by the ideal binary mask. Only 16 filter channels and a frame rate of 100 Hz are sufficient for high intelligibility. The results show that, despite a dramatic reduction of speech information, a pattern of binary gains provides an adequate basis for speech perception.


Subject(s)
Noise; Perceptual Masking; Speech Intelligibility; Speech Perception; Acoustic Stimulation; Adult; Audiometry, Speech; Auditory Threshold; Humans; Middle Aged; Pattern Recognition, Physiological; Recognition, Psychology; Sound Spectrography; Time Factors
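
Gating noise with binary gains, as described in the abstract above, can be sketched as follows. The sketch assumes the noise has already been split into band-limited channels by some filterbank (not shown), and the name `gate_noise` and its parameters are hypothetical. Each mask frame is held for `fs / frame_rate_hz` samples, so a 100 Hz frame rate corresponds to 10 ms frames.

```python
import numpy as np


def gate_noise(band_noise, mask, fs, frame_rate_hz=100):
    """Switch band-limited noise on and off according to a binary mask.

    band_noise: array (channels, samples), one noise band per channel.
    mask:       array (channels, frames) of 0/1 gains.
    fs:         sampling rate in Hz.
    Returns the sum of the gated bands as a 1-D signal.
    """
    hop = fs // frame_rate_hz  # samples covered by one mask frame
    gains = np.repeat(np.asarray(mask, dtype=float), hop, axis=1)

    # Trim to the shorter of the two so shapes always agree.
    n = min(gains.shape[1], band_noise.shape[1])
    return (band_noise[:, :n] * gains[:, :n]).sum(axis=0)
```

With 16 channels, `mask` would have shape `(16, frames)`; the output carries no speech fine structure, only the on/off pattern of the mask.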
4.
IEEE Trans Neural Netw ; 19(3): 475-92, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18334366

ABSTRACT

Separation of speech mixtures, often referred to as the cocktail party problem, has been studied for decades. In many source separation tasks, the separation method is limited by the assumption of at least as many sensors as sources. Further, many methods require that the number of signals within the recorded mixtures be known in advance. In many real-world applications, these limitations are too restrictive. We propose a novel method for underdetermined blind source separation using an instantaneous mixing model which assumes closely spaced microphones. Two source separation techniques have been combined: independent component analysis (ICA) and binary time-frequency (T-F) masking. By estimating binary masks from the outputs of an ICA algorithm, it is possible in an iterative way to extract basis speech signals from a convolutive mixture. The basis signals are afterwards improved by grouping similar signals. Using two microphones, we can separate, in principle, an arbitrary number of mixed speech signals. We show separation results for mixtures with as many as seven speech signals under instantaneous conditions. We also show that the proposed method is applicable to segregate speech signals under reverberant conditions, and we compare our proposed method to another state-of-the-art algorithm. The number of source signals is not assumed to be known in advance and it is possible to maintain the extracted signals as stereo signals.


Subject(s)
Neural Networks, Computer; Signal Processing, Computer-Assisted; Sound; Algorithms; Humans; Principal Component Analysis
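
One step of the approach in the abstract above, estimating binary masks from two ICA outputs and applying a mask to both microphone channels so the result stays stereo, could be sketched like this. The ICA stage itself is assumed to have run already. The magnitude comparison with a threshold `tau`, and both function names, are assumptions for illustration rather than details given in the abstract.

```python
import numpy as np


def masks_from_ica_outputs(y1_tf, y2_tf, tau=1.0):
    """Estimate one binary mask per ICA output.

    y1_tf, y2_tf: time-frequency representations of the two ICA outputs,
                  shape (channels, frames).
    A unit is assigned to an output when its magnitude there exceeds
    tau times the magnitude in the other output (assumed rule).
    """
    a1, a2 = np.abs(y1_tf), np.abs(y2_tf)
    return a1 > tau * a2, a2 > tau * a1


def apply_to_stereo(left_tf, right_tf, mask):
    """Apply the same mask to both microphone signals, keeping a stereo pair."""
    return left_tf * mask, right_tf * mask
```

With `tau` above 1, some units are claimed by neither mask. In an iterative scheme each masked stereo pair could be fed back into ICA until a stopping rule, not specified here, judges that it contains a single source.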
5.
Neuroimage ; 15(4): 747-71, 2002 Apr.
Article in English | MEDLINE | ID: mdl-11906218

ABSTRACT

We introduce a data-analysis framework and performance metrics for evaluating and optimizing the interaction between activation tasks, experimental designs, and the methodological choices and tools for data acquisition, preprocessing, data analysis, and extraction of statistical parametric maps (SPMs). Our NPAIRS (nonparametric prediction, activation, influence, and reproducibility resampling) framework provides an alternative to simulations and ROC curves by using real PET and fMRI data sets to examine the relationship between prediction accuracy and the signal-to-noise ratios (SNRs) associated with reproducible SPMs. Using cross-validation resampling we plot training-test set predictions of the experimental design variables (e.g., brain-state labels) versus reproducibility SNR metrics for the associated SPMs. We demonstrate the utility of this framework across the wide range of performance metrics obtained from [(15)O]water PET studies of 12 age- and sex-matched data sets performing different motor tasks (8 subjects/set). For the 12 data sets we apply NPAIRS with both univariate and multivariate data-analysis approaches to: (1) demonstrate that this framework may be used to obtain reproducible SPMs from any data-analysis approach on a common Z-score scale (rSPM[Z]); (2) demonstrate that the histogram of a rSPM[Z] image may be modeled as the sum of a data-analysis-dependent noise distribution and a task-dependent, Gaussian signal distribution that scales monotonically with our reproducibility performance metric; (3) explore the relation between prediction and reproducibility performance metrics with an emphasis on bias-variance tradeoffs for flexible, multivariate models; and (4) measure the broad range of reproducibility SNRs and the significant influence of individual subjects. 
A companion paper describes learning curves for four of these 12 data sets, which describe an alternative mutual-information prediction metric and NPAIRS reproducibility as a function of training-set sizes from 2 to 18 subjects. We propose the NPAIRS framework as a validation tool for testing and optimizing methodological choices and tools in functional neuroimaging.


Subject(s)
Brain Mapping/methods; Cerebral Cortex/physiology; Data Interpretation, Statistical; Magnetic Resonance Imaging/statistics & numerical data; Mathematical Computing; Psychomotor Performance/physiology; Speech Perception/physiology; Tomography, Emission-Computed/statistics & numerical data; Adult; Attention/physiology; Dominance, Cerebral/physiology; Female; Humans; Image Processing, Computer-Assisted; Imaging, Three-Dimensional; Male; Middle Aged; Models, Statistical; Phonetics; Reference Values; Reproducibility of Results
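
The reproducibility side of the framework in the abstract above can be illustrated with a split-half sketch. The abstract gives no formulas, so everything here is an assumption: two SPMs from independent halves of the data are correlated, projected onto their shared (signal) axis, and scaled by the spread along the difference (noise) axis. The SNR uses the eigenvalues 1 + r and 1 - r of the 2 x 2 correlation matrix. The function name is hypothetical.

```python
import numpy as np


def split_half_reproducibility(spm_a, spm_b):
    """Summarise agreement between two split-half statistical maps.

    spm_a, spm_b: 1-D arrays of voxel values from independent data halves.
    Returns (r, rspm_z, snr): the correlation, a reproducible map on a
    Z-like scale, and a global reproducibility SNR.
    """
    # Standardise each map so that the mean product is the correlation.
    a = (spm_a - spm_a.mean()) / spm_a.std()
    b = (spm_b - spm_b.mean()) / spm_b.std()
    r = float(np.mean(a * b))

    signal_axis = (a + b) / np.sqrt(2.0)  # what the two halves share
    noise_axis = (a - b) / np.sqrt(2.0)   # where the two halves disagree
    rspm_z = signal_axis / noise_axis.std()

    # Signal variance 2r over noise variance (1 - r); clamp negative r to 0.
    snr = float("inf") if r >= 1.0 else float(np.sqrt(max(2.0 * r / (1.0 - r), 0.0)))
    return r, rspm_z, snr
```

Repeating this over many random splits and pairing each SNR with the prediction accuracy of the corresponding training-test split would give the kind of prediction-versus-reproducibility plot the abstract describes.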