Results 1 - 5 of 5
1.
J Acoust Soc Am ; 133(3): 1707-17, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23464040

ABSTRACT

Ideal binary masking is a signal processing technique that separates a desired signal from a mixture by retaining only the time-frequency units where the signal-to-noise ratio (SNR) exceeds a predetermined threshold. In reverberant conditions there are multiple possible definitions of the ideal binary mask, because one may choose to treat the early reflections of the target as either desired signal or noise. The ideal binary mask may therefore be parameterized by the reflection boundary, a predetermined division point between early and late reflections. Another important parameter is the local SNR threshold used in labeling the time-frequency units as either target or background. Two experiments were designed to assess the impact of these two parameters on speech intelligibility with ideal binary masking for normal-hearing listeners in reverberant conditions. Experiment 1 shows that in order to achieve intelligibility improvements only the early reflections should be preserved by the binary mask. Moreover, it shows that the effective SNR should be accounted for when deciding the optimal range of the local threshold. Experiment 2 shows that with long reverberation times, intelligibility improvements are only obtained when the reflection boundary is 100 ms or less. Also, the experiment suggests that binary masking can be used for dereverberation.
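The reflection boundary described in this abstract can be illustrated by splitting a room impulse response into an early and a late part. The following is a minimal sketch, not the authors' procedure: the function name `split_impulse_response`, the choice to measure the boundary from the largest-magnitude sample (taken here as the direct path), and the list-based representation are all assumptions made for illustration.

```python
def split_impulse_response(rir, sample_rate, boundary_ms):
    """Split a room impulse response into early and late parts.

    Assumption: the direct path is the sample with the largest absolute
    amplitude, and the reflection boundary is measured from that sample.
    Both returned lists have the same length as `rir` and sum back to it.
    """
    if not rir:
        raise ValueError("rir must not be empty")
    # Locate the (assumed) direct-path sample.
    direct_index = max(range(len(rir)), key=lambda i: abs(rir[i]))
    # Convert the boundary from milliseconds to samples.
    boundary_samples = int(round(boundary_ms * sample_rate / 1000.0))
    cut = min(len(rir), direct_index + boundary_samples)
    # Early part: everything before the cut, zero-padded to full length.
    early = list(rir[:cut]) + [0.0] * (len(rir) - cut)
    # Late part: everything from the cut onward, zero-padded at the front.
    late = [0.0] * cut + list(rir[cut:])
    return early, late
```

Under this sketch, convolving the dry target with `early` would give the signal treated as desired, while the target convolved with `late` would be grouped with the noise when the mask is computed.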


Subjects
Signal Processing, Computer-Assisted , Signal-To-Noise Ratio , Speech Acoustics , Speech Intelligibility , Speech Perception , Adult , Algorithms , Analysis of Variance , Humans , Models, Theoretical , Noise/adverse effects , Perceptual Masking , Sound Spectrography , Speech Production Measurement , Time Factors , Vibration , Young Adult
2.
J Acoust Soc Am ; 130(4): 2153-61, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21973369

ABSTRACT

For a mixture of target speech and noise in anechoic conditions, the ideal binary mask is defined as follows: it selects the time-frequency units where target energy exceeds noise energy by a certain local threshold and cancels the other units. In this study, the definition of the ideal binary mask is extended to reverberant conditions. Given the division between early and late reflections in terms of speech intelligibility, three ideal binary masks can be defined: an ideal binary mask that uses the direct path of the target as the desired signal, an ideal binary mask that uses the direct path and early reflections of the target as the desired signal, and an ideal binary mask that uses the reverberant target as the desired signal. The effects of these ideal binary mask definitions on speech intelligibility are compared across two types of interference: speech-shaped noise and concurrent female speech. As suggested by psychoacoustical studies, the ideal binary mask based on the direct path and early reflections of target speech outperforms the other masks as reverberation time increases and produces substantial reductions in terms of speech reception threshold for normal-hearing listeners.
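The anechoic definition given in this abstract can be written down directly. The sketch below assumes that per-unit target and noise energies are already available as nested lists indexed by frame and frequency channel; the names `ideal_binary_mask`, `apply_mask`, and `lc_db` (the local threshold in dB) are illustrative and do not come from the paper.

```python
import math


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Return a 0/1 mask over time-frequency units.

    A unit is set to 1 when the local SNR, 10*log10(target/noise),
    exceeds the local threshold `lc_db`, and to 0 otherwise.
    """
    mask = []
    for target_row, noise_row in zip(target_energy, noise_energy):
        row = []
        for t, n in zip(target_row, noise_row):
            if t <= 0.0:
                row.append(0)          # no target energy: cancel the unit
            elif n <= 0.0:
                row.append(1)          # no noise energy: keep the unit
            else:
                snr_db = 10.0 * math.log10(t / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Keep mixture units where the mask is 1 and zero out the rest."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(mixture, mask)]
```

For the three reverberant definitions listed above, the same rule would apply; only what is counted as `target_energy` changes (direct path, direct path plus early reflections, or the full reverberant target).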


Subjects
Noise/adverse effects , Perceptual Masking , Speech Acoustics , Speech Intelligibility , Speech Perception , Acoustic Stimulation , Adolescent , Adult , Auditory Threshold , Comprehension , Female , Humans , Male , Psychoacoustics , Sound Spectrography , Speech Reception Threshold Test , Time Factors , Vibration , Young Adult
3.
J Acoust Soc Am ; 120(1): 458-69, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16875242

ABSTRACT

In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation of the system shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.


Subjects
Environment , Noise/adverse effects , Pitch Perception/physiology , Speech Perception/physiology , Algorithms , Conditioning, Psychological , Female , Humans , Male , Models, Biological , Sound Spectrography , Speech Production Measurement , Speech Reception Threshold Test
4.
J Acoust Soc Am ; 120(6): 4040-51, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17225430

ABSTRACT

In a natural environment, speech signals are degraded by both reverberation and concurrent noise sources. While human listening is robust under these conditions using only two ears, current two-microphone algorithms perform poorly. The psychological process of figure-ground segregation suggests that the target signal is perceived as a foreground while the remaining stimuli are perceived as a background. Accordingly, the goal is to estimate an ideal time-frequency (T-F) binary mask, which selects the target if it is stronger than the interference in a local T-F unit. This paper proposes a binaural segregation system that extracts the reverberant target signal from multisource reverberant mixtures using only the location information of the target source. The proposed system combines target cancellation through adaptive filtering and a binary decision rule to estimate the ideal T-F binary mask. The main observation in this work is that the target attenuation in a T-F unit resulting from adaptive filtering is correlated with the relative strength of target to mixture. A comprehensive evaluation shows that the proposed system results in large SNR gains. In addition, comparisons using SNR as well as automatic speech recognition measures show that this system outperforms standard two-microphone beamforming approaches and a recent binaural processor.


Subjects
Environment , Models, Biological , Speech Perception , Humans , Sound Localization
5.
J Acoust Soc Am ; 114(4 Pt 1): 2236-52, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14587621

ABSTRACT

At a cocktail party, one can selectively attend to a single voice and filter out all the other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel, supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial localization cues: interaural time differences (ITD) and interaural intensity differences (IID). Motivated by the auditory masking effect, the notion of an "ideal" time-frequency binary mask is suggested, which selects the target if it is stronger than the interference in a local time-frequency (T-F) unit. It is observed that within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes for estimated ITD and IID. For a given spatial configuration, this interaction produces characteristic clustering in the binaural feature space. Consequently, pattern classification is performed in order to estimate ideal binary masks. A systematic evaluation in terms of signal-to-noise ratio as well as automatic speech recognition performance shows that the resulting system produces masks very close to ideal binary ones. A quantitative comparison shows that the model yields significant improvement in performance over an existing approach. Furthermore, under certain conditions the model produces large speech intelligibility improvements with normal listeners.
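The two binaural cues named in this abstract can be illustrated for a single frame of one frequency band. This is only a sketch of the cues themselves, not the paper's system, which the abstract describes as a supervised classifier over the binaural feature space. The function name `estimate_itd_iid`, the brute-force cross-correlation search, and the sign conventions (positive ITD when the right-ear signal lags the left, positive IID when the left ear is louder) are assumptions.

```python
import math


def estimate_itd_iid(left, right, sample_rate, max_lag):
    """Estimate ITD (seconds) and IID (dB) for one band-limited frame.

    ITD is the lag, within +/- max_lag samples, that maximizes the
    cross-correlation sum(left[i] * right[i + lag]).
    IID is the left-to-right energy ratio in dB.
    """
    n = min(len(left), len(right))
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                corr += left[i] * right[j]
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    itd_seconds = best_lag / sample_rate

    energy_left = sum(x * x for x in left[:n])
    energy_right = sum(x * x for x in right[:n])
    if energy_left <= 0.0 or energy_right <= 0.0:
        raise ValueError("both channels need non-zero energy to compute IID")
    iid_db = 10.0 * math.log10(energy_left / energy_right)
    return itd_seconds, iid_db
```

As the abstract notes, when the relative strength of target and interference changes within a narrow band, the estimated ITD and IID shift systematically, which is what makes them usable as features for estimating the mask.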


Subjects
Attention , Perceptual Masking , Sound Localization , Speech Perception , Adult , Dichotic Listening Tests , Female , Humans , Male , Mathematical Computing , Sound Spectrography , Speech Acoustics , Speech Intelligibility