1.
IEEE Access ; 8: 197047-197058, 2020.
Article in English | MEDLINE | ID: mdl-33981519

ABSTRACT

In this article, we present a real-time convolutional neural network (CNN)-based speech source localization (SSL) algorithm that is robust to realistic background acoustic conditions (noise and reverberation). We have implemented and tested the proposed method on a prototype (Raspberry Pi) for real-time operation. As the input feature, we use the combination of the imaginary-real coefficients of the short-time Fourier transform (STFT) and Spectral Flux (SF) with delay-and-sum (DAS) beamforming. We trained the CNN model on noisy speech recordings collected from different rooms and ran inference on an unseen room. We provide a quantitative comparison with five previously published SSL algorithms under several realistic noisy conditions, and show significant improvements from incorporating SF with beamforming as an additional feature to learn temporal variation in speech spectra. We perform real-time inferencing of our CNN model on the prototyped platform with low latency (21 milliseconds (ms) per frame with a frame length of 30 ms) and high accuracy (89.68% under babble noise at 5 dB SNR). Lastly, we provide a detailed explanation of the real-time implementation and on-device performance (including peak power consumption metrics), which sets this work apart from previously published works. This work has several notable implications for improving the audio-processing algorithms of portable battery-operated smart loudspeakers and hearing improvement (HI) devices.
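The abstract does not define its features in detail, so the following is only a minimal sketch of the two named building blocks under common textbook definitions: frequency-domain delay-and-sum beamforming and spectral flux computed as the summed squared change in magnitude between consecutive frames. The function names, the per-microphone delay convention, and the choice of squared difference are assumptions, not the authors' implementation.

```python
import numpy as np

def delay_and_sum(spectra, delays_s, freqs_hz):
    """Frequency-domain delay-and-sum beamformer (illustrative).

    spectra:  complex array, shape (n_mics, n_bins), one STFT frame per mic
    delays_s: arrival delay of the source at each mic, in seconds
    freqs_hz: centre frequency of each bin, in Hz
    """
    # A delay of tau multiplies the spectrum by exp(-j*2*pi*f*tau);
    # multiplying by the conjugate phase re-aligns the channels.
    steering = np.exp(2j * np.pi * freqs_hz[None, :] * delays_s[:, None])
    return np.mean(spectra * steering, axis=0)

def spectral_flux(prev_mag, curr_mag):
    """Sum of squared magnitude differences between two frames (assumed definition)."""
    return float(np.sum((curr_mag - prev_mag) ** 2))
```

In this reading, the flux would be computed on the magnitude of the beamformed spectrum from one frame to the next and supplied to the network alongside the STFT coefficients.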

2.
IEEE Access ; 7: 169969-169978, 2019.
Article in English | MEDLINE | ID: mdl-32754421

ABSTRACT

In this paper, we present a real-time convolutional neural network (CNN)-based approach for speech source localization (SSL) under noisy conditions, using an Android-based smartphone and its two built-in microphones. We propose a new input feature set for CNN-based SSL: the real and imaginary parts of the short-time Fourier transform (STFT). To train our CNN model, we use simulated noisy data from popular datasets, augmented with a few hours of real recordings collected on smartphones. We compare the proposed method to recent CNN-based SSL methods trained on our dataset and show that our method offers higher accuracy on identical test datasets. Another unique aspect of this work is that we perform real-time inferencing of our CNN model on an Android smartphone with low latency (14 milliseconds (ms) for single-frame estimation and 180 ms for multi-frame estimation, with a frame length of 20 ms in both cases) and high accuracy (88.83% at 0 dB SNR). We show that our CNN model is rather robust to smartphone hardware mismatch, so the entire model may not need to be retrained for use with different smartphones. The proposed application provides a 'visual' indication of the direction of a talker on the screen of Android smartphones to improve hearing for people with hearing disorders.
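As a rough illustration of the feature described above, the sketch below stacks the real and imaginary STFT parts of one two-microphone frame. The 16 kHz sampling rate implied by the 320-sample frame, the Hann window, the FFT size, and the function name are all assumptions made for the example; the abstract does not specify them.

```python
import numpy as np

def stft_real_imag_features(frame_2ch, n_fft=512):
    """Real/imaginary STFT features for one two-microphone frame (illustrative).

    frame_2ch: real array, shape (2, frame_len), one row per microphone
    returns:   real array, shape (2, 2, n_fft // 2 + 1),
               indexed as [microphone, real-or-imaginary, frequency bin]
    """
    window = np.hanning(frame_2ch.shape[1])          # assumed analysis window
    spectrum = np.fft.rfft(frame_2ch * window, n=n_fft, axis=1)
    return np.stack([spectrum.real, spectrum.imag], axis=1)

# Example: a 20 ms frame at an assumed 16 kHz sampling rate is 320 samples.
frame = np.random.default_rng(0).standard_normal((2, 320))
features = stft_real_imag_features(frame)
```

Keeping the real and imaginary parts, rather than magnitude alone, preserves the inter-microphone phase differences that carry direction information.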

3.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 429-432, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30440426

ABSTRACT

Dynamic-range compression (DRC) is widely used in hearing aid devices (HADs) to reduce the wide dynamic range of the input speech signal to match the residual dynamic range of people with hearing loss. Most compression systems use multi-channel compression to match the input signal more effectively and accurately to the audiograms of hearing-impaired people. However, multi-channel compression introduces distortion and increases computational complexity. This limits the sampling rate and adds system latency, which makes real-time implementation difficult. In this paper, a compensation filter is proposed to reduce the distortion, and a polyphase implementation is applied to reduce the computational complexity. Objective and subjective tests are conducted to evaluate the quality and intelligibility of the output audio (speech) signal under different noise types and signal-to-noise ratios (SNRs).
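For readers unfamiliar with DRC, the sketch below shows a generic single-channel static compression curve: levels above a threshold are reduced according to a compression ratio. It is a textbook illustration only; it does not reproduce the paper's multi-channel design, compensation filter, or polyphase structure, and the threshold and ratio values are arbitrary assumptions.

```python
import numpy as np

def compression_gain_db(level_db, threshold_db=-40.0, ratio=2.0):
    """Static gain (in dB) of a simple hard-knee compressor (illustrative).

    Below the threshold the gain is 0 dB. Above it, every `ratio` dB of
    input level increase yields only 1 dB of output level increase.
    """
    level_db = np.asarray(level_db, dtype=float)
    over = np.maximum(level_db - threshold_db, 0.0)  # dB above threshold
    return -over * (1.0 - 1.0 / ratio)

gains = compression_gain_db([-50.0, -40.0, -20.0])
```

A multi-channel compressor applies a curve like this separately in each frequency band, which is where the added distortion and computational cost mentioned in the abstract arise.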


Subject(s)
Hearing Aids , Signal-To-Noise Ratio , Algorithms , Hearing Loss/rehabilitation , Humans , Speech Perception