Results 1 - 5 of 5
1.
IEEE Access ; 8: 197047-197058, 2020.
Article in English | MEDLINE | ID: mdl-33981519

ABSTRACT

In this article, we present a real-time convolutional neural network (CNN)-based speech source localization (SSL) algorithm that is robust to realistic background acoustic conditions (noise and reverberation). We have implemented and tested the proposed method on a prototype (Raspberry Pi) for real-time operation. As the input feature, we use the real and imaginary coefficients of the short-time Fourier transform (STFT) combined with spectral flux (SF) and delay-and-sum (DAS) beamforming. We trained the CNN model on noisy speech recordings collected in different rooms and ran inference on an unseen room. We provide a quantitative comparison with five previously published SSL algorithms under several realistic noisy conditions, and show significant improvements from incorporating SF with beamforming as an additional feature to learn temporal variation in speech spectra. We perform real-time inference of our CNN model on the prototype platform with low latency (21 milliseconds (ms) per frame with a frame length of 30 ms) and high accuracy (89.68% under babble noise at 5 dB SNR). Lastly, we provide a detailed explanation of the real-time implementation and on-device performance (including peak power consumption metrics), which sets this work apart from previously published works. This work has several notable implications for improving audio-processing algorithms for portable, battery-operated smart loudspeakers and hearing improvement (HI) devices.
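The feature set named in the abstract above (real and imaginary STFT coefficients plus spectral flux) can be illustrated with a minimal sketch. This is not the authors' implementation: it omits windowing and the delay-and-sum beamforming stage, uses a naive DFT for self-containment, and assumes one common definition of spectral flux (sum of squared magnitude differences between consecutive frames). The function names `dft`, `real_imag_features`, `spectral_flux`, and `frame_features` are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive DFT of a real-valued frame; returns the one-sided bins 0..N/2."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]

def real_imag_features(spectrum):
    """Concatenate the real parts and the imaginary parts of a spectrum."""
    return [c.real for c in spectrum] + [c.imag for c in spectrum]

def spectral_flux(prev_spectrum, cur_spectrum):
    """Assumed definition: sum of squared magnitude differences between frames."""
    return sum((abs(c) - abs(p)) ** 2 for p, c in zip(prev_spectrum, cur_spectrum))

def frame_features(signal, frame_len, hop):
    """Return a list of (real_imag_vector, flux) tuples, one per frame."""
    feats = []
    prev = None
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = dft(signal[start:start + frame_len])
        # The first frame has no predecessor, so its flux is set to zero.
        flux = 0.0 if prev is None else spectral_flux(prev, spec)
        feats.append((real_imag_features(spec), flux))
        prev = spec
    return feats
```

A stationary tone yields near-zero flux from frame to frame, while a speech onset produces a large value, which is the temporal variation the abstract refers to.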

2.
IEEE Access ; 7: 169969-169978, 2019.
Article in English | MEDLINE | ID: mdl-32754421

ABSTRACT

In this paper, we present a real-time convolutional neural network (CNN)-based approach for speech source localization (SSL) under noisy conditions, using an Android-based smartphone and its two built-in microphones. We propose a new input feature set for CNN-based SSL that uses the real and imaginary parts of the short-time Fourier transform (STFT). To train our CNN model, we use simulated noisy data from popular datasets, augmented with a few hours of real recordings collected on smartphones. We compare the proposed method to recent CNN-based SSL methods trained on our dataset and show that our method offers higher accuracy on identical test datasets. Another unique aspect of this work is that we perform real-time inference of our CNN model on an Android smartphone with low latency (14 milliseconds (ms) for single-frame estimation and 180 ms for multi-frame estimation, with a frame length of 20 ms in both cases) and high accuracy (88.83% at 0 dB SNR). We show that our CNN model is fairly robust to smartphone hardware mismatch, so the entire model may not need to be retrained for use with different smartphones. The proposed application provides a 'visual' indication of the direction of a talker on the screen of Android smartphones to improve the hearing of people with hearing disorders.

3.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 433-436, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30440427

ABSTRACT

In this paper, we present an improved version of a speech source localization method for Direction of Arrival (DOA) estimation using only two microphones. We also present a real-time Android application on a recent smartphone to help improve the spatial awareness of hearing-impaired users. Unlike earlier methods, the proposed method is computationally more efficient and fully adaptive to dynamically changing background noise. We compare the performance of the proposed method with similar earlier methods and demonstrate significantly lower DOA estimation errors as well as lower computation times. People who find it difficult to localize speech sources during group conversations or social activities can use the 'easy-to-use' Android application. The proposed implementation does not need any additional hardware or external microphone attachments, and can run on any dual-microphone device, such as a smartphone or tablet.
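The abstract does not specify the estimator, so the following is only a generic sketch of how a two-microphone DOA can be derived from an inter-microphone delay; it is not the method proposed in the paper. It assumes a far-field source, a plain cross-correlation delay search, and the relation cos(theta) = c * tau / d. The function names, the 16 kHz sampling rate, and the 0.13 m microphone spacing in the usage note are hypothetical.

```python
import math

def estimate_delay(x1, x2, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means x2 is a delayed copy of x1, i.e. the sound
    reached microphone 1 first.
    """
    best_lag, best_val = 0, float("-inf")
    n = len(x1)
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x1[t] * x2[t + lag] for t in range(n) if 0 <= t + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def doa_degrees(lag, fs, mic_distance, c=343.0):
    """Convert a delay in samples to an angle in [0, 180] degrees.

    The angle is measured from the array axis pointing from microphone 2
    toward microphone 1. The cosine is clipped because noisy delay
    estimates can exceed the physical maximum mic_distance / c.
    """
    cos_theta = max(-1.0, min(1.0, c * lag / fs / mic_distance))
    return math.degrees(math.acos(cos_theta))
```

For example, `doa_degrees(estimate_delay(x1, x2, 6), 16000, 0.13)` would map a pair of frames to an angle under these assumed parameters. A two-element array cannot distinguish front from back, so the result covers only 0 to 180 degrees.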


Subject(s)
Hearing Aids, Mobile Applications, Smartphone, Sound Localization, Environment, Equipment Design, Hearing, Hearing Loss/rehabilitation, Humans, Noise, Self-Help Devices, Speech Perception
4.
J Signal Process Syst ; 90(10): 1415-1435, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30294408

ABSTRACT

Robust speech source localization (SSL) is an important component of the speech processing pipeline for hearing aid devices (HADs). SSL via time difference of arrival (TDOA) estimation is known to improve the performance of HADs in noisy environments, thereby providing a better listening experience for hearing aid users. Smartphones can now connect to HADs through a wired or wireless channel. In this paper, we present our findings on non-uniform non-linear microphone array (NUNLA) geometry for improving SSL for HADs, using an L-shaped three-element microphone array available on modern smartphones. The proposed method is implemented on a frame-based TDOA estimation algorithm that uses a modified dictionary-based singular value decomposition (SVD) method to localize single speech sources at very low signal-to-noise ratios (SNRs). Unlike most methods developed for uniform microphone arrays, the proposed method has low spatial aliasing and low spatial ambiguity while providing robust, low-error DOA estimation with 360° scanning capability. We compare different types of microphone arrays and evaluate their performance using the proposed method.

5.
Article in English | MEDLINE | ID: mdl-25570676

ABSTRACT

This paper presents a cost-effective adaptive feedback Active Noise Control (FANC) method for controlling functional Magnetic Resonance Imaging (fMRI) acoustic noise by decomposing it into dominant periodic components and residual random components. The periodicity of fMRI acoustic noise is exploited by using linear prediction (LP) filtering to achieve the signal decomposition. A hybrid combination of adaptive filters, Recursive Least Squares (RLS) and Normalized Least Mean Squares (NLMS), is then used to control each component separately. The performance of the proposed FANC system is analyzed, and noise attenuation levels (NAL) of up to 32.27 dB obtained by simulation are presented, which confirm the effectiveness of the proposed FANC method.
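To illustrate why periodic noise suits adaptive filtering, here is a minimal sketch of a textbook NLMS one-step-ahead linear predictor. It is not the paper's FANC system: it has no RLS stage, no LP-based decomposition, and no acoustic path model. The function name `nlms_predict` and the values of `order`, `mu`, and `eps` are assumptions chosen for illustration.

```python
def nlms_predict(signal, order=4, mu=0.5, eps=1e-8):
    """Run an NLMS one-step-ahead predictor and return the prediction errors.

    The weights are updated as w += mu * e * x / (eps + x.x), where x holds
    the `order` most recent samples and e is the current prediction error.
    """
    w = [0.0] * order
    errors = []
    for n in range(order, len(signal)):
        x = signal[n - order:n][::-1]  # most recent sample first
        y = sum(wi * xi for wi, xi in zip(w, x))  # predicted sample
        e = signal[n] - y
        norm = eps + sum(xi * xi for xi in x)  # input energy, regularized
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        errors.append(e)
    return errors
```

On a purely periodic input such as a sinusoid, the prediction error shrinks as the weights converge, so the periodic part can be predicted and cancelled while only the random residual remains.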


Subject(s)
Acoustics, Algorithms, Artifacts, Feedback, Magnetic Resonance Imaging/methods, Signal Processing, Computer-Assisted, Computer Simulation, Humans, Least-Squares Analysis