
Restoring speech intelligibility for hearing aid users with deep learning.

Diehl, Peter Udo; Singer, Yosef; Zilly, Hannes; Schönfeld, Uwe; Meyer-Rachner, Paul; Berry, Mark; Sprekeler, Henning; Sprengel, Elias; Pudszuhn, Annett; Hofmann, Veit M.

Sci Rep; 13(1): 2719, 2023 02 15.

Article in English | MEDLINE | ID: mdl-36792797

ABSTRACT

Almost half a billion people worldwide suffer from disabling hearing loss. While hearing aids can partially compensate for this, a large proportion of users struggle to understand speech in situations with background noise. Here, we present a deep learning-based algorithm that selectively suppresses noise while maintaining speech signals. The algorithm restores speech intelligibility for hearing aid users to the level of control subjects with normal hearing. It consists of a deep network that is trained on a large custom database of noisy speech signals and is further optimized by a neural architecture search, using a novel deep learning-based metric for speech intelligibility. The network achieves state-of-the-art denoising on a range of human-graded assessments, generalizes across different noise categories and, in contrast to classic beamforming approaches, operates on a single microphone. The system runs in real time on a laptop, suggesting that large-scale deployment on hearing aid chips could be achieved within a few years. Deep learning-based denoising therefore holds the potential to improve the quality of life of millions of hearing-impaired people soon.

Subject(s)

Deep Learning; Hearing Aids; Hearing Loss, Sensorineural; Speech Perception; Humans; Speech Intelligibility; Quality of Life

Non-intrusive deep learning-based computational speech metrics with high-accuracy across a wide range of acoustic scenes.

Diehl, Peter Udo; Thorbergsson, Leifur; Singer, Yosef; Skripniuk, Vladislav; Pudszuhn, Annett; Hofmann, Veit M; Sprengel, Elias; Meyer-Rachner, Paul.

PLoS One; 17(11): e0278170, 2022.

Article in English | MEDLINE | ID: mdl-36441711

ABSTRACT

Speech with high sound quality and little noise is central to many of our communication tools, including calls, video conferencing and hearing aids. While human ratings provide the best measure of sound quality, they are costly and time-intensive to gather, so computational metrics are typically used instead. Here we present a non-intrusive, deep learning-based metric that takes only a sound sample as an input and returns ratings in three categories: overall quality, noise, and sound quality. This metric is available via a web API and is composed of a deep neural network ensemble with 5 networks that use either ResNet-26 architectures with STFT inputs or fully-connected networks with wav2vec features as inputs. The networks are trained and tested on over 1 million crowd-sourced human sound ratings across the three categories. Correlations of our metric with human ratings exceed or match other state-of-the-art metrics on 51 out of 56 benchmark scenes and, unlike the metrics that perform well on the other 5 scenes, our metric does not require clean speech reference samples. The benchmark scenes represent a wide variety of acoustic environments and a large selection of post-processing methods that include classical methods (e.g. Wiener-filtering) and newer deep-learning methods.

Subject(s)

Deep Learning; Speech; Humans; Benchmarking; Acoustics; Sound
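
The abstract above describes an ensemble of 5 networks whose output is compared with human ratings by correlation. The following is a minimal sketch of that kind of evaluation, not the authors' implementation. It assumes the ensemble members are combined by a plain mean (the abstract does not say how they are combined) and that agreement is measured with the Pearson coefficient. The function names `ensemble_mean` and `pearson` and all numbers are hypothetical and purely illustrative.

```python
from math import sqrt
from typing import List, Sequence


def ensemble_mean(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sample ratings over ensemble members.

    predictions[m][i] is the rating from (hypothetical) network m for sample i.
    Plain averaging is an assumption; the abstract does not state the rule.
    """
    if not predictions:
        raise ValueError("need at least one ensemble member")
    n_models = len(predictions)
    # zip(*predictions) groups the ratings of all members for each sample.
    return [sum(per_sample) / n_models for per_sample in zip(*predictions)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up ratings from 5 hypothetical networks for 4 sound samples.
    member_ratings = [
        [1.9, 3.1, 4.2, 2.6],
        [2.1, 2.9, 4.0, 2.4],
        [2.0, 3.0, 4.4, 2.5],
        [1.8, 3.2, 4.1, 2.7],
        [2.2, 2.8, 4.3, 2.3],
    ]
    # Made-up mean human ratings for the same 4 samples.
    human_ratings = [2.0, 3.3, 4.5, 2.2]

    metric = ensemble_mean(member_ratings)
    print("ensemble ratings:", metric)
    print("correlation with human ratings:", pearson(metric, human_ratings))
```

In a benchmark like the one the abstract reports, such a correlation would be computed per scene and compared against the same figure for competing metrics.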
