Results 1 - 9 of 9
1.
IEEE Trans Image Process ; 33: 2795-2807, 2024.
Article in English | MEDLINE | ID: mdl-38578859

ABSTRACT

In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind JPEG restoration at high compression levels. We propose an elegant modification of the forward stochastic differential equation of diffusion models to adapt them to this restoration task and name our method DriftRec. Comparing DriftRec against an L2 regression baseline with the same network architecture and state-of-the-art techniques for JPEG restoration, we show that our approach can escape the tendency of other methods to generate blurry images, and recovers the distribution of clean images significantly more faithfully. For this, only a dataset of clean/corrupted image pairs and no knowledge about the corruption operation is required, enabling wider applicability to other restoration tasks. In contrast to other conditional and unconditional diffusion models, we utilize the idea that the distributions of clean and corrupted images are much closer to each other than each is to the usual Gaussian prior of the reverse process in diffusion models. Our approach therefore requires only low levels of added noise and needs comparatively few sampling steps even without further optimizations. We show that DriftRec naturally generalizes to realistic and difficult scenarios such as unaligned double JPEG compression and blind restoration of JPEGs found online, without having encountered such examples during training.

2.
JASA Express Lett ; 2(10): 104802, 2022 10.
Article in English | MEDLINE | ID: mdl-36319213

ABSTRACT

Algorithmic latency in speech processing is dominated by the frame length used for Fourier analysis, which in turn limits the achievable performance of magnitude-centric approaches. As previous studies suggest the importance of phase grows with decreasing frame length, this work presents a systematic study on the contribution of phase and magnitude in modern deep neural network (DNN)-based speech enhancement at different frame lengths. Results indicate that DNNs can successfully estimate phase when using short frames, with similar or better overall performance compared to using longer frames. Thus, interestingly, modern phase-aware DNNs allow for low-latency speech enhancement at high quality.
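
The dependence of algorithmic latency on frame length can be illustrated with a short calculation. This is a minimal sketch, assuming the latency of a frame-based Fourier analysis/synthesis system equals one frame duration and ignoring hardware and computation delays. The function name and the 16 kHz sampling rate are illustrative assumptions, not values taken from the article.

```python
def algorithmic_latency_ms(frame_length_samples: int, sample_rate_hz: int) -> float:
    """Return the frame duration in milliseconds.

    Assumption: the algorithmic latency equals the length of one analysis
    frame, because a full frame must be buffered before it can be processed.
    """
    if frame_length_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("frame length and sample rate must be positive")
    return 1000.0 * frame_length_samples / sample_rate_hz


# Hypothetical frame lengths at an assumed 16 kHz sampling rate.
latencies = {n: algorithmic_latency_ms(n, 16000) for n in (512, 256, 64)}
for n, ms in latencies.items():
    print(f"{n:4d} samples -> {ms:.1f} ms")
```

Under this assumption, shortening the frame from 512 to 64 samples reduces the latency by a factor of eight, which is why phase estimation at short frame lengths matters for low-latency enhancement.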


Subject(s)
Refractive Surgical Procedures , Speech , Awareness , Neural Networks, Computer , Reading Frames
3.
IEEE Trans Med Imaging ; 40(12): 3568-3579, 2021 12.
Article in English | MEDLINE | ID: mdl-34152980

ABSTRACT

Background signals are a primary source of artifacts in magnetic particle imaging and limit the sensitivity of the method, since they are often not precisely known and vary over time. The state-of-the-art method for handling background signals uses one or several background calibration measurements with an empty scanner bore and subtracts a linear combination of these background measurements from the actual particle measurement. This approach yields satisfactory results when the background measurements are taken in close proximity to the particle measurement and when the background signal drifts linearly. In this work, we propose a joint estimation of particle distribution and background signal based on a dictionary that is capable of representing typical background signals. Reconstruction is performed frame-by-frame with minimal assumptions on the temporal evolution of background signals. Thus, even non-linear temporal evolution of the latter can be captured. Using a singular-value decomposition, the dictionary is derived from a large number of background calibration scans that do not need to be recorded in close proximity to the particle measurement. The dictionary is sufficiently expressive and represented by its principal components. The proposed joint estimation of particle distribution and background signal is expressed as a linear Tikhonov-regularized least squares problem, which can be efficiently solved. In phantom experiments it is shown that the method strongly suppresses background artifacts and even allows the direct feed-through of the excitation field to be estimated and removed.
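
The joint estimation described in this abstract can be sketched as a stacked Tikhonov-regularized least squares problem. The following is an illustrative sketch only, not the authors' implementation: it assumes a measurement model u ≈ S c + B b, where S is a system matrix, c the particle distribution, B a dictionary of principal components obtained by SVD from background scans, and b the background coefficients. All names, the regularization weights `lam` and `beta`, and the synthetic data are hypothetical.

```python
import numpy as np


def background_dictionary(background_scans: np.ndarray, num_components: int) -> np.ndarray:
    """Build a dictionary from the leading left singular vectors.

    background_scans has shape (num_samples, num_scans); each column is one
    empty-bore calibration measurement.
    """
    U, _, _ = np.linalg.svd(background_scans, full_matrices=False)
    return U[:, :num_components]


def joint_tikhonov(S, B, u, lam, beta):
    """Minimize ||S c + B b - u||^2 + lam ||c||^2 + beta ||b||^2 over (c, b)."""
    A = np.hstack([S, B])  # joint forward operator acting on [c; b]
    n_c, n_b = S.shape[1], B.shape[1]
    reg = np.diag(np.concatenate([np.full(n_c, lam), np.full(n_b, beta)]))
    x = np.linalg.solve(A.conj().T @ A + reg, A.conj().T @ u)  # normal equations
    return x[:n_c], x[n_c:]


# Synthetic demonstration with made-up data.
rng = np.random.default_rng(0)
S = rng.standard_normal((40, 5))                # hypothetical system matrix
G = rng.standard_normal((40, 2))                # hidden background basis
scans = G @ rng.standard_normal((2, 30))        # 30 background calibration scans
B = background_dictionary(scans, num_components=2)

c_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
u = S @ c_true + G @ np.array([0.5, -1.0])      # particle signal plus background

c_hat, b_hat = joint_tikhonov(S, B, u, lam=1e-10, beta=1e-10)
print(np.round(c_hat, 4))
```

Solving for particle and background coefficients in one linear system is what allows each frame to be reconstructed independently, without a model of how the background drifts between frames. In practice the regularization weights would be chosen far larger than in this noise-free toy example.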


Subject(s)
Algorithms , Artifacts , Least-Squares Analysis , Magnetic Phenomena , Magnetic Resonance Imaging , Phantoms, Imaging
4.
Front Robot AI ; 7: 85, 2020.
Article in English | MEDLINE | ID: mdl-33501252

ABSTRACT

Extracting information from noisy signals is of fundamental importance for both biological and artificial perceptual systems. To provide tractable solutions to this challenge, the fields of human perception and machine signal processing (SP) have developed powerful computational models, including Bayesian probabilistic models. However, little true integration between these fields exists in their applications of the probabilistic models for solving analogous problems, such as noise reduction, signal enhancement, and source separation. In this mini review, we briefly introduce and compare selective applications of probabilistic models in machine SP and human psychophysics. We focus on audio and audio-visual processing, using examples of speech enhancement, automatic speech recognition, audio-visual cue integration, source separation, and causal inference to illustrate the basic principles of the probabilistic approach. Our goal is to identify commonalities between probabilistic models addressing brain processes and those aiming at building intelligent machines. These commonalities could constitute the closest points for interdisciplinary convergence.

5.
Int J Audiol ; 57(sup3): S55-S61, 2018 06.
Article in English | MEDLINE | ID: mdl-28112001

ABSTRACT

OBJECTIVE: The perceived qualities of nine different single-microphone noise reduction (SMNR) algorithms were to be evaluated and compared in subjective listening tests with normal hearing and hearing impaired (HI) listeners. DESIGN: Speech samples mixed with traffic noise or with party noise were processed by the SMNR algorithms. Subjects rated the amount of speech distortions, intrusiveness of background noise, listening effort and overall quality, using a simplified MUSHRA (ITU-R, 2003) assessment method. STUDY SAMPLE: 18 normal hearing and 18 moderately HI subjects participated in the study. RESULTS: Significant differences between the rating behaviours of the two subject groups were observed: While normal hearing subjects clearly differentiated between different SMNR algorithms, HI subjects rated all processed signals very similarly. Moreover, HI subjects rated speech distortions of the unprocessed, noisier signals as being more severe than the distortions of the processed signals, in contrast to normal hearing subjects. CONCLUSIONS: It seems harder for HI listeners to distinguish between additive noise and speech distortions and/or they might have a different understanding of the term "speech distortion" than normal hearing listeners have. The findings confirm that the evaluation of SMNR schemes for hearing aids should always involve HI listeners.


Subject(s)
Correction of Hearing Impairment/instrumentation , Hearing Aids , Hearing Loss/rehabilitation , Hearing , Noise/adverse effects , Perceptual Masking , Persons With Hearing Impairments/rehabilitation , Speech Perception , Acoustic Stimulation , Adult , Aged , Auditory Threshold , Case-Control Studies , Equipment Design , Female , Hearing Loss/diagnosis , Hearing Loss/physiopathology , Hearing Loss/psychology , Hearing Tests , Humans , Male , Models, Theoretical , Persons With Hearing Impairments/psychology , Psychoacoustics , Signal Processing, Computer-Assisted , Speech Intelligibility
6.
Int J Audiol ; 57(sup3): S43-S54, 2018 06.
Article in English | MEDLINE | ID: mdl-28355947

ABSTRACT

OBJECTIVE: Single-channel noise reduction (SCNR) and dynamic range compression (DRC) are important elements in hearing aids. Only relatively few studies have addressed interaction effects and typically used real hearing aids with limited knowledge about the integrated algorithms. Here the potential benefit of different combinations and integration of SCNR and DRC was systematically assessed. DESIGN: Ten different systems combining SCNR and DRC were implemented, including five serial arrangements, a parallel and two multiplicative approaches. In an instrumental evaluation, signal-to-noise ratio (SNR) improvement and spectral contrast enhancement (SCE) were assessed. Quality ratings at 0 and +6 dB SNR, and speech reception thresholds (SRTs) in noise were measured using stationary and babble noise. STUDY SAMPLE: Thirteen young normal-hearing (NH) listeners and 12 hearing-impaired (HI) listeners participated. RESULTS: In line with an increased segmental SNR and spectral contrast compared to a serial concatenation, the parallel approach significantly reduced the perceived noise annoyance for both subject groups. The proposed multiplicative approaches could partly counteract increased speech distortions introduced by DRC and achieved the best overall quality for the HI listeners. CONCLUSIONS: For high SNRs well above the individual SRT, the specific combination of SCNR and DRC is perceptually relevant and the integrative approaches were preferred.


Subject(s)
Correction of Hearing Impairment/instrumentation , Hearing Aids , Hearing Loss, Sensorineural/rehabilitation , Hearing , Noise/prevention & control , Persons With Hearing Impairments/rehabilitation , Signal Processing, Computer-Assisted , Speech Perception , Acoustic Stimulation , Adult , Aged , Case-Control Studies , Equipment Design , Female , Hearing Loss, Sensorineural/diagnosis , Hearing Loss, Sensorineural/physiopathology , Hearing Loss, Sensorineural/psychology , Humans , Male , Middle Aged , Models, Theoretical , Noise/adverse effects , Patient Preference , Perceptual Masking , Persons With Hearing Impairments/psychology , Psychoacoustics , Speech Intelligibility , Speech Reception Threshold Test , Young Adult
7.
J Acoust Soc Am ; 140(4): EL364, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27794332

ABSTRACT

For the enhancement of single-channel speech corrupted by acoustic noise, short-time Fourier transform domain clean speech estimators were recently proposed that incorporate prior information about the clean speech spectral phase. Instrumental measures predict quality improvements for the phase-aware estimators over their conventional phase-blind counterparts. In this letter, these predictions are verified by means of listening experiments. The phase-aware amplitude estimator on average achieves a stronger noise reduction and is significantly preferred over its phase-blind counterpart in a pairwise comparison, even if the clean spectral phase is estimated blindly from the noisy signal.


Subject(s)
Speech , Acoustic Stimulation , Audiometry, Speech , Noise , Perceptual Masking , Signal Processing, Computer-Assisted , Speech Perception
8.
Trends Hear ; 19, 2015 Dec 30.
Article in English | MEDLINE | ID: mdl-26721920

ABSTRACT

In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios.


Subject(s)
Algorithms , Auditory Perception/physiology , Hearing Aids , Hearing Loss, Sensorineural/therapy , Noise/prevention & control , Speech Intelligibility/physiology , Acoustics/instrumentation , Auditory Threshold/physiology , Hearing Loss, Sensorineural/diagnosis , Humans , Loudness Perception/physiology , Signal-To-Noise Ratio
9.
Trends Hear ; 19, 2015 Dec 30.
Article in English | MEDLINE | ID: mdl-26721921

ABSTRACT

Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users.
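
For orientation, the textbook narrowband MVDR beamformer referred to in this abstract computes weights w = R⁻¹d / (dᴴR⁻¹d), which minimize the output noise power while passing the target direction undistorted. The sketch below illustrates this for a single frequency bin. It is a generic example, not the binaural implementations evaluated in the study, and the covariance matrix, steering vector, and four-microphone setup are random placeholders.

```python
import numpy as np


def mvdr_weights(noise_cov: np.ndarray, steering: np.ndarray) -> np.ndarray:
    """Return MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    r_inv_d = np.linalg.solve(noise_cov, steering)   # R^{-1} d without explicit inverse
    return r_inv_d / (steering.conj() @ r_inv_d)     # normalize by d^H R^{-1} d


# Placeholder data for a hypothetical 4-microphone array.
rng = np.random.default_rng(1)
M = 4
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + np.eye(M)                       # Hermitian positive definite noise covariance
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # assumed steering vector

w = mvdr_weights(R, d)
w_ds = d / np.vdot(d, d)                             # delay-and-sum weights for comparison

target_gain = np.vdot(w, d)                          # w^H d, equals 1 (distortionless)
noise_mvdr = np.vdot(w, R @ w).real                  # w^H R w
noise_ds = np.vdot(w_ds, R @ w_ds).real
print(abs(target_gain), noise_mvdr, noise_ds)
```

Both beamformers pass the target with unit gain, but the MVDR weights never yield a higher output noise power than delay-and-sum for the same covariance matrix, which is the property the constrained minimization guarantees.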


Subject(s)
Auditory Threshold/physiology , Cochlear Implantation/instrumentation , Noise/prevention & control , Perceptual Masking/physiology , Prosthesis Design , Speech Intelligibility , Adult , Aged , Algorithms , Audiometry, Speech/methods , Cochlear Implants , Humans , Middle Aged , Prosthesis Failure , Sampling Studies , Signal-To-Noise Ratio , Speech Reception Threshold Test , Young Adult