Results 1 - 4 of 4
1.
Front Digit Health ; 5: 1196079, 2023.
Article in English | MEDLINE | ID: mdl-37767523

ABSTRACT

Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems into their modern, intelligent, and versatile equivalents, adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies, first and foremost in the field of medical imaging, but also in the use of wearables and other intelligent sensors. In comparison, computer audition can be seen to be lagging behind, at least in terms of commercial interest. Yet audition has long been a staple assistant for medical practitioners, the stethoscope being the quintessential sign of doctors around the world. Transforming this traditional technology with the use of AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting for individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, for ensuring compliance with the ethical standards accorded to the field of medicine. Thus, we provide an overview and perspective of HEAR4Health: the sketch of a modern, ubiquitous sensing system that can bring computer audition on par with other AI technologies in the striving for improved healthcare systems.

2.
Front Artif Intell ; 5: 856232, 2022.
Article in English | MEDLINE | ID: mdl-35372830

ABSTRACT

Deep neural speech and audio processing systems have a large number of trainable parameters, a relatively complex architecture, and require a vast amount of training data and computational power. These constraints make it challenging to integrate such systems into embedded devices and utilize them for real-time, real-world applications. We tackle these limitations by introducing DeepSpectrumLite, an open-source, lightweight transfer learning framework for on-device speech and audio recognition using pre-trained image Convolutional Neural Networks (CNNs). The framework creates and augments Mel spectrogram plots on the fly from raw audio signals, which are then used to fine-tune specific pre-trained CNNs for the target classification task. Subsequently, the whole pipeline can be run in real time with a mean inference lag of 242.0 ms when a DenseNet121 model is used on a consumer-grade Motorola moto e7 plus smartphone. DeepSpectrumLite operates in a decentralized manner, eliminating the need for data upload for further processing. We demonstrate the suitability of the proposed transfer learning approach for embedded audio signal processing by obtaining state-of-the-art results on a set of paralinguistic and general audio tasks, including speech and music emotion recognition, social signal processing, COVID-19 cough and COVID-19 speech analysis, and snore sound classification. We provide an extensive command-line interface for users and developers which is comprehensively documented and publicly available at https://github.com/DeepSpectrum/DeepSpectrumLite.
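The first stage of the pipeline described above, turning raw audio into a log-Mel spectrogram that a pre-trained image CNN can consume, can be sketched in pure NumPy. This is an illustrative simplification, not DeepSpectrumLite's actual implementation; all parameter values (sample rate, FFT size, hop, number of Mel bands) are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Standard HTK-style mel-scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=64):
    """Compute a log-Mel spectrogram from a raw audio signal (pure NumPy)."""
    # Frame the signal and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Triangular Mel filterbank spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    # Log compression, as typically fed to image CNNs.
    return np.log(power @ fbank.T + 1e-10)

# Example: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
```

In a transfer-learning setting, the resulting time-by-Mel matrix would be rendered as an image and passed to a pre-trained CNN such as DenseNet121 for fine-tuning on the target task.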

3.
Pattern Recognit ; 122: 108361, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34629550

ABSTRACT

The sudden outbreak of COVID-19 has resulted in tough challenges for the field of biometrics due to its spread via physical contact and the regulations on wearing face masks. Given these constraints, voice biometrics can offer a suitable contact-less biometric solution; they can benefit from models that classify whether a speaker is wearing a mask or not. This article reviews the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 COMputational PARalinguistics challengE (ComParE), which focused on the following classification task: given an audio chunk of a speaker, classify whether the speaker is wearing a mask or not. First, we report the collection of the Mask Augsburg Speech Corpus (MASC) and the baseline approaches used to solve the problem, achieving a performance of 71.8% Unweighted Average Recall (UAR). We then summarise the methodologies explored in the submitted and accepted papers, which mainly followed two common patterns: (i) phonetic-based audio features, or (ii) spectrogram representations of audio combined with Convolutional Neural Networks (CNNs) typically used in image processing. Most approaches enhance their models by building ensembles of different models and attempting to increase the size of the training data using various techniques. We review and discuss the results of the participants of this sub-challenge, where the winner scored a UAR of 80.1%. Moreover, we present the results of fusing the approaches, leading to a UAR of 82.6%. Finally, we present a smartphone app that can be used as a proof-of-concept demonstration to detect in real time whether users are wearing a face mask; we also benchmark the run-time of the best models.
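The UAR metric used throughout the sub-challenge, and the kind of late fusion that lifted the combined result above the winning single system, can both be sketched in a few lines. This is an illustrative toy example, not any participant's actual system; the labels and predictions below are invented for demonstration.

```python
import numpy as np

def uar(y_true, y_pred):
    """Unweighted Average Recall: the mean of per-class recalls.
    Unlike accuracy, it is robust to class imbalance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

def majority_fusion(predictions):
    """Late fusion by majority vote over several models' binary (0/1)
    label predictions, one equal-length array per model."""
    stacked = np.stack([np.asarray(p) for p in predictions])
    return (stacked.mean(axis=0) >= 0.5).astype(int)

# Toy example: 'mask' = 1, 'no mask' = 0.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
m1 = np.array([1, 1, 0, 0, 0, 1, 0, 0])
m2 = np.array([1, 0, 1, 0, 0, 0, 1, 0])
m3 = np.array([1, 1, 1, 1, 0, 0, 0, 0])
fused = majority_fusion([m1, m2, m3])
```

On this toy data, each individual model makes different errors, so the majority vote corrects them and `uar(y_true, fused)` exceeds each single model's UAR, mirroring the fusion gain reported above.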

4.
NMR Biomed ; 28(6): 715-25, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25908233

ABSTRACT

The aim of this study was to characterise and compare widely used acquisition strategies for hyperpolarised (13)C imaging. Free induction decay chemical shift imaging (FIDCSI), echo-planar spectroscopic imaging (EPSI), IDEAL spiral chemical shift imaging (ISPCSI) and spiral chemical shift imaging (SPCSI) sequences were designed for two different regimes of spatial resolution. Their characteristics were studied in simulations and in tumour-bearing rats after injection of hyperpolarised [1-(13)C]pyruvate on a clinical 3-T scanner. Two or three different sequences were used on the same rat in random order for direct comparison. The experimentally obtained lactate signal-to-noise ratio (SNR) in the tumour matched the simulations. Differences between the sequences were mainly found in the encoding efficiency, gradient demand and artefact behaviour. Although ISPCSI and SPCSI offer high encoding efficiencies, these non-Cartesian trajectories are more prone than EPSI and FIDCSI to artefacts from various sources. If the encoding efficiency is sufficient for the desired application, EPSI has proven to be a robust choice. Otherwise, faster spiral acquisition schemes are recommended. The conclusions of this work can be transferred directly to clinical applications.
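The sequence comparison above hinges on region-of-interest SNR measurements (e.g. lactate SNR in the tumour). A common way to estimate such an SNR is the mean signal in a tissue ROI divided by the standard deviation of a background noise region; the sketch below illustrates that general idea on synthetic data and is not the study's analysis code.

```python
import numpy as np

def roi_snr(image, signal_mask, noise_mask):
    """ROI-based SNR estimate: mean signal in a region of interest
    divided by the standard deviation of a background noise region."""
    signal = image[signal_mask].mean()
    noise_sd = image[noise_mask].std()
    return float(signal / noise_sd)

# Toy 4x4 "image": a uniform signal patch and a zero-mean noise patch.
image = np.zeros((4, 4))
image[:2, :2] = 10.0                             # tumour ROI
image[2:, 2:] = np.array([[1, -1], [-1, 1]])     # background noise patch

sig_mask = np.zeros((4, 4), dtype=bool)
sig_mask[:2, :2] = True
noise_mask = np.zeros((4, 4), dtype=bool)
noise_mask[2:, 2:] = True

snr = roi_snr(image, sig_mask, noise_mask)
```

For spectroscopic imaging data, the same computation would be applied per metabolite after reconstructing each metabolite image from the acquired spectra.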


Subject(s)
Algorithms, Carbon-13 Magnetic Resonance Spectroscopy/methods, Molecular Imaging/methods, Neoplasms, Experimental/metabolism, Pyruvic Acid/pharmacokinetics, Signal Processing, Computer-Assisted, Animals, Cell Line, Tumor, Humans, Information Storage and Retrieval/methods, Neoplasms, Experimental/pathology, Rats, Rats, Inbred F344, Reproducibility of Results, Sensitivity and Specificity