Search | VHL Regional Portal

1.

Noise Robust Recognition of Depression Status and Treatment Response from Speech via Unsupervised Feature Aggregation.

Gerczuk, Maurice; Amiriparian, Shahin; Kathan, Alexander; Bauer, Jonathan; Berking, Matthias; Schuller, Bjorn W.

Annu Int Conf IEEE Eng Med Biol Soc ; 2023: 1-4, 2023 07.

Article in English | MEDLINE | ID: mdl-38083138

ABSTRACT

In the presented work, we utilise a noisy dataset of clinical interviews with depression patients conducted over the telephone for the purpose of depression classification and automated detection of treatment response. Compared to most previous studies dealing with depression recognition from speech, our data set does not include a healthy group of subjects that have never been diagnosed with depression. Furthermore, it contains measurements at different time points for individual subjects, making it suitable for machine learning-based detection of treatment response. In our experiments, we make use of an unsupervised feature quantisation and aggregation method achieving 69.2% Unweighted Average Recall (UAR) when classifying whether patients are currently in remission or experiencing a major depressive episode (MDE). The performance of our model matches cutoff-based classification via Hamilton Rating Scale for Depression (HRSD) scores. Finally, we show that using speech samples, we can detect response to treatment with a UAR of 68.1%.

Subject(s)

Depressive Disorder, Major , Humans , Depressive Disorder, Major/diagnosis , Depressive Disorder, Major/therapy , Depression/diagnosis , Depression/therapy , Speech , Recognition, Psychology , Health Status

2.

Universal Lesion Detection Utilising Cascading R-CNNs and a Novel Video Pretraining Method.

Amiriparian, Shahin; Meiners, Alexander; Rothenpieler, Daniel; Kathan, Alexander; Gerczuk, Maurice; Schuller, Bjorn W.

Annu Int Conf IEEE Eng Med Biol Soc ; 2023: 1-4, 2023 07.

Article in English | MEDLINE | ID: mdl-38083221

ABSTRACT

According to the WHO, approximately one in six individuals worldwide will develop some form of cancer in their lifetime. Therefore, accurate and early detection of lesions is crucial for improving the probability of successful treatment, reducing the need for more invasive treatments, and leading to higher rates of survival. In this work, we propose a novel R-CNN approach with pretraining and data augmentation for universal lesion detection. In particular, we incorporate an asymmetric 3D context fusion (A3D) for feature extraction from 2D CT images with Hybrid Task Cascade. By doing so, we supply the network with further spatial context, refining the mask prediction over several stages and making it easier to distinguish hard foregrounds from cluttered backgrounds. Moreover, we introduce a new video pretraining method for medical imaging by using consecutive frames from the YouTube VOS video segmentation dataset which improves our model's sensitivity by 0.8 percentage points at a false positive rate of one false positive per image. Finally, we apply data augmentation techniques and analyse their impact on the overall performance of our models at various false positive rates. Using our introduced approach, it is possible to increase the A3D baseline's sensitivity by 1.04 percentage points in mFROC.

3.

Zero-shot personalization of speech foundation models for depressed mood monitoring.

Gerczuk, Maurice; Triantafyllopoulos, Andreas; Amiriparian, Shahin; Kathan, Alexander; Bauer, Jonathan; Berking, Matthias; Schuller, Björn W.

Patterns (N Y) ; 4(11): 100873, 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-38035199

ABSTRACT

The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient's affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual's diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Compared with other personalization strategies, our work does not require labeled speech samples for enrollment. Instead, the approach makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness.

4.

HEAR4Health: a blueprint for making computer audition a staple of modern healthcare.

Triantafyllopoulos, Andreas; Kathan, Alexander; Baird, Alice; Christ, Lukas; Gebhard, Alexander; Gerczuk, Maurice; Karas, Vincent; Hübner, Tobias; Jing, Xin; Liu, Shuo; Mallol-Ragolta, Adria; Milling, Manuel; Ottl, Sandra; Semertzidou, Anastasia; Rajamani, Srividya Tirunellai; Yan, Tianhao; Yang, Zijiang; Dineley, Judith; Amiriparian, Shahin; Bartl-Pokorny, Katrin D; Batliner, Anton; Pokorny, Florian B; Schuller, Björn W.

Front Digit Health ; 5: 1196079, 2023.

Article in English | MEDLINE | ID: mdl-37767523

ABSTRACT

Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems to their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies; first and foremost in the fields of medical imaging, but also in the use of wearables and other intelligent sensors. In comparison, computer audition can be seen to be lagging behind, at least in terms of commercial interest. Yet, audition has long been a staple assistant for medical practitioners, with the stethoscope being the quintessential sign of doctors around the world. Transforming this traditional technology with the use of AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting to individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, for ensuring compliance to the ethical standards accorded to the field of medicine. Thus, we provide an overview and perspective of HEAR4Health: the sketch of a modern, ubiquitous sensing system that can bring computer audition on par with other AI technologies in the strive for improved healthcare systems.

5.

A summary of the ComParE COVID-19 challenges.

Coppock, Harry; Akman, Alican; Bergler, Christian; Gerczuk, Maurice; Brown, Chloë; Chauhan, Jagmohan; Grammenos, Andreas; Hasthanasombat, Apinan; Spathis, Dimitris; Xia, Tong; Cicuta, Pietro; Han, Jing; Amiriparian, Shahin; Baird, Alice; Stappen, Lukas; Ottl, Sandra; Tzirakis, Panagiotis; Batliner, Anton; Mascolo, Cecilia; Schuller, Björn W.

Front Digit Health ; 5: 1058163, 2023.

Article in English | MEDLINE | ID: mdl-36969956

ABSTRACT

The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue from the machine learning field which has been explored is the prospect of a digital mass test which can detect COVID-19 from infected individuals' respiratory sounds. We present a summary of the results from the INTERSPEECH 2021 Computational Paralinguistics Challenges: COVID-19 Cough, (CCS) and COVID-19 Speech, (CSS).

6.

Personalised depression forecasting using mobile sensor data and ecological momentary assessment.

Kathan, Alexander; Harrer, Mathias; Küster, Ludwig; Triantafyllopoulos, Andreas; He, Xiangheng; Milling, Manuel; Gerczuk, Maurice; Yan, Tianhao; Rajamani, Srividya Tirunellai; Heber, Elena; Grossmann, Inga; Ebert, David D; Schuller, Björn W.

Front Digit Health ; 4: 964582, 2022.

Article in English | MEDLINE | ID: mdl-36465087

ABSTRACT

Introduction: Digital health interventions are an effective way to treat depression, but it is still largely unclear how patients' individual symptoms evolve dynamically during such treatments. Data-driven forecasts of depressive symptoms would allow to greatly improve the personalisation of treatments. In current forecasting approaches, models are often trained on an entire population, resulting in a general model that works overall, but does not translate well to each individual in clinically heterogeneous, real-world populations. Model fairness across patient subgroups is also frequently overlooked. Personalised models tailored to the individual patient may therefore be promising. Methods: We investigate different personalisation strategies using transfer learning, subgroup models, as well as subject-dependent standardisation on a newly-collected, longitudinal dataset of depression patients undergoing treatment with a digital intervention ( N = 65 patients recruited). Both passive mobile sensor data as well as ecological momentary assessments were available for modelling. We evaluated the models' ability to predict symptoms of depression (Patient Health Questionnaire-2; PHQ-2) at the end of each day, and to forecast symptoms of the next day. Results: In our experiments, we achieve a best mean-absolute-error (MAE) of 0.801 (25% improvement) for predicting PHQ-2 values at the end of the day with subject-dependent standardisation compared to a non-personalised baseline ( MAE = 1.062 ). For one day ahead-forecasting, we can improve the baseline of 1.539 by 12 % to a MAE of 1.349 using a transfer learning approach with shared common layers. In addition, personalisation leads to fairer models at group-level. Discussion: Our results suggest that personalisation using subject-dependent standardisation and transfer learning can improve predictions and forecasts, respectively, of depressive symptoms in participants of a digital depression intervention. We discuss technical and clinical limitations of this approach, avenues for future investigations, and how personalised machine learning architectures may be implemented to improve existing digital interventions for depression.

7.

Fatigue Prediction in Outdoor Running Conditions using Audio Data.

Triantafyllopoulos, Andreas; Ottl, Sandra; Gebhard, Alexander; Rituerto-Gonzalez, Esther; Jaumann, Mirko; Huttner, Steffen; Dieter, Valerie; Schneeweiss, Patrick; Krauss, Inga; Gerczuk, Maurice; Amiriparian, Shahin; Schuller, Bjorn W.

Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 2623-2626, 2022 07.

Article in English | MEDLINE | ID: mdl-36086314

ABSTRACT

Although running is a common leisure activity and a core training regiment for several athletes, between 29% and 79% of runners sustain an overuse injury each year. These injuries are linked to excessive fatigue, which alters how someone runs. In this work, we explore the feasibility of modelling the Borg received perception of exertion (RPE) scale (range: [6]-[19] [20]), a well-validated subjective measure of fatigue, using audio data captured in realistic outdoor environments via smartphones attached to the runners' arms. Using convolutional neural networks (CNNs) on log-Mel spectrograms, we obtain a mean absolute error (MAE) of 2.35 in subject-dependent experiments, demonstrating that audio can be effectively used to model fatigue, while being more easily and non-invasively acquired than by signals from other sensors.

Subject(s)

Fatigue , Muscle Fatigue , Fatigue/diagnosis , Humans , Neural Networks, Computer

8.

motilitAI: A machine learning framework for automatic prediction of human sperm motility.

Ottl, Sandra; Amiriparian, Shahin; Gerczuk, Maurice; Schuller, Björn W.

iScience ; 25(8): 104644, 2022 Aug 19.

Article in English | MEDLINE | ID: mdl-35856034

ABSTRACT

In this article, human semen samples from the Visem dataset are automatically assessed with machine learning methods for their quality with respect to sperm motility. Several regression models are trained to automatically predict the percentage (0-100) of progressive, non-progressive, and immotile spermatozoa. The videos are adopted for unsupervised tracking and two different feature extraction methods-in particular custom movement statistics and displacement features. We train multiple neural networks and support vector regression models on the extracted features. Best results are achieved using a linear Support Vector Regressor with an aggregated and quantized representation of individual displacement features of each sperm cell. Compared to the best submission of the Medico Multimedia for Medicine challenge, which used the same dataset and splits, the mean absolute error (MAE) could be reduced from 8.83 to 7.31. We provide the source code for our experiments on GitHub (Code available at: https://github.com/EIHW/motilitAI).

9.

DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing From Decentralized Data.

Amiriparian, Shahin; Hübner, Tobias; Karas, Vincent; Gerczuk, Maurice; Ottl, Sandra; Schuller, Björn W.

Front Artif Intell ; 5: 856232, 2022.

Article in English | MEDLINE | ID: mdl-35372830

ABSTRACT

Deep neural speech and audio processing systems have a large number of trainable parameters, a relatively complex architecture, and require a vast amount of training data and computational power. These constraints make it more challenging to integrate such systems into embedded devices and utilize them for real-time, real-world applications. We tackle these limitations by introducing DeepSpectrumLite, an open-source, lightweight transfer learning framework for on-device speech and audio recognition using pre-trained image Convolutional Neural Networks (CNNs). The framework creates and augments Mel spectrogram plots on the fly from raw audio signals which are then used to finetune specific pre-trained CNNs for the target classification task. Subsequently, the whole pipeline can be run in real-time with a mean inference lag of 242.0 ms when a DenseNet121 model is used on a consumer-grade Motorola moto e7 plus smartphone. DeepSpectrumLite operates decentralized, eliminating the need for data upload for further processing. We demonstrate the suitability of the proposed transfer learning approach for embedded audio signal processing by obtaining state-of-the-art results on a set of paralinguistic and general audio tasks, including speech and music emotion recognition, social signal processing, COVID-19 cough and COVID-19 speech analysis, and snore sound classification. We provide an extensive command-line interface for users and developers which is comprehensively documented and publicly available at https://github.com/DeepSpectrum/DeepSpectrumLite.

10.

Synchronization in Interpersonal Speech.

Amiriparian, Shahin; Han, Jing; Schmitt, Maximilian; Baird, Alice; Mallol-Ragolta, Adria; Milling, Manuel; Gerczuk, Maurice; Schuller, Björn.

Front Robot AI ; 6: 116, 2019.

Article in English | MEDLINE | ID: mdl-33501131

ABSTRACT

During both positive and negative dyadic exchanges, individuals will often unconsciously imitate their partner. A substantial amount of research has been made on this phenomenon, and such studies have shown that synchronization between communication partners can improve interpersonal relationships. Automatic computational approaches for recognizing synchrony are still in their infancy. In this study, we extend on previous work in which we applied a novel method utilizing hand-crafted low-level acoustic descriptors and autoencoders (AEs) to analyse synchrony in the speech domain. For this purpose, a database consisting of 394 in-the-wild speakers from six different cultures, is used. For each speaker in the dyadic exchange, two AEs are implemented. Post the training phase, the acoustic features for one of the speakers is tested using the AE trained on their dyadic partner. In this same way, we also explore the benefits that deep representations from audio may have, implementing the state-of-the-art Deep Spectrum toolkit. For all speakers at varied time-points during their interaction, the calculation of reconstruction error from the AE trained on their respective dyadic partner is made. The results obtained from this acoustic analysis are then compared with the linguistic experiments based on word counts and word embeddings generated by our word2vec approach. The results demonstrate that there is a degree of synchrony during all interactions. We also find that, this degree varies across the 6 cultures found in the investigated database. These findings are further substantiated through the use of 4,096 dimensional Deep Spectrum features.

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL