Search | VHL Regional Portal

1.

Ecology & computer audition: Applications of audio technology to monitor organisms and environment.

Schuller, Björn W; Akman, Alican; Chang, Yi; Coppock, Harry; Gebhard, Alexander; Kathan, Alexander; Rituerto-González, Esther; Triantafyllopoulos, Andreas; Pokorny, Florian B.

Heliyon ; 10(1): e23142, 2024 Jan 15.

Article in English | MEDLINE | ID: mdl-38163154

ABSTRACT

Among the 17 Sustainable Development Goals (SDGs) proposed within the 2030 Agenda and adopted by all United Nations member states, the 13th SDG is a call for action to combat climate change. Moreover, SDGs 14 and 15 call for the protection and conservation of life below water and life on land, respectively. In this work, we provide a literature-founded overview of application areas in which computer audition, a powerful technology combining audio signal processing and machine intelligence that has so far hardly been considered in this context, is employed to monitor our ecosystem, with the potential to identify ecologically critical processes or states. We distinguish between applications related to organisms, such as species richness analysis and plant health monitoring, and applications related to the environment, such as melting ice monitoring or wildfire detection. This work positions computer audition in relation to alternative approaches by discussing methodological strengths and limitations, as well as ethical aspects. We conclude with an urgent call to action to the research community for a greater involvement of audio intelligence methodology in future ecosystem monitoring approaches.

2.

Noise Robust Recognition of Depression Status and Treatment Response from Speech via Unsupervised Feature Aggregation.

Gerczuk, Maurice; Amiriparian, Shahin; Kathan, Alexander; Bauer, Jonathan; Berking, Matthias; Schuller, Bjorn W.

Annu Int Conf IEEE Eng Med Biol Soc ; 2023: 1-4, 2023 07.

Article in English | MEDLINE | ID: mdl-38083138

ABSTRACT

In the presented work, we utilise a noisy dataset of clinical interviews with depression patients conducted over the telephone for the purpose of depression classification and automated detection of treatment response. Compared to most previous studies dealing with depression recognition from speech, our data set does not include a healthy group of subjects that have never been diagnosed with depression. Furthermore, it contains measurements at different time points for individual subjects, making it suitable for machine learning-based detection of treatment response. In our experiments, we make use of an unsupervised feature quantisation and aggregation method achieving 69.2% Unweighted Average Recall (UAR) when classifying whether patients are currently in remission or experiencing a major depressive episode (MDE). The performance of our model matches cutoff-based classification via Hamilton Rating Scale for Depression (HRSD) scores. Finally, we show that using speech samples, we can detect response to treatment with a UAR of 68.1%.
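Unweighted Average Recall (UAR), the metric reported above, is the mean of the per-class recalls, so each class (here remission vs. MDE) counts equally regardless of how many samples it has. The following is a minimal sketch of that definition with made-up labels, not the authors' evaluation code:

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true.

    Unlike plain accuracy, every class contributes equally, so a classifier
    that always predicts the majority class does not score well.
    """
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical labels: 0 = remission, 1 = major depressive episode (MDE)
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(unweighted_average_recall(y_true, y_pred))  # 0.625 = (3/4 + 1/2) / 2
```

On this toy data plain accuracy would be 4/6, while UAR is lower because the minority class is recognised only half of the time.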

Subject(s)

Depressive Disorder, Major , Humans , Depressive Disorder, Major/diagnosis , Depressive Disorder, Major/therapy , Depression/diagnosis , Depression/therapy , Speech , Recognition, Psychology , Health Status

3.

Universal Lesion Detection Utilising Cascading R-CNNs and a Novel Video Pretraining Method.

Amiriparian, Shahin; Meiners, Alexander; Rothenpieler, Daniel; Kathan, Alexander; Gerczuk, Maurice; Schuller, Bjorn W.

Annu Int Conf IEEE Eng Med Biol Soc ; 2023: 1-4, 2023 07.

Article in English | MEDLINE | ID: mdl-38083221

ABSTRACT

According to the WHO, approximately one in six individuals worldwide will develop some form of cancer in their lifetime. Therefore, accurate and early detection of lesions is crucial for improving the probability of successful treatment, reducing the need for more invasive treatments, and leading to higher rates of survival. In this work, we propose a novel R-CNN approach with pretraining and data augmentation for universal lesion detection. In particular, we incorporate an asymmetric 3D context fusion (A3D) for feature extraction from 2D CT images with Hybrid Task Cascade. By doing so, we supply the network with further spatial context, refining the mask prediction over several stages and making it easier to distinguish hard foregrounds from cluttered backgrounds. Moreover, we introduce a new video pretraining method for medical imaging by using consecutive frames from the YouTube VOS video segmentation dataset which improves our model's sensitivity by 0.8 percentage points at a false positive rate of one false positive per image. Finally, we apply data augmentation techniques and analyse their impact on the overall performance of our models at various false positive rates. Using our introduced approach, it is possible to increase the A3D baseline's sensitivity by 1.04 percentage points in mFROC.
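Sensitivity "at one false positive per image" and "mFROC" are operating points on a free-response ROC curve: detections are ranked by score, and the sensitivity is read off at a fixed budget of average false positives per image. The sketch below assumes detections have already been matched to ground-truth lesions (each lesion hit at most once) and uses the operating points 0.5, 1, 2, 4, 8 and 16 FPs per image that are often used on DeepLesion; the exact points used in the paper are not stated in the abstract, so treat them as an assumption:

```python
def sensitivity_at_fp_rates(detections, n_images, n_lesions, fp_rates):
    """detections: (score, is_true_positive) pairs pooled over all images,
    with each ground-truth lesion matched by at most one detection.
    Returns the lesion-level sensitivity reachable at each budget of
    average false positives per image. Score ties are not treated specially.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    results = []
    for rate in fp_rates:
        max_fp = rate * n_images  # total false positives allowed over the dataset
        tp = fp = 0
        best = 0.0
        # Lower the score threshold one detection at a time.
        for _, is_tp in ranked:
            if is_tp:
                tp += 1
            else:
                fp += 1
            if fp > max_fp:
                break
            best = tp / n_lesions
        results.append(best)
    return results


def mean_froc(detections, n_images, n_lesions, fp_rates=(0.5, 1, 2, 4, 8, 16)):
    """Average sensitivity over the chosen false-positive operating points."""
    sens = sensitivity_at_fp_rates(detections, n_images, n_lesions, fp_rates)
    return sum(sens) / len(sens)


# Made-up detections for 2 images containing 3 lesions in total.
dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, False), (0.4, True)]
print(sensitivity_at_fp_rates(dets, n_images=2, n_lesions=3, fp_rates=[0.5, 1, 2]))
print(mean_froc(dets, n_images=2, n_lesions=3))
```

Differences such as "0.8 percentage points" or "1.04 percentage points" then correspond to differences of these sensitivities multiplied by 100.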

4.

Zero-shot personalization of speech foundation models for depressed mood monitoring.

Gerczuk, Maurice; Triantafyllopoulos, Andreas; Amiriparian, Shahin; Kathan, Alexander; Bauer, Jonathan; Berking, Matthias; Schuller, Björn W.

Patterns (N Y) ; 4(11): 100873, 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-38035199

ABSTRACT

The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient's affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual's diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Compared with other personalization strategies, our work does not require labeled speech samples for enrollment. Instead, the approach makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness.

5.

HEAR4Health: a blueprint for making computer audition a staple of modern healthcare.

Triantafyllopoulos, Andreas; Kathan, Alexander; Baird, Alice; Christ, Lukas; Gebhard, Alexander; Gerczuk, Maurice; Karas, Vincent; Hübner, Tobias; Jing, Xin; Liu, Shuo; Mallol-Ragolta, Adria; Milling, Manuel; Ottl, Sandra; Semertzidou, Anastasia; Rajamani, Srividya Tirunellai; Yan, Tianhao; Yang, Zijiang; Dineley, Judith; Amiriparian, Shahin; Bartl-Pokorny, Katrin D; Batliner, Anton; Pokorny, Florian B; Schuller, Björn W.

Front Digit Health ; 5: 1196079, 2023.

Article in English | MEDLINE | ID: mdl-37767523

ABSTRACT

Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems into their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies, first and foremost in the field of medical imaging, but also in the use of wearables and other intelligent sensors. In comparison, computer audition appears to be lagging behind, at least in terms of commercial interest. Yet, audition has long been a staple assistant for medical practitioners, with the stethoscope being the quintessential sign of doctors around the world. Transforming this traditional technology with the use of AI entails a set of unique challenges. We categorise the advances needed into four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting for individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, for ensuring compliance with the ethical standards accorded to the field of medicine. Thus, we provide an overview and perspective of HEAR4Health: the sketch of a modern, ubiquitous sensing system that can bring computer audition on par with other AI technologies in the drive for improved healthcare systems.

6.

Personalised depression forecasting using mobile sensor data and ecological momentary assessment.

Kathan, Alexander; Harrer, Mathias; Küster, Ludwig; Triantafyllopoulos, Andreas; He, Xiangheng; Milling, Manuel; Gerczuk, Maurice; Yan, Tianhao; Rajamani, Srividya Tirunellai; Heber, Elena; Grossmann, Inga; Ebert, David D; Schuller, Björn W.

Front Digit Health ; 4: 964582, 2022.

Article in English | MEDLINE | ID: mdl-36465087

ABSTRACT

Introduction: Digital health interventions are an effective way to treat depression, but it is still largely unclear how patients' individual symptoms evolve dynamically during such treatments. Data-driven forecasts of depressive symptoms could greatly improve the personalisation of treatments. In current forecasting approaches, models are often trained on an entire population, resulting in a general model that works overall but does not translate well to each individual in clinically heterogeneous, real-world populations. Model fairness across patient subgroups is also frequently overlooked. Personalised models tailored to the individual patient may therefore be promising. Methods: We investigate different personalisation strategies using transfer learning, subgroup models, and subject-dependent standardisation on a newly collected, longitudinal dataset of depression patients undergoing treatment with a digital intervention (N = 65 patients recruited). Both passive mobile sensor data and ecological momentary assessments were available for modelling. We evaluated the models' ability to predict symptoms of depression (Patient Health Questionnaire-2; PHQ-2) at the end of each day and to forecast symptoms of the next day. Results: In our experiments, we achieve a best mean absolute error (MAE) of 0.801 (25% improvement) for predicting PHQ-2 values at the end of the day with subject-dependent standardisation, compared to a non-personalised baseline (MAE = 1.062). For one-day-ahead forecasting, we improve the baseline MAE of 1.539 by 12% to 1.349 using a transfer learning approach with shared common layers. In addition, personalisation leads to fairer models at group level. Discussion: Our results suggest that personalisation using subject-dependent standardisation and transfer learning can improve predictions and forecasts, respectively, of depressive symptoms in participants of a digital depression intervention. We discuss technical and clinical limitations of this approach, avenues for future investigations, and how personalised machine learning architectures may be implemented to improve existing digital interventions for depression.
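One common reading of "subject-dependent standardisation" is z-scoring each value with the mean and standard deviation of the individual subject instead of population-wide statistics. The abstract does not spell out the exact procedure, so the function below is an illustrative assumption. The helper `relative_improvement` (a hypothetical name) also checks the reported percentages from the stated MAE values:

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_per_subject(rows):
    """rows: list of (subject_id, value) pairs.

    Returns one z-score per row, computed with that subject's own mean and
    standard deviation rather than with population-wide statistics.
    """
    by_subject = defaultdict(list)
    for sid, value in rows:
        by_subject[sid].append(value)
    stats = {sid: (mean(vs), pstdev(vs)) for sid, vs in by_subject.items()}
    z = []
    for sid, value in rows:
        mu, sd = stats[sid]
        # Constant series have zero spread; map them to 0 instead of dividing by zero.
        z.append((value - mu) / sd if sd > 0 else 0.0)
    return z


def relative_improvement(baseline_mae, new_mae):
    """Fractional MAE reduction relative to the baseline."""
    return (baseline_mae - new_mae) / baseline_mae


print(standardise_per_subject([("a", 2), ("a", 4), ("b", 10), ("b", 20)]))  # [-1.0, 1.0, -1.0, 1.0]
print(round(relative_improvement(1.062, 0.801), 3))  # 0.246, i.e. about the reported 25%
print(round(relative_improvement(1.539, 1.349), 3))  # 0.123, i.e. about the reported 12%
```

In a real evaluation the per-subject statistics would presumably be estimated only from each subject's training days, to avoid leaking test information into the standardisation.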

7.

Audio self-supervised learning: A survey.

Liu, Shuo; Mallol-Ragolta, Adria; Parada-Cabaleiro, Emilia; Qian, Kun; Jing, Xin; Kathan, Alexander; Hu, Bin; Schuller, Björn W.

Patterns (N Y) ; 3(12): 100616, 2022 Dec 09.

Article in English | MEDLINE | ID: mdl-36569546

ABSTRACT

Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised learning (SSL) targets discovering general representations from large-scale data. Through the use of pre-trained SSL models for downstream tasks, this alleviates the need for human annotation, which is an expensive and time-consuming task. Its success in the fields of computer vision and natural language processing has prompted its recent adoption into the field of audio and speech processing. Comprehensive reviews summarizing the knowledge in audio SSL are currently missing. To fill this gap, we provide an overview of the SSL methods used for audio and speech processing applications. Herein, we also summarize the empirical works that exploit the audio modality in multi-modal SSL frameworks and the existing suitable benchmarks to evaluate the power of SSL in the computer audition domain. Finally, we discuss some open problems and point out future directions in the development of audio SSL.

8.

Novel Insights on Induced Sparsity in Multi-Time Attention Networks.

Rajamani, Srividya Tirunellai; Rajamani, Kumar; Kathan, Alexander; Schuller, Bjorn W.

Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 2615-2618, 2022 07.

Article in English | MEDLINE | ID: mdl-36085772

ABSTRACT

Current deep learning approaches for sparse, irregularly sampled time-series data do not exploit the extent of sparsity of the input data. Our work is inspired by the sparse and irregularly sampled nature of physiological time-series data in electronic health records. We explore the effect of inducing varying degrees of sparsity on the predictive performance of Multi-Time Attention Networks (mTAN) [1]. We induce sparsity by first sub-sampling the time series before feeding it to the mTAN network, and conduct empirical experiments with sub-sampling ranging from 10% to 90%. We investigate the performance of our methodology on the Human Activity dataset and the PhysioNet 2012 mortality prediction task. Our results demonstrate that our proposed time-point sub-sampling coupled with mTAN improves performance by 2% on the Human Activity dataset with 80% fewer time points for training. On the PhysioNet dataset, our approach achieves performance comparable to the baseline with 30% fewer time points. Our experiments reveal that time-series data could be acquired more coarsely when used in tandem with state-of-the-art networks capable of handling sparse data (mTAN). This could be of immense help for various applications where data acquisition and labeling are a significant challenge.
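The sub-sampling step described above can be sketched as randomly dropping a fraction of the observed time points of each irregularly sampled series while keeping the survivors in time order, so that a model such as mTAN still receives (time, value) pairs. Whether the paper's "10 to 90%" refers to points dropped or kept, and whether sampling is uniform at random, is not stated in the abstract; the function name and the uniform random choice are assumptions, and the readings below are invented:

```python
import random


def subsample_time_points(times, values, drop_fraction, seed=0):
    """Randomly drop a fraction of the observed time points of one irregularly
    sampled series; the kept (time, value) pairs stay in time order."""
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    if not times:
        return [], []
    rng = random.Random(seed)
    # Always keep at least one observation.
    n_keep = max(1, round(len(times) * (1.0 - drop_fraction)))
    keep = sorted(rng.sample(range(len(times)), n_keep))
    return [times[i] for i in keep], [values[i] for i in keep]


# Invented, irregularly spaced readings (hours, body temperature).
times = [0.0, 0.7, 1.1, 2.5, 3.0, 4.2, 5.9, 6.1, 7.3, 8.8]
values = [36.6, 36.8, 37.0, 37.4, 37.1, 36.9, 37.2, 37.5, 37.0, 36.7]
kept_t, kept_v = subsample_time_points(times, values, drop_fraction=0.8)
print(len(kept_t))  # 2 of 10 points survive
```

Because the observation times are kept alongside the values, a time-aware model can still place the remaining points correctly on the time axis.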

Subject(s)

Algorithms , Neural Networks, Computer , Electronic Health Records , Humans

9.

Journaling Data for Daily PHQ-2 Depression Prediction and Forecasting.

Kathan, Alexander; Triantafyllopoulos, Andreas; He, Xiangheng; Milling, Manuel; Yan, Tianhao; Rajamani, Srividya Tirunellai; Kuster, Ludwig; Harrer, Mathias; Heber, Elena; Grossmann, Inga; Ebert, David D; Schuller, Bjorn W.

Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 2627-2630, 2022 07.

Article in English | MEDLINE | ID: mdl-36086268

ABSTRACT

Digital health applications are becoming increasingly important for assessing and monitoring the wellbeing of people suffering from mental health conditions such as depression. A common target of these applications is to predict the results of self-assessed Patient Health Questionnaires (PHQ), which indicate the current symptom severity of depressive individuals. Many of the currently available approaches to predicting PHQ scores use passive data, e.g., from smartphones. However, there are several other scores and data sources besides the PHQ, e.g., the Behavioral Activation for Depression Scale-Short Form (BADSSF), the Center for Epidemiologic Studies Depression Scale (CESD), or the Personality Dynamics Diary (PDD), all of which can be collected effortlessly on a daily basis. In this work, we explore the potential of using actively collected data to predict and forecast daily PHQ-2 scores on a newly collected longitudinal dataset. We obtain a best MAE of 1.417 for daily prediction of PHQ-2 scores (which, in this specific dataset, range from 0 to 12) using leave-one-subject-out cross-validation, as well as a best MAE of 1.914 for forecasting PHQ-2 scores using data from up to the last 7 days. This illustrates the added value that can be obtained by incorporating actively collected data in a depression monitoring application.
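Leave-one-subject-out cross-validation, mentioned above, holds out all days of one participant at a time, trains on everyone else, and pools the absolute errors over all held-out days before averaging them into an MAE. In the sketch below the "model" is a deliberate placeholder that predicts the training mean and ignores the features; the function name and the toy data are hypothetical and stand in for whatever regressor the paper actually used:

```python
from statistics import mean


def loso_mae(samples):
    """samples: list of (subject_id, features, phq2_score) tuples.

    For each subject, fit on all other subjects and evaluate on the held-out
    one, then return the MAE over all held-out predictions.
    """
    subjects = sorted({s for s, _, _ in samples})
    errors = []
    for held_out in subjects:
        train_y = [y for s, _, y in samples if s != held_out]
        test_y = [y for s, _, y in samples if s == held_out]
        # Placeholder model: predict the training mean (features are ignored).
        prediction = mean(train_y)
        errors.extend(abs(y - prediction) for y in test_y)
    return mean(errors)


# Toy data: two subjects with two daily PHQ-2 scores each.
samples = [("a", None, 2), ("a", None, 4), ("b", None, 6), ("b", None, 8)]
print(loso_mae(samples))  # 4: each subject is predicted from the other's mean
```

Because no subject appears in both training and test data within a fold, this setup estimates how well a model generalises to people it has never seen.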

Subject(s)

Depression , Patient Health Questionnaire , Depression/diagnosis , Humans , Surveys and Questionnaires

10.

Depression Diagnosis and Forecast based on Mobile Phone Sensor Data.

He, Xiangheng; Triantafyllopoulos, Andreas; Kathan, Alexander; Milling, Manuel; Yan, Tianhao; Rajamani, Srividya Tirunellai; Kuster, Ludwig; Harrer, Mathias; Heber, Elena; Grossmann, Inga; Ebert, David D; Schuller, Bjorn W.

Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 4679-4682, 2022 07.

Article in English | MEDLINE | ID: mdl-36086527

ABSTRACT

Previous studies have shown a correlation between sensor data collected from mobile phones and human depression states. Compared to traditional self-assessment questionnaires, passive data collected from mobile phones is easier to access and less time-consuming to obtain. In particular, passive mobile phone data can be collected at flexible time intervals, thus detecting moment-by-moment psychological changes and helping to achieve earlier interventions. Moreover, while previous studies mainly focused on depression diagnosis using mobile phone data, depression forecasting has not received sufficient attention. In this work, we extract four types of passive features from mobile phone data: phone call, phone usage, user activity, and GPS features. We implement a long short-term memory (LSTM) network in a subject-independent 10-fold cross-validation setup to model both a diagnostic and a forecasting task. Experimental results show that the forecasting task achieves results comparable to the diagnostic task, which indicates the possibility of forecasting depression from mobile phone sensor data. Our model achieves an accuracy of 77.0% for major depression forecasting (binary), an accuracy of 53.7% for depression severity forecasting (5 classes), and a best RMSE of 4.094 (PHQ-9, range 0 to 27).
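A "subject-independent" 10-fold cross-validation assigns whole participants, rather than individual samples, to folds, so that no person contributes data to both the training and the test side of a split. The abstract does not say how subjects were distributed across folds; the shuffled round-robin assignment below is an assumption chosen for simplicity, and the function name is hypothetical:

```python
import random


def subject_independent_folds(subject_ids, n_folds=10, seed=0):
    """Assign whole subjects to folds so that no subject appears in both the
    training and the test portion of any split.

    subject_ids: one subject identifier per sample.
    Returns a list of n_folds lists of sample indices.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Round-robin over shuffled subjects keeps fold sizes (in subjects) balanced.
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}
    folds = [[] for _ in range(n_folds)]
    for idx, s in enumerate(subject_ids):
        folds[fold_of[s]].append(idx)
    return folds


# 100 samples from 25 hypothetical participants.
ids = ["s%d" % (i % 25) for i in range(100)]
folds = subject_independent_folds(ids)
print([len(f) for f in folds])
```

scikit-learn's `GroupKFold` implements the same constraint (samples sharing a group label never end up in both train and test) and balances fold sizes by sample count rather than by a shuffled round-robin.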

Subject(s)

Cell Phone , Depressive Disorder , Depression/diagnosis , Humans , Surveys and Questionnaires
