Search | VHL Regional Portal

Vision Transformer for Parkinson's Disease Classification using Multilingual Sustained Vowel Recordings.

Hemmerling, Daria; Wodzinski, Marek; Orozco-Arroyave, Juan Rafael; Sztaho, David; Daniol, Mateusz; Jemiolo, Pawel; Wojcik-Pedziwiatr, Magdalena.

Annu Int Conf IEEE Eng Med Biol Soc ; 2023: 1-4, 2023 07.

Article in English | MEDLINE | ID: mdl-38083719

ABSTRACT

Parkinson's disease (PD) is the 2nd most prevalent neurodegenerative disease in the world. Thus, the early detection of PD has recently been the subject of several scientific and commercial studies. In this paper, we propose a pipeline using Vision Transformer applied to mel-spectrograms for PD classification using multilingual sustained vowel recordings. Furthermore, our proposed transformed-based model shows a great potential to use voice as a single modality biomarker for automatic PD detection without language restrictions, a wide range of vowels, with an F1-score equal to 0.78. The results of our study fall within the range of the estimated prevalence of voice and speech disorders in Parkinson's disease, which ranges from 70-90%. Our study demonstrates a high potential for adaptation in clinical decision-making, allowing for increasingly systematic and fast diagnosis of PD with the potential for use in telemedicine.Clinical relevance- There is an urgent need to develop non invasive biomarker of Parkinson's disease effective enough to detect the onset of the disease to introduce neuroprotective treatment at the earliest stage possible and to follow the results of that intervention. Voice disorders in PD are very frequent and are expected to be utilized as an early diagnostic biomarker. The voice analysis using deep neural networks open new opportunities to assess neurodegenerative diseases' symptoms, for fast diagnosis-making, to guide treatment initiation, and risk prediction. The detection accuracy for voice biomarkers according to our method reached close to the maximum achievable value.

Subject(s)

Neurodegenerative Diseases , Parkinson Disease , Voice , Humans , Parkinson Disease/complications , Parkinson Disease/diagnosis , Parkinson Disease/therapy , Speech Disorders , Biomarkers

Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings.

Sztahó, Dávid; Fejes, Attila.

J Forensic Sci ; 68(3): 871-883, 2023 May.

Article in English | MEDLINE | ID: mdl-36999742

ABSTRACT

In forensic voice comparison, deep learning has become widely popular recently. It is mainly used to learn speaker representations, called embeddings or embedding vectors. Speaker embeddings are often trained using corpora mostly containing widely spoken languages. Thus, language dependency is an important factor in automatic forensic voice comparison, especially when the target language is linguistically very different from that the model is trained on. In the case of a low-resource language, developing a corpus for forensic purposes containing enough speakers to train deep learning models is costly. This study aims to investigate whether a model pre-trained on multilingual (mostly English) corpus can be used on a target low-resource language (here, Hungarian), not represented by the model. Often multiple samples are not available from the offender (unknown speaker). Samples are therefore compared pairwise with and without speaker enrollment for suspect (known) speakers. Two corpora are used that were developed especially for forensic purposes and a third that is meant for traditional speaker verification. Speaker embedding vectors are extracted by the x-vector and ECAPA-TDNN techniques. Speaker verification was evaluated in the likelihood-ratio framework. A comparison is made between the language combinations (modeling, LR calibration, and evaluation). The results were evaluated by Cllrmin and EER metrics. It was found that the model pre-trained on a different language but on a corpus with a significant number of speakers can be used on samples with language mismatch. Sample duration and speaking style also seem to affect the performance.

Subject(s)

Deep Learning , Language , Forensic Medicine

The applicability of the Beck Depression Inventory and Hamilton Depression Scale in the automatic recognition of depression based on speech signal processing.

Hajduska-Dér, Bálint; Kiss, Gábor; Sztahó, Dávid; Vicsi, Klára; Simon, Lajos.

Front Psychiatry ; 13: 879896, 2022.

Article in English | MEDLINE | ID: mdl-35990073

ABSTRACT

Depression is a growing problem worldwide, impacting on an increasing number of patients, and also affecting health systems and the global economy. The most common diagnostical rating scales of depression are self-reported or clinician-administered, which differ in the symptoms that they are sampling. Speech is a promising biomarker in the diagnostical assessment of depression, due to non-invasiveness and cost and time efficiency. In our study, we try to achieve a more accurate, sensitive model for determining depression based on speech processing. Regression and classification models were also developed using a machine learning method. During the research, we had access to a large speech database that includes speech samples from depressed and healthy subjects. The database contains the Beck Depression Inventory (BDI) score of each subject and the Hamilton Rating Scale for Depression (HAMD) score of 20% of the subjects. This fact provided an opportunity to compare the usefulness of BDI and HAMD for training models of automatic recognition of depression based on speech signal processing. We found that the estimated values of the acoustic model trained on BDI scores are closer to HAMD assessment than to the BDI scores, and the partial application of HAMD scores instead of BDI scores in training improves the accuracy of automatic recognition of depression.

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL