Results 1 - 9 of 9
1.
Article in English | MEDLINE | ID: mdl-37549073

ABSTRACT

Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system which, in addition to affecting motor and cognitive functions, may also lead to specific changes in the speech of patients. Speech production, comprehension, repetition and naming tasks, as well as structural and content changes in narratives, might indicate a limitation of executive functions. In this study, we present a speech-based machine learning technique to distinguish speakers with the relapsing-remitting subtype of MS from healthy controls (HC). We exploit the fact that MS might cause a motor speech disorder similar to dysarthria, which, according to our hypothesis, might affect the phonetic posterior estimates supplied by a Deep Neural Network acoustic model. Our experimental results show that the proposed phonetic posteriorgram-based feature extraction approach is useful for detecting MS: depending on the actual speech task, we obtained Equal Error Rate values as low as 13.3% and AUC scores up to 0.891, indicating a competitive and more consistent classification performance than both the x-vector and the openSMILE 'ComParE functionals' attributes. Besides this discrimination performance, the interpretable nature of the phonetic posterior features might also make our method suitable for automatic MS screening or for monitoring the progression of the disease. Furthermore, by examining which specific phonetic groups are the most useful for this feature extraction process, the proposed phonetic features could also prove useful in the speech therapy of MS patients.
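
As a brief illustration of the evaluation metrics reported above (not the authors' actual pipeline), the following sketch computes the Equal Error Rate and AUC from hypothetical per-recording classifier scores; the labels and scores are invented values.

```python
# Illustrative only: EER and AUC from hypothetical per-recording scores.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def equal_error_rate(labels, scores):
    """EER: the operating point where false-positive and false-negative rates meet."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

labels = np.array([0, 0, 1, 1, 1, 0])              # 1 = MS, 0 = healthy control (dummy)
scores = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3])  # classifier score for the MS class (dummy)
print(f"EER = {equal_error_rate(labels, scores):.3f}, AUC = {roc_auc_score(labels, scores):.3f}")
```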


Subject(s)
Multiple Sclerosis , Speech , Humans , Phonetics , Multiple Sclerosis/diagnosis , Multiple Sclerosis/complications , Speech Acoustics , Dysarthria/diagnosis , Dysarthria/etiology
2.
Sensors (Basel) ; 23(11)2023 May 30.
Article in English | MEDLINE | ID: mdl-37299935

ABSTRACT

The field of computational paralinguistics emerged from automatic speech processing, and it covers a wide range of tasks involving different phenomena present in human speech. It focuses on the non-verbal content of human speech, including tasks such as spoken emotion recognition, conflict intensity estimation and sleepiness detection from speech, showing straightforward application possibilities for remote monitoring with acoustic sensors. The two main technical issues in computational paralinguistics are (1) handling varying-length utterances with traditional classifiers and (2) training models on relatively small corpora. In this study, we present a method that combines automatic speech recognition and paralinguistic approaches and is able to handle both of these technical issues. That is, we trained an HMM/DNN hybrid acoustic model on a general ASR corpus, which was then used as a source of embeddings employed as features for several paralinguistic tasks. To convert the local embeddings into utterance-level features, we experimented with five different aggregation methods, namely the mean, standard deviation, skewness, kurtosis and the ratio of non-zero activations. Our results show that the proposed feature extraction technique consistently outperforms the widely used x-vector baseline, independently of the actual paralinguistic task investigated. Furthermore, the aggregation techniques could also be combined effectively, leading to further improvements depending on the task and on the layer of the neural network serving as the source of the local embeddings. Overall, based on our experimental results, the proposed method can be considered a competitive and resource-efficient approach for a wide range of computational paralinguistic tasks.
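
To make the aggregation step concrete, here is a minimal sketch of the five utterance-level statistics named above, applied to dummy frame-level embeddings; the array shapes are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of the five aggregation statistics; shapes are assumptions.
import numpy as np
from scipy.stats import skew, kurtosis

def utterance_features(frames):
    """frames: (num_frames, embedding_dim) local embeddings of one utterance."""
    return np.concatenate([
        frames.mean(axis=0),            # mean
        frames.std(axis=0),             # standard deviation
        skew(frames, axis=0),           # skewness
        kurtosis(frames, axis=0),       # kurtosis
        (frames != 0).mean(axis=0),     # ratio of non-zero activations
    ])                                  # fixed length, independent of utterance length

frames = np.random.default_rng(0).standard_normal((250, 1024))  # dummy DNN activations
print(utterance_features(frames).shape)  # (5120,) = 5 * 1024
```

The point of the aggregation is that every utterance, whatever its duration, maps to a vector of the same length, which a traditional classifier can then consume.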


Subject(s)
Speech Perception , Speech , Humans , Neural Networks, Computer , Speech Recognition Software , Acoustics
3.
Clin Linguist Phon ; 37(4-6): 549-566, 2023 06 03.
Article in English | MEDLINE | ID: mdl-36715451

ABSTRACT

Our research studied relapsing-remitting multiple sclerosis (RRMS). In half of RRMS cases, mild cognitive difficulties are present, but they often remain undetected despite their adverse effects on individuals' daily life. The detection of subtle cognitive alterations using speech analysis has rarely been implemented in MS research. We applied automatic speech recognition technology to devise a speech task with potential diagnostic value. To this end, we used two narrative tasks adjusted for the neural and cognitive characteristics of RRMS, namely narrative recall and personal narrative. In addition to speech analysis, we examined information processing speed, working memory, verbal fluency, and naming skills. Twenty-one participants with RRMS and 21 gender-, age-, and education-matched healthy controls took part in the study. All the participants with RRMS achieved a normal performance on Addenbrooke's Cognitive Examination. The following parameters of speech were measured: articulation and speech rate, and the proportion, duration, frequency, and average length of silent and filled pauses. We found significant differences in the temporal parameters between groups and speech tasks. ROC analysis produced high classification accuracy for the narrative recall task (0.877 and 0.866), but low accuracy for the personal narrative task (0.617 and 0.592). Information processing speed affected the speech of the RRMS group but not that of the control group. The higher cognitive load of the narrative recall task may be the cause of the significant changes in the speech of the RRMS group relative to the controls. Our results suggest that the narrative recall task may be effective for detecting subtle cognitive changes in RRMS.
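
For readers unfamiliar with these temporal parameters, the sketch below shows how a few of them can be derived from a (purely invented) pause annotation of one recording; it illustrates the definitions, not the study's analysis code.

```python
# Invented annotation of one recording, used only to illustrate the definitions.
total_duration = 120.0                   # length of the narrative in seconds
silent_pauses = [0.6, 1.2, 0.8, 2.1]     # durations of annotated silent pauses (s)
filled_pauses = [0.4, 0.5]               # durations of filled pauses, e.g. "uh" (s)
num_syllables = 310                      # syllable count of the transcript

pause_time = sum(silent_pauses) + sum(filled_pauses)
speech_rate = num_syllables / total_duration                       # includes pauses
articulation_rate = num_syllables / (total_duration - pause_time)  # excludes pauses

print(f"proportion of pauses: {pause_time / total_duration:.1%}")
print(f"speech rate: {speech_rate:.2f} syll/s, articulation rate: {articulation_rate:.2f} syll/s")
```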


Subject(s)
Multiple Sclerosis, Relapsing-Remitting , Multiple Sclerosis , Humans , Multiple Sclerosis, Relapsing-Remitting/diagnosis , Multiple Sclerosis, Relapsing-Remitting/psychology , Speech , Cognition , Memory, Short-Term
4.
J Int Neuropsychol Soc ; 29(1): 46-58, 2023 01.
Article in English | MEDLINE | ID: mdl-35067261

ABSTRACT

OBJECTIVE: Most recordings of verbal fluency tasks include substantial amounts of task-irrelevant content that could provide clinically valuable information for the detection of mild cognitive impairment (MCI). We developed a method for the analysis of verbal fluency, focusing not on the task-relevant words but on the silent segments, the hesitations, and the irrelevant utterances found in the voice recordings. METHODS: Phonemic ('k', 't', 'a') and semantic (animals, food items, actions) verbal fluency data were collected from healthy control (HC; n = 25; Mage = 67.32) and MCI (n = 25; Mage = 71.72) participants. After manual annotation of the voice samples, 10 temporal parameters were computed based on the silent and the task-irrelevant segments. Traditional fluency measures based on word counts (correct words, errors, repetitions) were also employed in order to compare the outcomes of the two methods. RESULTS: Two silence-based parameters (the number of silent pauses and the average length of silent pauses) and the average word transition time differed significantly between the two groups in all three semantic fluency tasks. Subsequent receiver operating characteristic (ROC) analysis showed that these three temporal parameters had classification abilities similar to the traditional measure of counting correct words. CONCLUSION: In our approach to verbal fluency analysis, silence-related parameters displayed a classification ability similar to that of the most widely used traditional fluency measure. Based on these results, an automated tool using voiced-unvoiced segmentation may be developed, enabling swift and cost-effective verbal fluency-based MCI screening.
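
As an illustration only, the sketch below derives the three discriminative parameters named in the RESULTS from hypothetical word-level timestamps; the timestamps and the pause threshold are invented values, not the study's annotation.

```python
# Hypothetical word-level timestamps (start, end) in seconds for one fluency recording.
words = [(0.5, 1.0), (2.3, 2.9), (4.0, 4.6), (7.1, 7.8)]
min_pause = 0.25  # assumed threshold below which a gap is not counted as a silent pause

gaps = [start2 - end1 for (_, end1), (start2, _) in zip(words, words[1:])]
silent_pauses = [g for g in gaps if g >= min_pause]

print("number of silent pauses:", len(silent_pauses))
print(f"average silent-pause length: {sum(silent_pauses) / len(silent_pauses):.2f} s")
print(f"average word transition time: {sum(gaps) / len(gaps):.2f} s")
```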


Subject(s)
Cognitive Dysfunction , Verbal Behavior , Humans , Neuropsychological Tests , Cognitive Dysfunction/diagnosis , Cognitive Dysfunction/psychology , Semantics
5.
Sensors (Basel) ; 22(22)2022 Nov 08.
Article in English | MEDLINE | ID: mdl-36433196

ABSTRACT

Within speech processing, articulatory-to-acoustic mapping (AAM) methods can use ultrasound tongue imaging (UTI) as an input. (Micro)convex transducers are mostly used, which provide a wedge-shaped visual image. However, this image is optimized for visual inspection by the human eye, and the signal is often post-processed by the equipment. With newer ultrasound equipment, it is now possible to access the raw scanline data (i.e., the ultrasound echo return) without any internal post-processing. In this study, we compared the raw scanline representation with the wedge-shaped processed UTI as the input of the residual network applied for AAM, and we also investigated the optimal size of the input image. We found no significant difference between the performance attained using the raw data and the wedge-shaped image extrapolated from it. We found the optimal input size to be 64 × 43 pixels for the raw scanline input, and 64 × 64 pixels when transformed to a wedge. Therefore, it is not necessary to use the full original 64 × 842 pixel raw scanline; a smaller image is sufficient. This allows for building smaller networks, and will be beneficial for the development of session- and speaker-independent methods for practical applications. AAM systems have the target application of a "silent speech interface", which could help the communication of the speech-impaired, and could also be useful in military applications or in extremely noisy conditions.
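
A minimal sketch of the input-size reduction discussed above, assuming OpenCV for the resampling (the authors' preprocessing may differ): a dummy 64 × 842 raw scanline frame is downsampled to the 64 × 43 size reported as optimal.

```python
# Dummy raw scanline frame: 64 scanlines x 842 echo samples per scanline.
import numpy as np
import cv2  # OpenCV is an assumption; any image-resampling library would do

raw_frame = np.random.randint(0, 256, size=(64, 842), dtype=np.uint8)
small = cv2.resize(raw_frame, dsize=(43, 64), interpolation=cv2.INTER_AREA)  # dsize is (width, height)
print(small.shape)  # (64, 43): the much smaller network input
```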


Subject(s)
Acoustics , Tongue , Humans , Tongue/diagnostic imaging , Ultrasonography , Speech , Noise
6.
Curr Alzheimer Res ; 19(5): 373-386, 2022.
Article in English | MEDLINE | ID: mdl-35440309

ABSTRACT

BACKGROUND: The development of automatic speech recognition (ASR) technology allows the analysis of temporal (time-based) speech parameters characteristic of mild cognitive impairment (MCI). However, no information has been available on whether the analysis of spontaneous speech can be used with the same efficiency in different language environments. OBJECTIVE: The main goal of this international pilot study is to address the question of whether the Speech-Gap Test® (S-GAP Test®), previously tested in the Hungarian language, is appropriate for and applicable to the recognition of MCI in other languages such as English. METHODS: After an initial screening of 88 individuals, English-speaking (n = 33) and Hungarian-speaking (n = 33) participants were classified as having MCI or as healthy controls (HC) based on Petersen's criteria. The speech of each participant was recorded via a spontaneous speech task. Fifteen temporal parameters were determined and calculated through ASR. RESULTS: Seven temporal parameters in the English-speaking sample and 5 in the Hungarian-speaking sample showed significant differences between the MCI and the HC groups. Receiver operating characteristics (ROC) analysis clearly distinguished the English-speaking MCI cases from the HC group based on speech tempo and articulation tempo with 100% sensitivity, and on three more temporal parameters with high sensitivity (85.7%). In the Hungarian-speaking sample, the ROC analysis showed similar sensitivity rates (92.3%). CONCLUSION: The results of this study in different native-speaking populations suggest that the changes in acoustic parameters detected by the S-GAP Test® might be present across different languages.


Subject(s)
Cognitive Dysfunction , Speech , Cognitive Dysfunction/diagnosis , Cognitive Dysfunction/psychology , Humans , Hungary , Language , Pilot Projects
7.
Alzheimer Dis Assoc Disord ; 36(2): 148-155, 2022.
Article in English | MEDLINE | ID: mdl-35293378

ABSTRACT

INTRODUCTION: The earliest signs of cognitive decline include deficits in temporal (time-based) speech characteristics. Type 2 diabetes mellitus (T2DM) patients are more prone to mild cognitive impairment (MCI). The aim of this study was to compare the temporal speech characteristics of elderly (above 50 y) T2DM patients with those of age-matched nondiabetic subjects. MATERIALS AND METHODS: A total of 160 individuals were screened, 100 of whom were eligible (T2DM: n=51; nondiabetic: n=49). Participants were classified either as having healthy cognition (HC) or as showing signs of MCI. Speech recordings were collected through a phone call. Based on automatic speech recognition, 15 temporal parameters were calculated. RESULTS: The HC with T2DM group showed a significantly shorter utterance length, a higher duration rate of silent and total pauses, and a higher average duration of silent and total pauses compared with the HC without T2DM group. Among the MCI participants, the parameters were similar between the T2DM and the nondiabetic subgroups. CONCLUSIONS: The temporal speech characteristics of T2DM patients showed early signs of altered cognitive functioning, whereas neuropsychological tests did not detect deterioration. This method is useful for identifying the T2DM patients most at risk for manifest MCI, and could serve as a remote cognitive screening tool.


Subject(s)
Cognitive Dysfunction , Diabetes Mellitus, Type 2 , Aged , Cognition , Cognitive Dysfunction/diagnosis , Diabetes Mellitus, Type 2/complications , Humans , Neuropsychological Tests , Speech
8.
Clin Linguist Phon ; 35(8): 727-742, 2021 08 03.
Article in English | MEDLINE | ID: mdl-32993390

ABSTRACT

This study presents a novel approach for the early detection of mild cognitive impairment (MCI) and mild Alzheimer's disease (mAD) in the elderly. Participants were 25 elderly controls (C), 25 clinically diagnosed MCI and 25 mAD patients, included after a clinical diagnosis validated by CT or MRI and cognitive tests. Our linguistic protocol involved three connected speech tasks that stimulate different memory systems; the tasks were recorded and then analyzed linguistically using the PRAAT software. Temporal speech-related parameters such as speech rate, the number and length of pauses, and the rate of pause and signal successfully differentiated MCI from mAD and C. The pauses/duration and silent pauses/duration parameters changed linearly across the groups; in other words, the percentage of pauses within the total duration of speech grows continuously as dementia progresses. Thus, the proposed approach may be an effective tool for screening for MCI and mAD.
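
The study used the PRAAT software directly; as a rough illustration of a pauses/duration measurement, the sketch below uses the parselmouth Python bindings for Praat, with an invented file name and an assumed intensity threshold, which may differ from the study's criteria.

```python
# Rough pause-proportion estimate via an intensity threshold; the file name and
# the 25 dB offset below the intensity peak are illustrative assumptions.
import parselmouth  # Python bindings for Praat

snd = parselmouth.Sound("speech_sample.wav")   # hypothetical recording
intensity = snd.to_intensity()
db = intensity.values[0]                       # intensity contour in dB
frame_dt = intensity.xs()[1] - intensity.xs()[0]

silent_frames = db < (db.max() - 25)           # frames at least 25 dB below the peak
pause_time = silent_frames.sum() * frame_dt
print(f"pauses/duration: {pause_time / snd.get_total_duration():.1%}")
```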


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Language Disorders , Aged , Alzheimer Disease/diagnosis , Cognitive Dysfunction/diagnosis , Humans , Neuropsychological Tests , Speech
9.
Curr Alzheimer Res ; 15(2): 130-138, 2018.
Article in English | MEDLINE | ID: mdl-29165085

ABSTRACT

BACKGROUND: Even today, the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method based on the analysis of spontaneous speech production during a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI. METHODS: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. We provoked spontaneous speech by asking the patients to recall the content of 2 short black-and-white films (one recalled immediately, one after a delay), and by having them answer one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control groups could be discriminated automatically based on the acoustic features. RESULTS: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses in the question-answering task. The fully automated version of the analysis process - that is, using the ASR-based features in combination with machine learning - was able to separate the two classes with an F1-score of 78.8%. CONCLUSION: The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening for MCI in the community.
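
For orientation, here is a minimal sketch of how such an acoustic-feature classification with a cross-validated F1-score can be set up; the dummy data, the feature count, and the choice of a linear SVM are assumptions for illustration, not the study's actual pipeline.

```python
# Dummy feature table: 86 speakers (38 HC + 48 MCI) x 6 temporal parameters.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.standard_normal((86, 6))    # stand-in for the extracted acoustic parameters
y = rng.integers(0, 2, size=86)     # 1 = MCI, 0 = healthy control (random labels)

clf = make_pipeline(StandardScaler(), LinearSVC())  # classifier choice is an assumption
f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
print(f"cross-validated F1-score: {f1:.3f}")
```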


Subject(s)
Cognitive Dysfunction/diagnosis , Diagnosis, Computer-Assisted , Speech Recognition Software , Speech , Aged , Aged, 80 and over , Diagnosis, Computer-Assisted/methods , Female , Humans , Internet , Machine Learning , Male , Memory , Middle Aged , Models, Statistical , Neuropsychological Tests , Pattern Recognition, Automated/methods , ROC Curve , Speech Production Measurement