Results 1 - 5 of 5
1.
Diagnostics (Basel); 14(8), 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38667463

ABSTRACT

Large language models (LLMs) find increasing applications in many fields. Here, three LLM chatbots (ChatGPT-3.5, ChatGPT-4, and Bard) are assessed, in their current publicly available form, for their ability to recognize Alzheimer's dementia (AD) and cognitively normal (CN) individuals using textual input derived from spontaneous speech recordings. A zero-shot learning approach is used at two levels of independent queries, with the second query (chain-of-thought prompting) eliciting more detailed information than the first. Each LLM chatbot's performance is evaluated on the predictions it generates in terms of accuracy, sensitivity, specificity, precision, and F1 score. The LLM chatbots generated a three-class outcome ("AD", "CN", or "Unsure"). When positively identifying AD, Bard produced the highest true-positive rate (89% recall) and the highest F1 score (71%), but tended to misidentify CN as AD with high confidence (low "Unsure" rates); when positively identifying CN, GPT-4 produced the highest true-negative rate (56%) and the highest F1 score (62%), adopting a more diplomatic stance (moderate "Unsure" rates). Overall, the three LLM chatbots can identify AD vs. CN above chance level, but do not currently satisfy the requirements for clinical application.
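
As an illustration of the evaluation described above, the following is a minimal sketch (in Python) of how three-class chatbot outputs ("AD", "CN", "Unsure") could be scored against clinical labels, treating AD as the positive class; the handling of "Unsure" responses and the example labels are assumptions, not the study's actual protocol or data.

def score_predictions(y_true, y_pred):
    # y_true: clinical labels ("AD" or "CN"); y_pred: chatbot outputs ("AD", "CN", "Unsure")
    tp = sum(t == "AD" and p == "AD" for t, p in zip(y_true, y_pred))
    tn = sum(t == "CN" and p == "CN" for t, p in zip(y_true, y_pred))
    fp = sum(t == "CN" and p == "AD" for t, p in zip(y_true, y_pred))
    fn = sum(t == "AD" and p == "CN" for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall for the AD class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)                   # "Unsure" counts as neither correct nor incorrect class
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if (precision + sensitivity) else 0.0
    unsure_rate = sum(p == "Unsure" for p in y_pred) / len(y_pred)
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "unsure_rate": unsure_rate}

# Toy example (invented labels, not the study's data):
print(score_predictions(["AD", "AD", "CN", "CN"], ["AD", "Unsure", "AD", "CN"]))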

2.
Sensors (Basel); 23(6), 2023 Mar 10.
Article in English | MEDLINE | ID: mdl-36991710

ABSTRACT

A low-resource emotional speech synthesis system for empathetic speech, based on modelling prosody features, is presented here. Secondary emotions, identified as being needed for empathetic speech, are modelled and synthesised in this investigation. As secondary emotions are subtle in nature, they are more difficult to model than primary emotions, and this study is one of the few to model them in speech, as they have not been extensively studied so far. Current speech synthesis research uses large databases and deep learning techniques to develop emotion models, but there are many secondary emotions, and developing large databases for each of them is expensive. Hence, this research presents a proof of concept using handcrafted feature extraction and modelling of these features with a low-resource-intensive machine learning approach, thus creating synthetic speech with secondary emotions. A quantitative-model-based transformation is used to shape the fundamental frequency contour of the emotional speech, while speech rate and mean intensity are modelled via rule-based approaches. Using these models, an emotional text-to-speech synthesis system is developed to synthesise five secondary emotions: anxious, apologetic, confident, enthusiastic, and worried. A perception test to evaluate the synthesised emotional speech is also conducted; participants could identify the correct emotion in a forced-response test with a hit rate greater than 65%.


Subject(s)
Speech Perception, Speech, Humans, Speech Perception/physiology, Emotions/physiology, Anxiety
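
As a rough illustration of the rule-based prosody modelling described above, the sketch below scales an F0 contour, speech rate, and mean intensity per emotion; the scaling factors and function names are invented placeholders, not the fitted parameters of the published system.

import numpy as np

# Per-emotion prosody rules: (F0 scale, speech-rate scale, intensity shift in dB).
# These factor values are invented placeholders, not the paper's fitted parameters.
PROSODY_RULES = {
    "anxious":      (1.10, 1.15, +1.0),
    "apologetic":   (0.95, 0.90, -2.0),
    "confident":    (1.00, 0.95, +2.0),
    "enthusiastic": (1.20, 1.10, +3.0),
    "worried":      (0.95, 0.95, -1.0),
}

def apply_prosody(f0_contour_hz, emotion):
    # Scale the F0 contour and derive duration/intensity adjustments for the emotion.
    f0_scale, rate_scale, intensity_shift_db = PROSODY_RULES[emotion]
    new_f0 = np.asarray(f0_contour_hz, dtype=float) * f0_scale
    duration_scale = 1.0 / rate_scale      # faster speech rate -> shorter overall duration
    return new_f0, duration_scale, intensity_shift_db

# Toy neutral contour (Hz) and an example transformation:
f0 = 120 + 20 * np.sin(np.linspace(0, np.pi, 200))
mod_f0, dur_scale, db_shift = apply_prosody(f0, "enthusiastic")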
3.
Sensors (Basel); 23(2), 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36679745

ABSTRACT

Broadband excitation introduced at the speaker's lips, together with evaluation of the corresponding relative acoustic impedance spectrum, allows fast, accurate, and non-invasive estimation of vocal tract resonances during speech and singing. However, radiation impedance interactions at the lips at low frequencies lead to poor signal-to-noise ratios, making it challenging to measure resonances below 500 Hz reliably and limiting investigations of the first vocal tract resonance using such a method. In this paper, various physical configurations which may optimize the acoustic coupling between transducers and the vocal tract are investigated, and the practical arrangement which yields the best vocal tract resonance detection sensitivity at low frequencies is identified. To support the investigation, two quantitative analysis methods are proposed to facilitate comparison of the sensitivity and quality of the resonances identified. The optimal configuration identified has better acoustic coupling and low-frequency response than existing arrangements and is shown to reliably detect resonances down to 350 Hz (and possibly lower), thereby allowing the first resonance of a wide range of vowel articulations to be estimated with confidence.


Subject(s)
Lip, Vibration, Lip/physiology, Acoustics, Speech Acoustics
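
A minimal sketch of how resonances might be read off a relative impedance spectrum measured with such a setup is given below; the array names, the 350 Hz floor, and the prominence threshold are illustrative assumptions, not the paper's proposed analysis methods.

import numpy as np
from scipy.signal import find_peaks

def find_resonances(freqs_hz, measured_mag, calibration_mag, fmin_hz=350.0, min_prominence_db=3.0):
    # Relative impedance spectrum (dB) of the measured response against a calibration spectrum.
    rel_db = 20.0 * np.log10(np.asarray(measured_mag) / np.asarray(calibration_mag))
    # Resonances appear as peaks; keep only those above the low-frequency floor.
    peak_idx, _ = find_peaks(rel_db, prominence=min_prominence_db)
    resonances = [float(freqs_hz[i]) for i in peak_idx if freqs_hz[i] >= fmin_hz]
    return resonances, rel_db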
4.
JASA Express Lett; 2(3): 034801, 2022 Feb.
Article in English | MEDLINE | ID: mdl-36154632

ABSTRACT

A data-driven approach using artificial neural networks is proposed to address the classic inverse area function problem, i.e., to determine the vocal tract geometry (modelled as a tube of nonuniform cylindrical cross-sections) from the vocal tract acoustic impedance spectrum. The predicted cylindrical radii and the actual radii were found to be highly correlated for the three- and four-cylinder models (Pearson coefficient (ρ) and Lin concordance coefficient (ρc) exceeded 95%); however, for the six-cylinder model, the correlation was lower (ρ around 75% and ρc around 69%). Upon standardizing the impedance values, the correlation improved significantly in all cases (ρ and ρc exceeded 90%).


Subject(s)
Acoustics, Neural Networks, Computer, Electric Impedance
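
The sketch below illustrates the general data-driven setup described above (standardized impedance spectra mapped to cylinder radii by a neural network, scored by Pearson correlation); the random placeholder data, network size, and training settings are assumptions, not the study's configuration.

import numpy as np
from scipy.stats import pearsonr
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))            # impedance spectrum samples (placeholder data)
y = rng.uniform(0.5, 2.0, size=(1000, 4))   # radii of a four-cylinder model, in cm (placeholder)

scaler = StandardScaler().fit(X)            # standardize the impedance values
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(scaler.transform(X), y)

y_hat = model.predict(scaler.transform(X))
rho = np.mean([pearsonr(y[:, k], y_hat[:, k])[0] for k in range(y.shape[1])])
print(f"mean Pearson correlation across predicted radii: {rho:.2f}")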
5.
J Acoust Soc Am; 148(3): EL253, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33003873

ABSTRACT

Cough is a common symptom presenting in asthmatic children. In this investigation, an audio-based classification model is presented that can differentiate between healthy and asthmatic children, based on the combination of cough and vocalised /ɑ:/ sounds. A Gaussian mixture model using mel-frequency cepstral coefficients and constant-Q cepstral coefficients was trained. When comparing the predicted labels with the clinician's diagnosis, this cough sound model reaches an overall accuracy of 95.3%. The vocalised /ɑ:/ model reaches an accuracy of 72.2%, which is still significant because the dataset contains only 333 /ɑ:/ sounds versus 2029 cough sounds.


Subject(s)
Cough, Sound, Child, Cough/diagnosis, Humans, Normal Distribution, Sound Spectrography
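
A minimal sketch of a Gaussian mixture classifier over frame-level MFCCs, in the spirit of the cough model above, is shown below; the file handling, mixture size, and MFCC settings are assumptions, and the constant-Q cepstral coefficients used in the paper are omitted.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    # Frame-level MFCCs for one recording, shaped (frames, n_mfcc).
    audio, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T

def train_class_gmm(paths, n_components=16):
    # One diagonal-covariance GMM per class, trained on pooled frames from that class's recordings.
    feats = np.vstack([mfcc_frames(p) for p in paths])
    return GaussianMixture(n_components=n_components, covariance_type="diag", random_state=0).fit(feats)

def classify(path, gmm_healthy, gmm_asthma):
    # Decision by average per-frame log-likelihood under each class model.
    frames = mfcc_frames(path)
    return "asthmatic" if gmm_asthma.score(frames) > gmm_healthy.score(frames) else "healthy"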