Results 1 - 5 of 5
1.
Sci Adv ; 9(23): eabq2969, 2023 06 09.
Article in English | MEDLINE | ID: mdl-37294764

ABSTRACT

The genetic basis of the human vocal system is largely unknown, as are the sequence variants that give rise to individual differences in voice and speech. Here, we couple data on diversity in the sequence of the genome with voice and vowel acoustics in speech recordings from 12,901 Icelanders. We show how voice pitch and vowel acoustics vary across the life span and correlate with anthropometric, physiological, and cognitive traits. We found that voice pitch and vowel acoustics have a heritable component and discovered correlated common variants in ABCC9 that associate with voice pitch. The ABCC9 variants also associate with adrenal gene expression and cardiovascular traits. By showing that voice and vowel acoustics are influenced by genetics, we have taken important steps toward understanding the genetics and evolution of the human vocal system.


Subject(s)
Speech Acoustics , Voice , Humans , Speech/physiology , Acoustics
2.
Sensors (Basel) ; 22(18)2022 Sep 13.
Article in English | MEDLINE | ID: mdl-36146251

ABSTRACT

Monitoring cognitive workload has the potential to improve both the performance and fidelity of human decision making. However, previous efforts to discriminate beyond binary levels (e.g., low/high or neutral/high) in cognitive workload classification have not been successful. This lack of sensitivity in cognitive workload measurements might be due to individual differences as well as inadequate methodology used to analyse the measured signal. In this paper, a method that combines the speech signal with cardiovascular measurements synchronized to each heartbeat is introduced for cognitive workload classification. For validation, speech and cardiovascular signals from 97 university participants and 20 airline pilot participants were collected while cognitive stimuli of varying difficulty were induced with the Stroop colour/word test. For the trinary classification scheme (low, medium, high cognitive workload), the best results, obtained with classifiers trained on each participant individually, were average misclassification rates of 15.17 ± 0.79% and 17.38 ± 1.85%, indicating good discrimination at three levels of cognitive workload. Combining cardiovascular and speech measures synchronized to each heartbeat and consolidated with short-term dynamic measures might therefore provide enhanced sensitivity in cognitive workload monitoring. The results show that individual differences are a limiting factor for generic classification and highlight the need for research into methods that incorporate individual differences to achieve even better results. This method can potentially be used to measure and monitor workload in real time in operational environments.


Subject(s)
Voice , Workload , Cognition , Heart Rate , Humans , Speech , Workload/psychology
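The per-participant, three-level classification described in this abstract can be illustrated with a minimal sketch. The data, the feature values, and the nearest-centroid rule below are hypothetical stand-ins, not the authors' actual features or classifiers; the sketch only shows the shape of the evaluation (one participant's heartbeat-synchronized samples, three workload labels, a leave-one-out misclassification rate).

```python
# Hypothetical sketch of per-participant trinary workload classification
# (low/medium/high). Features and the nearest-centroid classifier are
# illustrative, not the method from the paper.
import random

random.seed(0)
LEVELS = ("low", "medium", "high")

def nearest_centroid_error(samples):
    """Leave-one-out misclassification rate with a nearest-centroid rule.

    samples: list of (feature_vector, level) pairs for ONE participant,
    mirroring the per-participant training used in the study.
    """
    errors = 0
    for i, (x, y) in enumerate(samples):
        held_out = samples[:i] + samples[i + 1:]
        centroids = {}
        for level in LEVELS:
            pts = [f for f, lab in held_out if lab == level]
            centroids[level] = [sum(col) / len(pts) for col in zip(*pts)]
        pred = min(
            LEVELS,
            key=lambda lev: sum((a - b) ** 2 for a, b in zip(x, centroids[lev])),
        )
        errors += pred != y
    return errors / len(samples)

# Synthetic heartbeat-synchronized features (e.g. a pitch measure and a
# heart-rate measure): each workload level shifts the feature mean.
data = [
    ([random.gauss(mu, 0.5), random.gauss(mu, 0.5)], lev)
    for mu, lev in ((0.0, "low"), (2.0, "medium"), (4.0, "high"))
    for _ in range(20)
]
rate = nearest_centroid_error(data)
```

With well-separated synthetic levels the leave-one-out error is low; on real speech and cardiovascular signals the separation, and hence the error, depends on the features used.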
3.
Biol Psychol ; 132: 154-163, 2018 02.
Article in English | MEDLINE | ID: mdl-29269026

ABSTRACT

Cardiovascular measures have been found to be sensitive to task onset and offset, but are less sensitive to adjacent levels of increasing cognitive workload. A potential confound in the literature is the disregard of individual differences in cardiovascular reactivity. In particular, the individuals' working memory capacity (WMC) is likely to play a role in cardiovascular reactivity to workload. A total of 98 university students performed four cognitive tasks that varied in their level of workload. The operation span (OSPAN) task was used to measure the participants' WMC. A variety of cardiovascular measures were gathered in real time during the experiment. Derived measures of blood pressure regulation were also calculated. In line with what was hypothesized, cardiovascular measures detected workload onset and offset but did not consistently distinguish between the individual task levels. Furthermore, a significant interaction between workload levels and WMC showed that cardiovascular profile varied depending on WMC scores. In addition, WMC negatively predicted subjective ratings of task difficulty as well as task performance, with subjective estimation of task difficulty and error increasing as WMC decreased. The results suggest that WMC may play a critical role in determining how individuals react to increased cognitive workload.


Subject(s)
Adaptation, Physiological/physiology , Cardiovascular Physiological Phenomena , Cognition/physiology , Memory, Short-Term/physiology , Workload/psychology , Adult , Female , Humans , Individuality , Male , Stroop Test , Task Performance and Analysis , Young Adult
4.
IEEE/ACM Trans Audio Speech Lang Process ; 25(12): 2281-2291, 2017 Dec.
Article in English | MEDLINE | ID: mdl-33748320

ABSTRACT

The goal of this study was to investigate the performance of different feature types for voice quality classification using multiple classifiers. The study compared the COVAREP feature set (glottal source features, frequency-warped cepstrum, and harmonic model features) against mel-frequency cepstral coefficients (MFCCs) computed from the acoustic voice signal, the acoustic-based glottal inverse filtered (GIF) waveform, and the electroglottographic (EGG) waveform. Our hypothesis was that MFCCs can capture the perceived voice quality from any of these three voice signals. Experiments were carried out on recordings from 28 participants with normal vocal status who were prompted to sustain vowels with modal and non-modal voice qualities. Recordings were rated by an expert listener using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), and the ratings were transformed into a dichotomous label (presence or absence) for the prompted voice qualities of modal voice, breathiness, strain, and roughness. Classification was done using support vector machine, random forest, deep neural network, and Gaussian mixture model classifiers, all built speaker-independent with a leave-one-speaker-out strategy. The best classification accuracy, 79.97%, was achieved with the full COVAREP set. The harmonic model features were the best-performing subset, with 78.47% accuracy, and the static+dynamic MFCCs scored 74.52%. A closer analysis showed that MFCC and dynamic MFCC features were able to classify the modal, breathy, and strained voice quality dimensions from the acoustic and GIF waveforms. Reduced classification performance was exhibited by the EGG waveform.
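The leave-one-speaker-out strategy named in this abstract is the part most easily sketched in code: every fold trains on all speakers but one and tests on the held-out speaker, so the classifier never sees the test speaker's voice. The toy corpus and record layout below are hypothetical; only the splitting protocol reflects the abstract.

```python
# Sketch of a leave-one-speaker-out split for speaker-independent
# classification. Records and labels are illustrative placeholders.

def leave_one_speaker_out(records):
    """Yield (held_out_speaker, train, test) splits.

    records: iterable of (speaker_id, features, label) tuples.
    """
    speakers = sorted({spk for spk, _, _ in records})
    for held_out in speakers:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test

# Toy corpus: 3 speakers, dichotomous labels (e.g. modal vs. breathy).
corpus = [
    ("spk1", [0.1], "modal"), ("spk1", [0.9], "breathy"),
    ("spk2", [0.2], "modal"), ("spk2", [0.8], "breathy"),
    ("spk3", [0.3], "modal"), ("spk3", [0.7], "breathy"),
]
folds = list(leave_one_speaker_out(corpus))
```

Each of the three folds holds out one speaker's recordings entirely, which is what makes the reported accuracies speaker independent.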

5.
IEEE/ACM Trans Audio Speech Lang Process ; 25(8): 1718-1730, 2017 Aug.
Article in English | MEDLINE | ID: mdl-34268444

ABSTRACT

Glottal inverse filtering aims to estimate the glottal airflow signal from a speech signal for applications such as speaker recognition and clinical voice assessment. Evaluating inverse filtering algorithms has nonetheless been challenging because of the practical difficulty of directly measuring glottal airflow. Moreover, the performance of many methods is known to degrade in voice conditions of great interest, such as breathiness, high pitch, soft voice, and running speech. This paper presents a comprehensive, objective, and comparative evaluation of state-of-the-art inverse filtering algorithms that takes advantage of speech and glottal airflow signals generated by a physiological speech synthesizer. The synthesizer provides a physics-based simulation of the voice production process and thus an adequate test bed for revealing the temporal and spectral performance characteristics of each algorithm. The synthetic data include continuous speech utterances and sustained vowels, produced with multiple voice qualities (pressed, slightly pressed, modal, slightly breathy, and breathy), fundamental frequencies, and subglottal pressures to simulate the natural variation in real speech. In evaluating the accuracy of a glottal flow estimate, multiple error measures are used, including an error in the estimated signal that measures overall waveform deviation, as well as errors in several clinically relevant features extracted from the glottal flow estimate. Waveform errors from the glottal flow estimation experiments exhibited mean values of around 30% of the amplitude of the true glottal flow derivative for sustained vowels, and around 40% for continuous speech. Closed-phase approaches showed remarkable stability across different voice qualities and subglottal pressures. The algorithms of choice, as suggested by significance tests, are closed-phase covariance analysis for the analysis of sustained vowels and sparse linear prediction for the analysis of continuous speech. Results of the data subset analysis suggest that close rounded vowels pose an additional challenge for glottal flow estimation.
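The abstract's waveform-deviation errors are reported as a percentage of the amplitude of the true glottal flow derivative. A minimal sketch of such a normalized error is shown below; the exact error definition in the paper may differ, and the stylized pulse values are invented for illustration.

```python
# Hypothetical waveform-deviation error: mean absolute deviation of the
# estimate from the true glottal flow derivative, expressed as a
# percentage of the true signal's peak amplitude. The normalization is
# an assumption, not necessarily the paper's definition.

def waveform_error_percent(true_dgf, est_dgf):
    """Mean absolute deviation, as % of the true derivative's peak amplitude."""
    peak = max(abs(v) for v in true_dgf)
    mad = sum(abs(t - e) for t, e in zip(true_dgf, est_dgf)) / len(true_dgf)
    return 100.0 * mad / peak

true_dgf = [0.0, 0.5, 1.0, -2.0, 0.0]   # stylized flow-derivative pulse
est_dgf  = [0.1, 0.4, 0.9, -1.6, 0.1]   # an imperfect inverse-filter estimate
err = waveform_error_percent(true_dgf, est_dgf)
```

On this toy pulse the error comes out to a few percent of the peak amplitude; the 30-40% means reported in the abstract indicate how much harder the task is on realistic synthetic speech.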
