Results 1 - 20 of 9,523
1.
Sci Rep ; 14(1): 16162, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003348

ABSTRACT

The Web has become an essential resource but is not yet accessible to everyone. Assistive technologies and innovative, intelligent frameworks, for example, those using conversational AI, help overcome some exclusions. However, some users still experience barriers. This paper shows how a human-centered approach can shed light on technology limitations and gaps. It reports on a three-step process (focus group, co-design, and preliminary validation) that we adopted to investigate how people with speech impairments, e.g., dysarthria, browse the Web and how barriers can be reduced. The methodology helped us identify challenges and create new solutions, i.e., patterns for Web browsing, by combining voice-based conversational AI, customized for impaired speech, with techniques for the visual augmentation of web pages. While current trends in AI research focus on increasingly powerful large models, participants remarked that current conversational systems do not meet their needs, and that each user's specificity must be considered for a technology to be called inclusive.


Subject(s)
Artificial Intelligence , Internet , Voice , Humans , Voice/physiology , Male , Female , Adult , Middle Aged , Communication , Focus Groups
2.
Hum Brain Mapp ; 45(10): e26724, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39001584

ABSTRACT

Music is ubiquitous, both in its instrumental and vocal forms. While speech perception at birth has been at the core of an extensive corpus of research, the origins of the ability to discriminate instrumental or vocal melodies are still not well investigated. In previous studies comparing vocal and musical perception, the vocal stimuli were mainly related to speaking, including language, and not to the non-language singing voice. To better compare a melodic instrumental line with the voice, we therefore used singing as the comparison stimulus, reducing the dissimilarities between the two stimuli as much as possible and separating language perception from vocal musical perception. Forty-five newborns were scanned, 10 full-term infants and 35 preterm infants at term-equivalent age (mean gestational age at test = 40.17 weeks, SD = 0.44), using functional magnetic resonance imaging while listening to five melodies played by a musical instrument (flute) or sung by a female voice. To examine dynamic task-based effective connectivity, we employed a psychophysiological interaction of co-activation patterns (PPI-CAPs) analysis, using the auditory cortices as seed region, to investigate moment-to-moment changes in task-driven modulation of cortical activity during the fMRI task. Our findings reveal condition-specific, dynamically occurring patterns of co-activation (PPI-CAPs). During the vocal condition, the auditory cortex co-activates with the sensorimotor and salience networks, while during the instrumental condition, it co-activates with the visual cortex and the superior frontal cortex. These results show that the vocal stimulus engages sensorimotor aspects of auditory perception and is processed as a more salient stimulus, while the instrumental condition recruits higher-order cognitive and visuo-spatial networks. Common neural signatures for both auditory stimuli were found in the precuneus and posterior cingulate gyrus. Finally, this study adds to our knowledge of the dynamic brain connectivity underlying newborns' capability for early and specialized auditory processing, highlighting the relevance of dynamic approaches to studying brain function in newborn populations.


Subject(s)
Auditory Perception , Magnetic Resonance Imaging , Music , Humans , Female , Male , Auditory Perception/physiology , Infant, Newborn , Singing/physiology , Infant, Premature/physiology , Brain Mapping , Acoustic Stimulation , Brain/physiology , Brain/diagnostic imaging , Voice/physiology
3.
J Acoust Soc Am ; 156(1): 278-283, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38980102

ABSTRACT

How we produce and perceive voice is constrained by laryngeal physiology and biomechanics. Such constraints may present themselves as principal dimensions in the voice outcome space that are shared among speakers. This study attempts to identify such principal dimensions in the voice outcome space and the underlying laryngeal control mechanisms in a three-dimensional computational model of voice production. A large-scale voice simulation was performed with parametric variations in vocal fold geometry and stiffness, glottal gap, vocal tract shape, and subglottal pressure. Principal component analysis was applied to data combining both the physiological control parameters and voice outcome measures. The results showed three dominant dimensions accounting for at least 50% of the total variance. The first two dimensions describe respiratory-laryngeal coordination in controlling the energy balance between low- and high-frequency harmonics in the produced voice, and the third dimension describes control of the fundamental frequency. The dominance of these three dimensions suggests that voice changes along these principal dimensions are likely to be more consistently produced and perceived by most speakers than other voice changes, and thus are more likely to have emerged during evolution and be used to convey important personal information, such as emotion and larynx size.
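As a rough illustration of this analysis step, principal component analysis can be applied to a standardized matrix that stacks the control parameters and outcome measures per simulated voice. A minimal sketch with placeholder data (the matrix dimensions and variable names are assumptions, not the study's):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix: one row per simulated voice, columns combining
# physiological control parameters and acoustic outcome measures.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 12))  # placeholder for simulation outputs

# Standardize so parameters and outcomes contribute on comparable scales.
X_std = StandardScaler().fit_transform(X)

pca = PCA()
pca.fit(X_std)

# How many leading dimensions explain at least 50% of the total variance?
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_dims = int(np.searchsorted(cum_var, 0.50) + 1)
print(f"{n_dims} components explain {cum_var[n_dims - 1]:.1%} of variance")
```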


Subject(s)
Larynx , Phonation , Principal Component Analysis , Humans , Biomechanical Phenomena , Larynx/physiology , Larynx/anatomy & histology , Voice/physiology , Vocal Cords/physiology , Vocal Cords/anatomy & histology , Computer Simulation , Voice Quality , Speech Acoustics , Pressure , Models, Biological , Models, Anatomic
4.
Sci Data ; 11(1): 746, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982093

ABSTRACT

Many research articles have explored the impact of surgical interventions on voice and speech evaluations, but advances are limited by the lack of publicly accessible datasets. To address this, a comprehensive corpus of 107 Castilian Spanish speakers was recorded, including control speakers and patients who underwent upper airway surgeries such as tonsillectomy, functional endoscopic sinus surgery, and septoplasty. The dataset contains 3,800 audio files, averaging 35.51 ± 5.91 recordings per patient. This resource enables systematic investigation of the effects of upper respiratory tract surgery on voice and speech. Previous studies using this corpus have shown no relevant changes in key acoustic parameters for sustained vowel phonation, consistent with initial hypotheses. However, the analysis of speech recordings, particularly nasalised segments, remains open for further research. Additionally, this dataset facilitates the study of the impact of upper airway surgery on speaker recognition and identification methods, and the testing of anti-spoofing methodologies for improved robustness.


Subject(s)
Speech , Voice , Humans , Postoperative Period , Tonsillectomy , Male , Female , Preoperative Period , Adult
5.
Codas ; 36(5): e20240009, 2024.
Article in English | MEDLINE | ID: mdl-39046026

ABSTRACT

PURPOSE: The study aimed to identify (1) whether the age and gender of listeners and the length of vocal stimuli affect the accuracy of emotion discrimination in voice; and (2) whether the perceived intensity of expressed affective emotions is age- and gender-dependent. METHODS: Thirty-two age-matched listeners heard 270 semantically neutral voice samples produced with neutral, happy, and angry intonation by ten professional actors. Participants categorized each auditory stimulus as one of the three emotions and judged the intensity of emotional expression in the sample using a customized tablet web interface. RESULTS: Discrimination accuracy for happy and angry emotions decreased with age, while accuracy for neutral emotions increased with age. Females rated the intensity of perceived affective emotions higher than males across all linguistic units: for angry emotions, in words (z = -3.599, p < .001), phrases (z = -3.218, p = .001), and texts (z = -2.272, p = .023); for happy emotions, in words (z = -5.799, p < .001), phrases (z = -4.706, p < .001), and texts (z = -2.699, p = .007). CONCLUSION: Accuracy in perceiving vocal expressions of emotions varies with age and gender. Young adults are better at distinguishing happy and angry emotions than middle-aged adults, while middle-aged adults tend to categorize perceived affective emotions as neutral. Gender also plays a role, with females rating expressions of affective emotions in voices higher than males. Additionally, the length of voice stimuli impacts emotion discrimination accuracy.
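The reported z-statistics are consistent with nonparametric two-sample comparisons of female versus male ratings. A minimal sketch with simulated ratings (the specific test and all values here are assumptions, not the authors' code):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical intensity ratings for angry word stimuli, by listener gender.
female_ratings = rng.integers(4, 11, size=160)
male_ratings = rng.integers(3, 10, size=160)

# Nonparametric comparison of the two rating distributions, analogous
# to the per-linguistic-unit z-statistics reported in the abstract.
u, p = stats.mannwhitneyu(female_ratings, male_ratings, alternative="two-sided")
print(f"U = {u}, p = {p:.3f}")
```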


Subject(s)
Emotions , Speech Perception , Voice , Humans , Female , Male , Adult , Emotions/physiology , Age Factors , Young Adult , Sex Factors , Middle Aged , Speech Perception/physiology , Voice/physiology , Adolescent , Aged
6.
Sci Rep ; 14(1): 16462, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39014043

ABSTRACT

The current study tested the hypothesis that the association between musical ability and vocal emotion recognition skills is mediated by accuracy in prosody perception. Furthermore, it was investigated whether this association is primarily related to musical expertise, operationalized by long-term engagement in musical activities, or musical aptitude, operationalized by a test of musical perceptual ability. To this end, we conducted three studies: In Study 1 (N = 85) and Study 2 (N = 93), we developed and validated a new instrument for the assessment of prosodic discrimination ability. In Study 3 (N = 136), we examined whether the association between musical ability and vocal emotion recognition was mediated by prosodic discrimination ability. We found evidence for a full mediation, though only in relation to musical aptitude and not in relation to musical expertise. Taken together, these findings suggest that individuals with high musical aptitude have superior prosody perception skills, which in turn contribute to their vocal emotion recognition skills. Importantly, our results suggest that these benefits are not unique to musicians, but extend to non-musicians with high musical aptitude.
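A minimal sketch of the mediation logic using Baron-Kenny-style regressions on simulated data (the study's actual estimation method and variable scales are not reproduced here; all names and coefficients are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 136
# Hypothetical standardized scores mirroring Study 3's three variables.
musical_aptitude = rng.normal(size=n)
prosody = 0.6 * musical_aptitude + rng.normal(scale=0.8, size=n)   # mediator
emotion_recog = 0.5 * prosody + rng.normal(scale=0.8, size=n)      # outcome

def ols(y, *xs):
    X = sm.add_constant(np.column_stack(xs))
    return sm.OLS(y, X).fit()

c_total = ols(emotion_recog, musical_aptitude).params[1]   # total effect
a = ols(prosody, musical_aptitude).params[1]               # X -> M path
fit = ols(emotion_recog, musical_aptitude, prosody)
c_prime, b = fit.params[1], fit.params[2]                  # direct, M -> Y
print(f"total={c_total:.2f} direct={c_prime:.2f} indirect={a * b:.2f}")
# Full mediation: the direct effect c' shrinks toward zero once the
# mediator is included, while the indirect effect a*b carries the association.
```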


Subject(s)
Aptitude , Emotions , Music , Humans , Music/psychology , Male , Female , Emotions/physiology , Aptitude/physiology , Adult , Young Adult , Speech Perception/physiology , Auditory Perception/physiology , Adolescent , Recognition, Psychology/physiology , Voice/physiology
7.
Sci Data ; 11(1): 800, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39030186

ABSTRACT

This paper describes a new publicly available database of VOiCe signals acquired from Amyotrophic Lateral Sclerosis (ALS) patients (VOC-ALS) and healthy controls performing different speech tasks. The dataset consists of 1224 voice signals recorded from 153 participants: 51 healthy controls (32 males and 19 females) and 102 ALS patients (65 males and 37 females) with differing severities of dysarthria. Each subject's voice was recorded using a smartphone application (Vox4Health) while performing several vocal tasks, including sustained phonation of the vowels /a/, /e/, /i/, /o/, and /u/ and repetition of the syllables /pa/, /ta/, and /ka/. Basic derived speech metrics such as harmonics-to-noise ratio, mean and standard deviation of fundamental frequency (F0), jitter, and shimmer were calculated. The F0 standard deviation of vowels and syllables showed an excellent ability to identify people with ALS and to discriminate the different severities of dysarthria. These data represent the most comprehensive database of voice signals in ALS and form a solid basis for research on the recognition of voice impairment in ALS patients for use in clinical applications.
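A rough sketch of how such F0 statistics might be derived with common tooling (librosa's pYIN tracker; the file name is hypothetical, and the jitter shown is a frame-based approximation rather than Praat's period-based measure):

```python
import numpy as np
import librosa

# Hypothetical path; the VOC-ALS recordings are smartphone audio files.
y, sr = librosa.load("sustained_a.wav", sr=None)

# F0 track for a sustained vowel; the bounds are generic speech defaults.
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C7"), sr=sr)
f0 = f0[voiced]  # keep voiced frames only

print(f"mean F0 = {np.nanmean(f0):.1f} Hz, F0 SD = {np.nanstd(f0):.1f} Hz")

# Approximate local jitter: mean absolute difference between consecutive
# periods, normalized by the mean period (derived from the F0 track).
periods = 1.0 / f0
jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)
print(f"jitter (local, approx.) = {jitter:.4f}")
```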


Subject(s)
Amyotrophic Lateral Sclerosis , Dysarthria , Humans , Amyotrophic Lateral Sclerosis/physiopathology , Amyotrophic Lateral Sclerosis/complications , Dysarthria/physiopathology , Male , Female , Voice , Databases, Factual , Middle Aged , Adult , Aged , Case-Control Studies
8.
Cognition ; 250: 105866, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38971020

ABSTRACT

Language experience confers a benefit to voice learning, a concept described in the literature as the language familiarity effect (LFE). What experiences are necessary for the LFE to be conferred is less clear. We contribute empirically and theoretically to this debate by examining within- and across-language voice learning with Cantonese-English bilingual voices in a talker-voice association paradigm. Listeners were trained in Cantonese or English and assessed on their ability to generalize voice learning at test on Cantonese and English utterances. By testing listeners from four language backgrounds - English Monolingual, Cantonese-English Multilingual, Tone Multilingual, and Non-tone Multilingual groups - we assess whether the LFE and group-level differences in voice learning are due to varying abilities (1) in accessing the relative acoustic-phonetic features that distinguish a voice, (2) in learning at a given rate, or (3) in generalizing learned talker-voice associations to novel same-language and different-language utterances. These four language background groups allow us to investigate the roles of language-specific familiarity, tone language experience, and generic multilingual experience in voice learning. Differences in performance across listener groups show evidence in support of the LFE and the role of two mechanisms in voice learning: the extraction and association of talker-specific, language-general information, which generalizes more robustly across languages, and of talker-specific, language-specific information, which may be more readily accessible and learnable but, due to its language-specific nature, is less able to be extended to another language.


Subject(s)
Learning , Multilingualism , Speech Perception , Voice , Humans , Voice/physiology , Speech Perception/physiology , Female , Male , Learning/physiology , Adult , Young Adult , Language , Recognition, Psychology/physiology , Phonetics
9.
Sci Rep ; 14(1): 16778, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039258

ABSTRACT

The present study employed the dictator game and the ultimatum game to investigate the effects of facial attractiveness, vocal attractiveness, and expressed social interest (positive, "I like you", versus negative, "I don't like you") on decision making. Female participants played against male recipients in both the dictator game and the ultimatum game, and against male proposers in the ultimatum game. Results showed that participants offered recipients with attractive faces more money than recipients with unattractive faces. Participants also offered recipients with attractive voices more money than recipients with unattractive voices, especially under the positive social interest condition. Moreover, participants allocated more money to recipients who expressed positive social interest than to those who expressed negative social interest, and they expected proposers who expressed positive social interest to offer them more money than proposers who expressed negative social interest. Overall, the results demonstrate a beauty premium for faces and voices in opposite-sex economic bargaining. Social interest also affects decision outcomes. However, the beauty premium and the effect of social interest vary with participants' roles.


Subject(s)
Beauty , Decision Making , Face , Voice , Humans , Female , Male , Young Adult , Adult , Games, Experimental
10.
Commun Biol ; 7(1): 711, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862808

ABSTRACT

Deepfakes are viral ingredients of digital environments, and they can trick human cognition into misperceiving the fake as real. Here, we test the neurocognitive sensitivity of 25 participants in accepting or rejecting person identities as recreated in audio deepfakes. We generate high-quality voice identity clones from natural speakers using advanced deepfake technologies. During an identity matching task, participants show intermediate performance with deepfake voices, indicating both deception by and resistance to deepfake identity spoofing. At the brain level, univariate and multivariate analyses consistently reveal a central cortico-striatal network that decoded the vocal acoustic pattern and deepfake level (auditory cortex) as well as natural speaker identities (nucleus accumbens), which are valued for their social relevance. This network is embedded in a broader neural identity and object recognition network. Humans can thus be partly tricked by deepfakes, but the neurocognitive mechanisms identified during deepfake processing open windows for strengthening human resilience to fake information.


Subject(s)
Speech Perception , Humans , Male , Female , Adult , Young Adult , Speech Perception/physiology , Nerve Net/physiology , Auditory Cortex/physiology , Voice/physiology , Corpus Striatum/physiology
11.
Sci Rep ; 14(1): 13813, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38877028

ABSTRACT

Parkinson's Disease (PD) is a prevalent neurological condition characterized by motor and cognitive impairments, typically manifesting around the age of 50 and presenting symptoms such as gait difficulties and speech impairments. Although a cure remains elusive, symptom management through medication is possible, and timely detection is pivotal for effective disease management. In this study, we leverage Machine Learning (ML) and Deep Learning (DL) techniques, specifically K-Nearest Neighbor (KNN) and Feed-forward Neural Network (FNN) models, to differentiate between individuals with PD and healthy individuals based on voice signal characteristics. Our dataset, sourced from the University of California at Irvine (UCI), comprises 195 voice recordings collected from 31 patients. To optimize model performance, we employ several strategies, including the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance, feature selection to identify the most relevant features, and hyperparameter tuning using RandomizedSearchCV. Our experimentation reveals that the FNN and KNN models, trained and tested on an 80-20 split of the dataset, yield the most promising results. The FNN model achieves an overall accuracy of 99.11%, with 98.78% recall, 99.96% precision, and a 99.23% f1-score. Similarly, the KNN model demonstrates strong performance, with an overall accuracy of 95.89%, recall of 96.88%, precision of 98.71%, and an f1-score of 97.62%. Overall, our study showcases the efficacy of ML and DL techniques in accurately identifying PD from voice signals, underscoring the potential of these approaches to contribute significantly to early diagnosis and intervention strategies for Parkinson's Disease.
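A minimal sketch of the described pipeline (SMOTE, feature selection, and RandomizedSearchCV around a KNN classifier) using scikit-learn and imbalanced-learn; the feature matrix below is a random placeholder, so the printed accuracy is meaningless:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Placeholder standing in for the UCI Parkinson's voice features
# (195 recordings, ~22 acoustic measures).
rng = np.random.default_rng(3)
X = rng.normal(size=(195, 22))
y = rng.integers(0, 2, size=195)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

# SMOTE inside the pipeline so oversampling only touches training folds.
pipe = Pipeline([("scale", StandardScaler()),
                 ("smote", SMOTE(random_state=0)),
                 ("select", SelectKBest(f_classif)),
                 ("knn", KNeighborsClassifier())])

search = RandomizedSearchCV(pipe,
                            {"select__k": [5, 10, 15, 22],
                             "knn__n_neighbors": range(1, 15)},
                            n_iter=20, cv=5, random_state=0)
search.fit(X_tr, y_tr)
print(f"test accuracy: {search.score(X_te, y_te):.3f}")
```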


Subject(s)
Machine Learning , Parkinson Disease , Parkinson Disease/diagnosis , Humans , Male , Female , Middle Aged , Aged , Neural Networks, Computer , Voice , Deep Learning
12.
Physiol Behav ; 283: 114615, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38880296

ABSTRACT

This study investigates the potential effect of males' testosterone levels on speech production and speech perception. Regarding speech production, we investigate intra- and inter-individual variation in mean fundamental frequency (fo) and formant frequencies and highlight the potential interacting effect of another hormone, cortisol. In addition, we investigate the influence of different speech materials on the relationship between testosterone and speech production. Regarding speech perception, we investigate the potential effect of individual differences in males' testosterone levels on ratings of the attractiveness of female voices. In the production study, data were gathered from 30 healthy adult males ranging from 19 to 27 years (mean age: 22.4, SD: 2.2) who recorded their voices and provided saliva samples at 9 am, 12 noon, and 3 pm on a single day. Speech material consisted of sustained vowels, counting, read speech, and a free description of pictures. Biological measures comprised speakers' height, grip strength, and hormone levels (testosterone and cortisol). In the perception study, participants were asked to rate the attractiveness of female voice stimuli (sentence stimulus, same-speaker pairs) that were manipulated in three steps regarding mean fo and formant frequencies. Regarding speech production, our results show that testosterone affected mean fo (but not formants) both within and between speakers. This relationship was weakened in speakers with high cortisol levels and depended on the speech material. Regarding speech perception, we found female stimuli with higher mean fo and formants to be rated as sounding more attractive than stimuli with lower mean fo and formants. Moreover, listeners with low testosterone showed increased sensitivity to vocal cues of female attractiveness. While the results of our production study support earlier findings of a relationship between testosterone and mean fo in males (mediated by cortisol), they also highlight the relevance of the speech material: the effect of testosterone was strongest in sustained vowels, potentially because hormones exert a stronger effect on physiologically constrained tasks such as sustained vowels than on freer tasks such as picture description. The perception study is the first to show an effect of males' testosterone levels on female attractiveness ratings using voice stimuli.
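One of the two stimulus manipulations (raising mean fo) can be sketched with off-the-shelf resampling-based pitch shifting; formant shifting requires a vocoder-style tool such as Praat and is omitted here. The file names and the two-semitone step are hypothetical, not the study's settings:

```python
import librosa
import soundfile as sf

# Hypothetical female sentence stimulus; raise mean fo by two semitones
# as one manipulation "step" (formants are left untouched by this method).
y, sr = librosa.load("female_sentence.wav", sr=None)
y_high = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)
sf.write("female_sentence_f0plus.wav", y_high, sr)
```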


Subject(s)
Cues , Hydrocortisone , Saliva , Speech Perception , Speech , Testosterone , Voice , Humans , Testosterone/metabolism , Testosterone/pharmacology , Male , Adult , Young Adult , Saliva/metabolism , Saliva/chemistry , Hydrocortisone/metabolism , Speech Perception/physiology , Speech Perception/drug effects , Speech/physiology , Speech/drug effects , Voice/drug effects , Female , Beauty , Acoustic Stimulation
13.
J Speech Lang Hear Res ; 67(7): 2139-2158, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38875480

ABSTRACT

PURPOSE: This systematic review aimed to evaluate the effects of singing as an intervention for the aging voice. METHOD: Quantitative studies of interventions that involve singing as training for older adults with any medical condition were reviewed, with outcomes measured in terms of respiration, phonation, and posture, the physical functions related to the aging voice. English and Chinese studies published up to April 2024 were searched using 31 electronic databases. The included articles were assessed according to the Grading of Recommendations Assessment, Development and Evaluation (GRADE) rubric. RESULTS: Seven studies were included. These studies reported outcome measures related to respiratory functions only. Regarding intervention effects, statistically significant improvements were observed in five of the included studies, three of which had large effect sizes. The overall level of evidence of the included studies was not high: three studies had moderate levels and the rest had lower levels. The intervention activities included training other than singing, and these non-singing training items may have introduced co-intervention bias into the study results. CONCLUSIONS: This systematic review suggests that singing as an intervention for older adults with respiratory and cognitive problems could improve respiration and respiratory-phonatory control. However, none of the included studies covers the other two physical functions related to the aging voice (phonatory and postural functions), and the overall level of evidence of the included studies was not high. More research evidence is needed on singing-based interventions specifically for patients with an aging voice.


Subject(s)
Aging , Singing , Humans , Aged , Aging/physiology , Voice Disorders/therapy , Phonation/physiology , Voice Quality , Voice/physiology , Respiration , Posture/physiology , Aged, 80 and over
14.
Math Biosci Eng ; 21(5): 5947-5971, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38872565

ABSTRACT

Robot-assisted prostate seed implantation technology has developed rapidly. However, problems remain during the procedure, such as non-intuitive visualization and complicated robot control. To improve the intelligence and visualization of the operation process, a voice-control technology for a prostate seed implantation robot in an augmented reality environment was proposed. First, the MRI image of the prostate was denoised and segmented, and a three-dimensional model of the prostate and its surrounding tissues was reconstructed by surface rendering. Combined with a holographic application, an augmented reality system for prostate seed implantation was built. An improved singular value decomposition (SVD) three-dimensional registration algorithm based on the iterative closest point method was proposed, and three-dimensional registration experiments verified that the algorithm effectively improves registration accuracy. A fusion algorithm based on spectral subtraction and a backpropagation (BP) neural network was proposed for voice control. Experimental results showed that the average delay of the fusion algorithm was 1.314 s and the overall response time of the integrated system was 1.5 s. The fusion algorithm effectively improves the reliability of the voice control system, and the integrated system meets the responsiveness requirements of prostate seed implantation.
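The SVD step at the core of ICP-based rigid registration can be sketched as a generic Kabsch-style solver (this is the textbook algorithm, not the paper's improved variant; the point clouds below are synthetic):

```python
import numpy as np

def svd_rigid_transform(P, Q):
    """Least-squares rotation R and translation t mapping P onto Q:
    the SVD step solved inside each ICP iteration."""
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_bar).T @ (Q - q_bar)                      # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])   # avoid reflections
    R = Vt.T @ D @ U.T
    t = q_bar - R @ p_bar
    return R, t

# Hypothetical check: recover a known rotation of a random point cloud.
rng = np.random.default_rng(4)
P = rng.normal(size=(100, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
Q = P @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = svd_rigid_transform(P, Q)
print(np.allclose(R, R_true), np.allclose(t, [1.0, 2.0, 3.0]))
```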


Subject(s)
Algorithms , Augmented Reality , Magnetic Resonance Imaging , Neural Networks, Computer , Prostate , Prostatic Neoplasms , Robotics , Humans , Male , Robotics/instrumentation , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostate/diagnostic imaging , Imaging, Three-Dimensional , Voice , Robotic Surgical Procedures/instrumentation , Robotic Surgical Procedures/methods , Holography/methods , Holography/instrumentation , Brachytherapy/instrumentation , Reproducibility of Results
15.
J Acoust Soc Am ; 155(6): 3822-3832, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38874464

ABSTRACT

This study proposes the use of vocal resonators to enhance cardiac auscultation signals and evaluates their performance for voice-noise suppression. Data were collected using two electronic stethoscopes while each study subject was talking: one collected the auscultation signal from the chest, while the other collected voice signals from one of three vocal resonators (cheek, back of the neck, and shoulder). The spectral subtraction method was applied to the signals. Both objective and subjective metrics were used to evaluate the quality of the enhanced signals and to identify the most effective vocal resonator for noise suppression. Our preliminary findings showed a significant improvement after enhancement and demonstrated the efficacy of vocal resonators. In a listening survey conducted with thirteen physicians, the enhanced signals received significantly better sound-quality scores than the original signals. The shoulder resonator group demonstrated significantly better sound quality than the cheek group in reducing voice sound in cardiac auscultation signals. The suggested method has the potential to be used in the development of an electronic stethoscope with a robust noise removal function, and significant clinical benefits are expected from the expedited preliminary diagnostic procedure.
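A minimal sketch of magnitude spectral subtraction with a voice-reference channel (a generic STFT-domain version; the window size, spectral floor, and all signals are assumptions, not the study's implementation):

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(noisy, noise_ref, fs, nperseg=512, floor=0.02):
    """Estimate the noise spectrum from a voice-only reference channel and
    subtract it from the noisy signal's magnitude, keeping the noisy phase."""
    _, _, S = stft(noisy, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise_ref, fs=fs, nperseg=nperseg)
    noise_mag = np.abs(N).mean(axis=1, keepdims=True)  # average noise spectrum
    mag = np.clip(np.abs(S) - noise_mag, floor * np.abs(S), None)
    _, clean = istft(mag * np.exp(1j * np.angle(S)), fs=fs, nperseg=nperseg)
    return clean

# Hypothetical signals: a chest auscultation recording contaminated by
# speech, plus a simultaneous voice-resonator reference recording.
fs = 4000
rng = np.random.default_rng(5)
heart = rng.normal(size=fs * 5)   # placeholder auscultation signal
voice = rng.normal(size=fs * 5)   # placeholder resonator signal
enhanced = spectral_subtract(heart + 0.5 * voice, voice, fs)
```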


Subject(s)
Heart Auscultation , Signal Processing, Computer-Assisted , Stethoscopes , Humans , Heart Auscultation/instrumentation , Heart Auscultation/methods , Heart Auscultation/standards , Male , Female , Adult , Heart Sounds/physiology , Sound Spectrography , Equipment Design , Voice/physiology , Middle Aged , Voice Quality , Vibration , Noise
16.
Sci Rep ; 14(1): 12734, 2024 Jun 3.
Article in English | MEDLINE | ID: mdl-38830969

ABSTRACT

Early screening for depression is highly beneficial, helping patients obtain better diagnosis and treatment. While the effectiveness of using voice data for depression detection has been demonstrated, the issue of insufficient dataset size remains unresolved. We therefore propose an artificial intelligence method to identify depression effectively. The pre-trained wav2vec 2.0 voice model was used as a feature extractor to automatically extract high-quality voice features from raw audio, and a small fine-tuning network was used as the classification model to output depression classification results. The proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results. Notably, the model demonstrated outstanding performance in binary classification, attaining an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. Similarly, impressive results were obtained in multi-class classification, with an accuracy of 0.9481 and an RMSE of 0.3810. This is the first use of the wav2vec 2.0 model for depression recognition, and it showed strong generalization ability. The method is simple, practical, and applicable, and can assist doctors in the early screening of depression.
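A minimal sketch of the described setup, pairing a frozen wav2vec 2.0 backbone with a small classification head via Hugging Face transformers (the checkpoint id, head size, and mean pooling are assumptions; the paper's fine-tuning details are not reproduced here):

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Pre-trained wav2vec 2.0 used as a feature extractor (checkpoint assumed).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
backbone = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
backbone.eval()

# Small fine-tuning head mapping pooled features to depressed/healthy.
head = nn.Sequential(nn.Linear(backbone.config.hidden_size, 64),
                     nn.ReLU(), nn.Linear(64, 2))

waveform = torch.randn(16000 * 5)  # placeholder: 5 s of 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = backbone(**inputs).last_hidden_state  # (1, frames, hidden)
logits = head(hidden.mean(dim=1))                  # mean-pool over time
print(logits.softmax(dim=-1))
```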


Subject(s)
Depression , Voice , Humans , Depression/diagnosis , Male , Female , Artificial Intelligence , Adult
17.
J Matern Fetal Neonatal Med ; 37(1): 2362933, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38910112

ABSTRACT

OBJECTIVE: To study the effects of playing mothers' recorded voices to preterm infants in the NICU on the mothers' mental health, as measured by the Depression, Anxiety and Stress Scale-21 (DASS-21) questionnaire. DESIGN/METHODS: This was a pilot single-center prospective randomized controlled trial conducted at a level IV NICU. The trial was registered at clinicaltrials.gov (NCT04559620). Inclusion criteria were mothers of preterm infants with gestational ages between 26 and 30 weeks. The DASS-21 questionnaire was administered to all enrolled mothers in the first week after birth, followed by recording of their voices by the music therapists. In the intervention group, the recorded maternal voice was played into the infant's incubator between 15 and 21 days of life. A second DASS-21 was administered between 21 and 23 days of life. The Wilcoxon rank-sum test was used to compare DASS-21 scores between the two groups, and the Wilcoxon signed-rank test was used to compare pre- and post-intervention DASS-21 scores. RESULTS: Forty eligible mothers were randomized: 20 to the intervention group and 20 to the control group. Baseline maternal and neonatal characteristics were similar between the two groups. There was no significant difference in DASS-21 scores between the two groups at baseline or after the study intervention, and no difference in the pre- and post-intervention DASS-21 scores or their individual components in the experimental group. There was a significant decrease in the total DASS-21 score and the anxiety component of the DASS-21 between weeks 1 and 4 in the control group. CONCLUSION: In this pilot randomized controlled study, recorded maternal voice played into preterm infants' incubators did not have any effect on maternal mental health as measured by the DASS-21 questionnaire. Data obtained in this pilot study will be useful for future randomized controlled trials (RCTs) addressing this important issue.
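The two tests map directly onto SciPy routines; a minimal sketch with simulated DASS-21 totals (all values hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# Hypothetical DASS-21 total scores (range 0-63) for 20 mothers per group.
intervention_wk4 = rng.integers(5, 30, size=20)
control_wk4 = rng.integers(5, 30, size=20)
control_wk1 = control_wk4 + rng.integers(1, 8, size=20)  # higher at week 1

# Between-group comparison: Wilcoxon rank-sum test.
z, p = stats.ranksums(intervention_wk4, control_wk4)
print(f"rank-sum: z = {z:.2f}, p = {p:.3f}")

# Within-group pre/post comparison: Wilcoxon signed-rank test.
w, p = stats.wilcoxon(control_wk1, control_wk4)
print(f"signed-rank: W = {w}, p = {p:.3f}")
```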


Subject(s)
Anxiety , Depression , Infant, Premature , Stress, Psychological , Humans , Female , Pilot Projects , Infant, Newborn , Infant, Premature/psychology , Anxiety/therapy , Adult , Stress, Psychological/therapy , Depression/therapy , Mothers/psychology , Incubators, Infant , Prospective Studies , Music Therapy/methods , Voice/physiology
18.
Eur J Psychotraumatol ; 15(1): 2358681, 2024.
Article in English | MEDLINE | ID: mdl-38837122

ABSTRACT

Background: Research has shown that potential perpetrators and individuals high in psychopathic traits attend to body language cues to target a potential new victim. However, whether targeting also occurs by attending to vocal cues has not been examined; thus, the role of voice in interpersonal violence merits investigation. Objective: In two studies, we examined whether perpetrators could differentiate female speakers with and without sexual and physical assault histories (presented as rating the degree of 'vulnerability' to victimization). Methods: Two samples of male listeners (sample one, N = 105; sample two, N = 109) participated. Each sample rated 18 voices (9 survivors and 9 controls). Listener sample one heard spontaneous speech, and listener sample two heard the second sentence of a standardized passage. Listeners' self-reported psychopathic traits and history of previous perpetration were measured. Results: Across both samples, history of perpetration (but not psychopathy) predicted accuracy in distinguishing survivors of assault. Conclusions: These findings highlight the potential role of voice in prevention and intervention. A further understanding of which voice cues are associated with accuracy in discerning survivors can also help us understand whether specialized voice training could have a role in self-defense practices.


We examined whether listeners with a history of perpetration could differentiate female speakers with and without assault histories (presented as rating the degree of 'vulnerability' to victimization). Listeners with a higher history of perpetration showed higher accuracy in differentiating survivors of assault from non-survivors. These findings highlight that voice could have a crucial role in prevention and intervention.


Subject(s)
Survivors , Voice , Humans , Male , Female , Adult , Survivors/psychology , Cues , Crime Victims/psychology , Middle Aged
19.
JASA Express Lett ; 4(6), 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38888432

ABSTRACT

Singing is socially important but constrains voice acoustics, potentially masking certain aspects of vocal identity. Little is known about how well listeners extract talker details from sung speech or identify talkers across the sung and spoken modalities. Here, listeners (n = 149) were trained to recognize sung or spoken voices and then tested on their identification of these voices in both modalities. Learning vocal identities was initially easier through speech than song. At test, cross-modality voice recognition was above chance, but weaker than within-modality recognition. We conclude that talker information is accessible in sung speech, despite acoustic constraints in song.


Subject(s)
Singing , Speech Perception , Humans , Male , Female , Adult , Speech Perception/physiology , Voice , Young Adult , Recognition, Psychology , Speech
20.
Sci Rep ; 14(1): 14575, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38914752

ABSTRACT

People often interact with groups (i.e., ensembles) during social interactions. Given that group-level information is important in navigating social environments, we expect perceptual sensitivity to aspects of groups that are relevant for personal threat as well as social belonging. Most ensemble perception research has focused on visual ensembles, with little research looking at auditory or vocal ensembles. Across four studies, we present evidence that (i) perceivers accurately extract the sex composition of a group from voices alone, (ii) judgments of threat increase concomitantly with the number of men, and (iii) listeners' sense of belonging depends on the number of same-sex others in the group. This work advances our understanding of social cognition and interpersonal communication, extends ensemble coding research to auditory information, and reveals people's ability to extract relevant social information from brief exposures to vocalizing groups.


Subject(s)
Voice , Humans , Male , Female , Adult , Sex Ratio , Social Perception , Young Adult , Auditory Perception/physiology , Interpersonal Relations , Social Interaction