Pesquisa | Portal Regional da BVS

1.

Voice-Activated Cognitive Behavioral Therapy for Insomnia: A Randomized Clinical Trial.

Starling, Claire M; Greenberg, Daniel; Lewin, Daniel; Shaw, Callen; Zhou, Eric S; Lieberman, Daniel; Chou, Jiling; Arem, Hannah.

JAMA Netw Open ; 7(9): e2435011, 2024 Sep 03.

Artigo em Inglês | MEDLINE | ID: mdl-39316400

RESUMO

Importance: Insomnia symptoms affect an estimated 30% to 50% of the 4 million US breast cancer survivors. Previous studies have shown the effectiveness of cognitive behavioral therapy for insomnia (CBT-I), but high insomnia prevalence suggests continued opportunities for delivery via new modalities. Objective: To determine the efficacy of a CBT-I-informed, voice-activated, internet-delivered program for improving insomnia symptoms among breast cancer survivors. Design, Setting, and Participants: In this randomized clinical trial, breast cancer survivors with insomnia (Insomnia Severity Index [ISI] score >7) were recruited from advocacy and survivorship groups and an oncology clinic. Eligible patients were females aged 18 years or older who had completed curative treatment more than 3 months before enrollment and had not undergone other behavioral sleep treatments in the prior year. Individuals were assessed for eligibility and randomized between March 2022 and October 2023, with data collection completed by December 2023. Intervention: Participants were randomized 1:1 to a smart speaker with a voice-interactive CBT-I program or educational control for 6 weeks. Main Outcomes and Measures: Linear mixed models and Cohen d estimates were used to evaluate the primary outcome of changes in ISI scores and secondary outcomes of sleep quality, wake after sleep onset, sleep onset latency, total sleep time, and sleep efficiency. Results: Of 76 women enrolled (38 each in the intervention and control groups), 70 (92.1%) completed the study. Mean (SD) age was 61.2 (9.3) years; 49 (64.5%) were married or partnered, and participants were a mean (SD) of 9.6 (6.8) years from diagnosis. From baseline to follow-up, ISI scores changed by a mean (SD) of -8.4 (4.7) points in the intervention group compared with -2.6 (3.5) in the control group (P < .001) (Cohen d, 1.41; 95% CI, 0.87-1.94). Sleep diary data showed statistically significant improvements in the intervention group compared with the control group for sleep quality (0.56; 95% CI, 0.39-0.74), wake after sleep onset (9.54 minutes; 95% CI, 1.93-17.10 minutes), sleep onset latency (8.32 minutes; 95% CI, 1.91-14.70 minutes), and sleep efficiency (-0.04%; 95% CI, -0.07% to -0.01%) but not for total sleep time (0.01 hours; 95% CI, -0.27 to 0.29 hours). Conclusions and Relevance: This randomized clinical trial of an in-home, voice-activated CBT-I program among breast cancer survivors found that the intervention improved insomnia symptoms. Future studies may explore how this program can be taken to scale and integrated into ambulatory care. Trial Registration: ClinicalTrials.gov Identifier: NCT05233800.

Assuntos

Neoplasias da Mama , Terapia Cognitivo-Comportamental , Distúrbios do Início e da Manutenção do Sono , Humanos , Feminino , Distúrbios do Início e da Manutenção do Sono/terapia , Terapia Cognitivo-Comportamental/métodos , Pessoa de Meia-Idade , Neoplasias da Mama/complicações , Idoso , Sobreviventes de Câncer/psicologia , Resultado do Tratamento , Adulto , Voz

2.

Sensitivity to vocal emotions emerges in newborns at 37 weeks gestational age.

Hou, Xinlin; Zhang, Peng; Mo, Licheng; Peng, Cheng; Zhang, Dandan.

Elife ; 132024 Sep 20.

Artigo em Inglês | MEDLINE | ID: mdl-39302291

RESUMO

Emotional responsiveness in neonates, particularly their ability to discern vocal emotions, plays an evolutionarily adaptive role in human communication and adaptive behaviors. The developmental trajectory of emotional sensitivity in neonates is crucial for understanding the foundations of early social-emotional functioning. However, the precise onset of this sensitivity and its relationship with gestational age (GA) remain subjects of investigation. In a study involving 120 healthy neonates categorized into six groups based on their GA (ranging from 35 and 40 weeks), we explored their emotional responses to vocal stimuli. These stimuli encompassed disyllables with happy and neutral prosodies, alongside acoustically matched nonvocal control sounds. The assessments occurred during natural sleep states using the odd-ball paradigm and event-related potentials. The results reveal a distinct developmental change at 37 weeks GA, marking the point at which neonates exhibit heightened perceptual acuity for emotional vocal expressions. This newfound ability is substantiated by the presence of the mismatch response, akin to an initial form of adult mismatch negativity, elicited in response to positive emotional vocal prosody. Notably, this perceptual shift's specificity becomes evident when no such discrimination is observed in acoustically matched control sounds. Neonates born before 37 weeks GA do not display this level of discrimination ability. This developmental change has important implications for our understanding of early social-emotional development, highlighting the role of gestational age in shaping early perceptual abilities. Moreover, while these findings introduce the potential for a valuable screening tool for conditions like autism, characterized by atypical social-emotional functions, it is important to note that the current data are not yet robust enough to fully support this application. This study makes a substantial contribution to the broader field of developmental neuroscience and holds promise for future research on early intervention in neurodevelopmental disorders.

Assuntos

Emoções , Idade Gestacional , Humanos , Recém-Nascido , Emoções/fisiologia , Feminino , Masculino , Potenciais Evocados/fisiologia , Estimulação Acústica , Voz/fisiologia , Percepção Auditiva/fisiologia

3.

[Classification of mild cognitive impairment and normal cognition using an automated voice-based testing application].

Asano, Takayuki; Yasuda, Asako; Kinoshita, Setsuo; Tanaka, Toshiro; Sahara, Toru; Tanaka, Toshimitsu; Homma, Akira; Shigeta, Masahiro.

Nihon Ronen Igakkai Zasshi ; 61(3): 337-344, 2024.

Artigo em Japonês | MEDLINE | ID: mdl-39261104

RESUMO

AIM: An easy-to-use tool that can detect cognitive decline in mild cognitive impairment (MCI) is required. In this study, we aimed to construct a machine learning model that discriminates between MCI and cognitively normal (CN) individuals using spoken answers to questions and speech features. METHODS: Participants of ≥50 years of age were recruited from the Silver Human Resource Center. The Japanese Version of the Mini-Mental State Examination (MMSE-J) and Clinical Dementia Rating (CDR) were used to obtain clinical information. We developed a research application that presented neuropsychological tasks via automated voice guidance and collected the participants' spoken answers. The neuropsychological tasks included time orientation, sentence memory tasks (immediate and delayed recall), and digit span memory-updating tasks. Scores and speech features were obtained from spoken answers. Subsequently, a machine learning model was constructed to classify MCI and CN using various classifiers, combining the participants' age, gender, scores, and speech features. RESULTS: We obtained a model using Gaussian Naive Bayes, which classified typical MCI (CDR 0.5, MMSE ≤26) and typical CN (CDR 0 and MMSE ≥29) with an area under the curve (AUC) of 0.866 (accuracy 0.75, sensitivity 0.857, specificity 0.712). CONCLUSIONS: We built a machine learning model that can classify MCI and CN using spoken answers to neuropsychological questions. Easy-to-use MCI detection tools could be developed by incorporating this model into smartphone applications and telephone services.

Assuntos

Disfunção Cognitiva , Humanos , Disfunção Cognitiva/diagnóstico , Disfunção Cognitiva/classificação , Idoso , Masculino , Feminino , Pessoa de Meia-Idade , Voz , Cognição , Testes Neuropsicológicos , Idoso de 80 Anos ou mais , Aprendizado de Máquina

4.

Operatic voices engage the default mode network in professional opera singers.

Bihari, Adél; Nárai, Ádám; Kleber, Boris; Zsuga, Judit; Hermann, Petra; Vidnyánszky, Zoltán.

Sci Rep ; 14(1): 21313, 2024 09 12.

Artigo em Inglês | MEDLINE | ID: mdl-39266561

RESUMO

Extensive research with musicians has shown that instrumental musical training can have a profound impact on how acoustic features are processed in the brain. However, less is known about the influence of singing training on neural activity during voice perception, particularly in response to salient acoustic features, such as the vocal vibrato in operatic singing. To address this gap, the present study employed functional magnetic resonance imaging (fMRI) to measure brain responses in trained opera singers and musically untrained controls listening to recordings of opera singers performing in two distinct styles: a full operatic voice with vibrato, and a straight voice without vibrato. Results indicated that for opera singers, perception of operatic voice led to differential fMRI activations in bilateral auditory cortical regions and the default mode network. In contrast, musically untrained controls exhibited differences only in bilateral auditory cortex. These results suggest that operatic singing training triggers experience-dependent neural changes in the brain that activate self-referential networks, possibly through embodiment of acoustic features associated with one's own singing style.

Assuntos

Imageamento por Ressonância Magnética , Canto , Humanos , Canto/fisiologia , Masculino , Feminino , Adulto , Adulto Jovem , Percepção Auditiva/fisiologia , Música , Rede de Modo Padrão/fisiologia , Córtex Auditivo/fisiologia , Córtex Auditivo/diagnóstico por imagem , Voz/fisiologia , Mapeamento Encefálico , Encéfalo/fisiologia , Encéfalo/diagnóstico por imagem

5.

COPDVD: Automated classification of chronic obstructive pulmonary disease on a new collected and evaluated voice dataset.

Idrisoglu, Alper; Dallora, Ana Luiza; Cheddad, Abbas; Anderberg, Peter; Jakobsson, Andreas; Sanmartin Berglund, Johan.

Artif Intell Med ; 156: 102953, 2024 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-39222579

RESUMO

BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a severe condition affecting millions worldwide, leading to numerous annual deaths. The absence of significant symptoms in its early stages promotes high underdiagnosis rates for the affected people. Besides pulmonary function failure, another harmful problem of COPD is the systemic effects, e.g., heart failure or voice distortion. However, the systemic effects of COPD might provide valuable information for early detection. In other words, symptoms caused by systemic effects could be helpful to detect the condition in its early stages. OBJECTIVE: The proposed study aims to explore whether the voice features extracted from the vowel "a" utterance carry any information that can be predictive of COPD by employing Machine Learning (ML) on a newly collected voice dataset. METHODS: Forty-eight participants were recruited from the pool of research clinic visitors at Blekinge Institute of Technology (BTH) in Sweden between January 2022 and May 2023. A dataset consisting of 1246 recordings from 48 participants was gathered. The collection of voice recordings containing the vowel "a" utterance commenced following an information and consent meeting with each participant using the VoiceDiagnostic application. The collected voice data was subjected to silence segment removal, feature extraction of baseline acoustic features, and Mel Frequency Cepstrum Coefficients (MFCC). Sociodemographic data was also collected from the participants. Three ML models were investigated for the binary classification of COPD and healthy controls: Random Forest (RF), Support Vector Machine (SVM), and CatBoost (CB). A nested k-fold cross-validation approach was employed. Additionally, the hyperparameters were optimized using grid-search on each ML model. For best performance assessment, accuracy, F1-score, precision, and recall metrics were computed. Afterward, we further examined the best classifier by utilizing the Area Under the Curve (AUC), Average Precision (AP), and SHapley Additive exPlanations (SHAP) feature-importance measures. RESULTS: The classifiers RF, SVM, and CB achieved a maximum accuracy of 77 %, 69 %, and 78 % on the test set and 93 %, 78 % and 97 % on the validation set, respectively. The CB classifier outperformed RF and SVM. After further investigation of the best-performing classifier, CB demonstrated the highest performance, producing an AUC of 82 % and AP of 76 %. In addition to age and gender, the mean values of baseline acoustic and MFCC features demonstrate high importance and deterministic characteristics for classification performance in both test and validation sets, though in varied order. CONCLUSION: This study concludes that the utterance of vowel "a" recordings contain information that can be captured by the CatBoost classifier with high accuracy for the classification of COPD. Additionally, baseline acoustic and MFCC features, in conjunction with age and gender information, can be employed for classification purposes and benefit healthcare for decision support in COPD diagnosis. CLINICAL TRIAL REGISTRATION NUMBER: NCT05897944.

Assuntos

Aprendizado de Máquina , Doença Pulmonar Obstrutiva Crônica , Doença Pulmonar Obstrutiva Crônica/classificação , Doença Pulmonar Obstrutiva Crônica/fisiopatologia , Doença Pulmonar Obstrutiva Crônica/diagnóstico , Humanos , Masculino , Feminino , Idoso , Pessoa de Meia-Idade , Voz/fisiologia , Máquina de Vetores de Suporte

6.

Functional alterations of lateral temporal cortex for processing voice prosody in adults with autism spectrum disorder.

Hashimoto, Ryu-Ichiro; Okada, Rieko; Aoki, Ryuta; Nakamura, Motoaki; Ohta, Haruhisa; Itahashi, Takashi.

Cereb Cortex ; 34(9)2024 Sep 03.

Artigo em Inglês | MEDLINE | ID: mdl-39270675

RESUMO

The human auditory system includes discrete cortical patches and selective regions for processing voice information, including emotional prosody. Although behavioral evidence indicates individuals with autism spectrum disorder (ASD) have difficulties in recognizing emotional prosody, it remains understudied whether and how localized voice patches (VPs) and other voice-sensitive regions are functionally altered in processing prosody. This fMRI study investigated neural responses to prosodic voices in 25 adult males with ASD and 33 controls using voices of anger, sadness, and happiness with varying degrees of emotion. We used a functional region-of-interest analysis with an independent voice localizer to identify multiple VPs from combined ASD and control data. We observed a general response reduction to prosodic voices in specific VPs of left posterior temporal VP (TVP) and right middle TVP. Reduced cortical responses in right middle TVP were consistently correlated with the severity of autistic symptoms for all examined emotional prosodies. Moreover, representation similarity analysis revealed the reduced effect of emotional intensity in multivoxel activation patterns in left anterior superior temporal cortex only for sad prosody. These results indicate reduced response magnitudes to voice prosodies in specific TVPs and altered emotion intensity-dependent multivoxel activation patterns in adult ASDs, potentially underlying their socio-communicative difficulties.

Assuntos

Transtorno do Espectro Autista , Emoções , Imageamento por Ressonância Magnética , Lobo Temporal , Voz , Humanos , Masculino , Transtorno do Espectro Autista/fisiopatologia , Transtorno do Espectro Autista/diagnóstico por imagem , Transtorno do Espectro Autista/psicologia , Lobo Temporal/fisiopatologia , Lobo Temporal/diagnóstico por imagem , Adulto , Emoções/fisiologia , Adulto Jovem , Percepção da Fala/fisiologia , Mapeamento Encefálico/métodos , Estimulação Acústica , Percepção Auditiva/fisiologia

7.

Synthetic, self-oscillating vocal fold models for voice production researcha).

Thomson, Scott L.

J Acoust Soc Am ; 156(2): 1283-1308, 2024 Aug 01.

Artigo em Inglês | MEDLINE | ID: mdl-39172710

RESUMO

Sound for the human voice is produced by vocal fold flow-induced vibration and involves a complex coupling between flow dynamics, tissue motion, and acoustics. Over the past three decades, synthetic, self-oscillating vocal fold models have played an increasingly important role in the study of these complex physical interactions. In particular, two types of models have been established: "membranous" vocal fold models, such as a water-filled latex tube, and "elastic solid" models, such as ultrasoft silicone formed into a vocal fold-like shape and in some cases with multiple layers of differing stiffness to mimic the human vocal fold tissue structure. In this review, the designs, capabilities, and limitations of these two types of models are presented. Considerations unique to the implementation of elastic solid models, including fabrication processes and materials, are discussed. Applications in which these models have been used to study the underlying mechanical principles that govern phonation are surveyed, and experimental techniques and configurations are reviewed. Finally, recommendations for continued development of these models for even more lifelike response and clinical relevance are summarized.

Assuntos

Fonação , Vibração , Prega Vocal , Prega Vocal/fisiologia , Prega Vocal/anatomia & histologia , Humanos , Modelos Anatômicos , Fenômenos Biomecânicos , Voz/fisiologia , Elasticidade , Modelos Biológicos

8.

Detection of Mild Cognitive Impairment From Non-Semantic, Acoustic Voice Features: The Framingham Heart Study.

Ding, Huitong; Lister, Adrian; Karjadi, Cody; Au, Rhoda; Lin, Honghuang; Bischoff, Brian; Hwang, Phillip H.

JMIR Aging ; 7: e55126, 2024 Aug 22.

Artigo em Inglês | MEDLINE | ID: mdl-39173144

RESUMO

BACKGROUND: With the aging global population and the rising burden of Alzheimer disease and related dementias (ADRDs), there is a growing focus on identifying mild cognitive impairment (MCI) to enable timely interventions that could potentially slow down the onset of clinical dementia. The production of speech by an individual is a cognitively complex task that engages various cognitive domains. The ease of audio data collection highlights the potential cost-effectiveness and noninvasive nature of using human speech as a tool for cognitive assessment. OBJECTIVE: This study aimed to construct a machine learning pipeline that incorporates speaker diarization, feature extraction, feature selection, and classification to identify a set of acoustic features derived from voice recordings that exhibit strong MCI detection capability. METHODS: The study included 100 MCI cases and 100 cognitively normal controls matched for age, sex, and education from the Framingham Heart Study. Participants' spoken responses on neuropsychological tests were recorded, and the recorded audio was processed to identify segments of each participant's voice from recordings that included voices of both testers and participants. A comprehensive set of 6385 acoustic features was then extracted from these voice segments using OpenSMILE and Praat software. Subsequently, a random forest model was constructed to classify cognitive status using the features that exhibited significant differences between the MCI and cognitively normal groups. The MCI detection performance of various audio lengths was further examined. RESULTS: An optimal subset of 29 features was identified that resulted in an area under the receiver operating characteristic curve of 0.87, with a 95% CI of 0.81-0.94. The most important acoustic feature for MCI classification was the number of filled pauses (importance score=0.09, P=3.10E-08). There was no substantial difference in the performance of the model trained on the acoustic features derived from different lengths of voice recordings. CONCLUSIONS: This study showcases the potential of monitoring changes to nonsemantic and acoustic features of speech as a way of early ADRD detection and motivates future opportunities for using human speech as a measure of brain health.

Assuntos

Disfunção Cognitiva , Humanos , Disfunção Cognitiva/diagnóstico , Disfunção Cognitiva/fisiopatologia , Feminino , Masculino , Idoso , Voz/fisiologia , Aprendizado de Máquina , Testes Neuropsicológicos , Pessoa de Meia-Idade , Idoso de 80 Anos ou mais , Estudos de Casos e Controles , Acústica da Fala

9.

Longitudinal Coadaptation of Older Adults With Wearables and Voice-Activated Virtual Assistants: Scoping Review.

Kokorelias, Kristina Marie; Grigorovich, Alisa; Harris, Maurita T; Rehman, Umair; Ritchie, Louise; Levy, AnneMarie M; Denecke, Kerstin; McMurray, Josephine.

J Med Internet Res ; 26: e57258, 2024 Aug 07.

Artigo em Inglês | MEDLINE | ID: mdl-39110963

RESUMO

BACKGROUND: The integration of smart technologies, including wearables and voice-activated devices, is increasingly recognized for enhancing the independence and well-being of older adults. However, the long-term dynamics of their use and the coadaptation process with older adults remain poorly understood. This scoping review explores how interactions between older adults and smart technologies evolve over time to improve both user experience and technology utility. OBJECTIVE: This review synthesizes existing research on the coadaptation between older adults and smart technologies, focusing on longitudinal changes in use patterns, the effectiveness of technological adaptations, and the implications for future technology development and deployment to improve user experiences. METHODS: Following the Joanna Briggs Institute Reviewer's Manual and PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, this scoping review examined peer-reviewed papers from databases including Ovid MEDLINE, Ovid Embase, PEDro, Ovid PsycINFO, and EBSCO CINAHL from the year 2000 to August 28, 2023, and included forward and backward searches. The search was updated on March 1, 2024. Empirical studies were included if they involved (1) individuals aged 55 years or older living independently and (2) focused on interactions and adaptations between older adults and wearables and voice-activated virtual assistants in interventions for a minimum period of 8 weeks. Data extraction was informed by the selection and optimization with compensation framework and the sex- and gender-based analysis plus theoretical framework and used a directed content analysis approach. RESULTS: The search yielded 16,143 papers. Following title and abstract screening and a full-text review, 5 papers met the inclusion criteria. Study populations were mostly female participants and aged 73-83 years from the United States and engaged with voice-activated virtual assistants accessed through smart speakers and wearables. Users frequently used simple commands related to music and weather, integrating devices into daily routines. However, communication barriers often led to frustration due to devices' inability to recognize cues or provide personalized responses. The findings suggest that while older adults can integrate smart technologies into their lives, a lack of customization and user-friendly interfaces hinder long-term adoption and satisfaction. The studies highlight the need for technology to be further developed so they can better meet this demographic's evolving needs and call for research addressing small sample sizes and limited diversity. CONCLUSIONS: Our findings highlight a critical need for continued research into the dynamic and reciprocal relationship between smart technologies and older adults over time. Future studies should focus on more diverse populations and extend monitoring periods to provide deeper insights into the coadaptation process. Insights gained from this review are vital for informing the development of more intuitive, user-centric smart technology solutions to better support the aging population in maintaining independence and enhancing their quality of life. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/51129.

Assuntos

Dispositivos Eletrônicos Vestíveis , Humanos , Idoso , Pessoa de Meia-Idade , Feminino , Masculino , Idoso de 80 Anos ou mais , Voz , Estudos Longitudinais

10.

The influence of affective voice on sound distance perception.

Kroczek, Leon O H; Roßkopf, Sarah; Stärz, Felix; Blau, Matthias; van de Par, Steven; Mühlberger, Andreas.

J Exp Psychol Hum Percept Perform ; 50(9): 918-933, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-39101929

RESUMO

Affective stimuli in our environment indicate reward or threat and thereby relate to approach and avoidance behavior. Previous findings suggest that affective stimuli may bias visual perception, but it remains unclear whether similar biases exist in the auditory domain. Therefore, we asked whether affective auditory voices (angry vs. neutral) influence sound distance perception. Two VR experiments (data collection 2021-2022) were conducted in which auditory stimuli were presented via loudspeakers located at positions unknown to the participants. In the first experiment (N = 44), participants actively placed a visually presented virtual agent or virtual loudspeaker in an empty room at the perceived sound source location. In the second experiment (N = 32), participants were standing in front of several virtual agents or virtual loudspeakers and had to indicate the sound source by directing their gaze toward the perceived sound location. Results in both preregistered experiments consistently showed that participants estimated the location of angry voice stimuli at greater distances than the location of neutral voice stimuli. We discuss that neither emotional nor motivational biases can account for these results. Instead, distance estimates seem to rely on listeners' representations regarding the relationship between vocal affect and acoustic characteristics. (PsycInfo Database Record (c) 2024 APA, all rights reserved).

Assuntos

Afeto , Humanos , Adulto , Feminino , Masculino , Adulto Jovem , Afeto/fisiologia , Percepção de Distância/fisiologia , Localização de Som/fisiologia , Voz/fisiologia , Realidade Virtual , Ira/fisiologia , Percepção Auditiva/fisiologia

11.

Impact of interference on vocal and instrument recognition.

Bürgel, Michel; Siedenburg, Kai.

J Acoust Soc Am ; 156(2): 922-938, 2024 Aug 01.

Artigo em Inglês | MEDLINE | ID: mdl-39133041

RESUMO

Voices arguably occupy a superior role in auditory processing. Specifically, studies have reported that singing voices are processed faster and more accurately and possess greater salience in musical scenes compared to instrumental sounds. However, the underlying acoustic features of this superiority and the generality of these effects remain unclear. This study investigates the impact of frequency micro-modulations (FMM) and the influence of interfering sounds on sound recognition. Thirty young participants, half with musical training, engage in three sound recognition experiments featuring short vocal and instrumental sounds in a go/no-go task. Accuracy and reaction times are measured for sounds from recorded samples and excerpts of popular music. Each sound is presented in separate versions with and without FMM, in isolation or accompanied by a piano. Recognition varies across sound categories, but no general vocal superiority emerges and no effects of FMM. When presented together with interfering sounds, all sounds exhibit degradation in recognition. However, whereas /a/ sounds stand out by showing a distinct robustness to interference (i.e., less degradation of recognition), /u/ sounds lack this robustness. Acoustical analysis implies that recognition differences can be explained by spectral similarities. Together, these results challenge the notion of general vocal superiority in auditory perception.

Assuntos

Estimulação Acústica , Percepção Auditiva , Música , Reconhecimento Psicológico , Humanos , Masculino , Feminino , Adulto Jovem , Adulto , Estimulação Acústica/métodos , Percepção Auditiva/fisiologia , Tempo de Reação , Canto , Voz/fisiologia , Adolescente , Espectrografia do Som , Qualidade da Voz

12.

Linear effects of glucose levels on voice fundamental frequency in type 2 diabetes and individuals with normoglycemia.

Kaufman, Jaycee; Jeon, Jouhyun; Oreskovic, Jessica; Fossat, Yan.

Sci Rep ; 14(1): 19012, 2024 08 28.

Artigo em Inglês | MEDLINE | ID: mdl-39198592

RESUMO

Glucose levels in the body have been hypothesized to affect voice characteristics. One of the primary justifications for voice changes are due to Hooke's law, in which a variation in the tension, mass, or length of the vocal folds, mediated by the body's glucose levels, results in an alteration in their vibrational frequency. To explore this hypothesis, 505 participants were fitted with a continuous glucose monitor (CGM) and instructed to record their voice using a custom mobile application up to six times daily for 2 weeks. Glucose values from CGM were paired to voice recordings to create a sampled dataset that closely resembled the glucose profile of the comprehensive CGM dataset. Glucose levels and fundamental frequency (F0) had a significant positive association within an individual, and a 1 mg/dL increase in CGM recorded glucose corresponded to a 0.02 Hz increase in F0 (CI 0.01-0.03 Hz, P < 0.001). This effect was also observed when the participants were split into non-diabetic, prediabetic, and Type 2 Diabetic classifications (P = 0.03, P = 0.01, & P = 0.01 respectively). Vocal F0 increased with blood glucose levels, but future predictive models of glucose levels based on voice may need to be personalized due to high intraclass correlation.

Assuntos

Glicemia , Diabetes Mellitus Tipo 2 , Voz , Humanos , Diabetes Mellitus Tipo 2/sangue , Glicemia/análise , Masculino , Feminino , Pessoa de Meia-Idade , Voz/fisiologia , Adulto , Idoso , Automonitorização da Glicemia/métodos

13.

Controllable synthesis of TiO2/graphene composites for human voice recognition in strain sensor.

Cheng, Yan; Wang, Ke; Zhang, Siyi.

PLoS One ; 19(8): e0306866, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39146267

RESUMO

Low-dimensional materials have demonstrated strong potential for use in diverse flexible strain sensors for wearable electronic device applications. However, the limited contact area in the sensing layer, caused by the low specific surface area of typical nanomaterials, hinders the pursuit of high-performance strain-sensor applications. Herein, we report an efficient method for synthesizing TiO2-based nanocomposite materials by directly using industrial raw materials with ultrahigh specific surface areas that can be used for strain sensors. A kinetic study of the self-seeded thermal hydrolysis sulfate process was conducted for the controllable synthesis of pure TiO2 and related TiO2/graphene composites. The hydrolysis readily modified the crystal form and morphology of the prepared TiO2 nanoparticles, and the prepared composite samples possessed a uniform nanoporous structure. Experiments demonstrated that the TiO2/graphene composite can be used in strain sensors with a maximum Gauge factor of 252. In addition, the TiO2/graphene composite-based strain sensor showed high stability by continuously operating over 1,000 loading cycles and aging tests over three months. It also shows that the fabricated strain sensors have the potential for human voice recognition by characterizing letters, words, and musical tones.

Assuntos

Grafite , Nanocompostos , Titânio , Titânio/química , Grafite/química , Humanos , Nanocompostos/química , Voz , Dispositivos Eletrônicos Vestíveis

14.

A novel voice classification based on Gower distance for Parkinson disease detection.

Noaman Kadhim, Mustafa; Al-Shammary, Dhiah; Sufi, Fahim.

Int J Med Inform ; 191: 105583, 2024 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-39096595

RESUMO

BACKGROUND: Traditional classifier for the classification of diseases, such as K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM), often struggle with high-dimensional medical datasets. OBJECTIVE: This study presents a novel classifier to overcome the limitations of traditional classifiers in Parkinson's disease (PD) detection based on Gower distance. METHODS: We present the Gower distance metric to handle diverse feature sets in voice recordings, which acts as a dissimilarity measure for all feature types, making the model adept at identifying subtle patterns indicative of PD. Additionally, the Cuckoo Search algorithm is employed for feature selection, reducing dimensionality by focusing on key features, thereby lessening the computational load associated with high-dimensional datasets. RESULTS: The proposed classifier based on Gower distance resulted in an accuracy rate of 98.3% with feature selection and achieved an accuracy of 94.92% without the feature selection method. It outperforms traditional classifiers and recent studies in PD detection from voice recordings. CONCLUSIONS: This accuracy shows the capability of the approach in the correct classification of instances and points out the potential of the approach as a reliable diagnostic tool for the medical practitioner. The findings state that the proposed approach holds promise for improving the diagnosis and monitoring of PD, both within medical institutions and at homes for the elderly.

Assuntos

Algoritmos , Doença de Parkinson , Voz , Doença de Parkinson/diagnóstico , Doença de Parkinson/classificação , Humanos , Masculino , Feminino , Idoso , Máquina de Vetores de Suporte , Pessoa de Meia-Idade

15.

A blended framework for audio spoof detection with sequential models and bags of auditory bites.

Sharafudeen, Misaj; S S, Vinod Chandra; J, Andrew; Sei, Yuichi.

Sci Rep ; 14(1): 20192, 2024 08 30.

Artigo em Inglês | MEDLINE | ID: mdl-39215070

RESUMO

An automated speaker verification system uses the process of speech recognition to verify the identity of a user and block illicit access. Logical access attacks are efforts to obtain access to a system by tampering with its algorithms or data, or by circumventing security mechanisms. DeepFake attacks are a form of logical access threats that employs artificial intelligence to produce highly realistic audio clips of human voice, that may be used to circumvent vocal authentication systems. This paper presents a framework for the detection of Logical Access and DeepFake audio spoofings by integrating audio file components and time-frequency representation spectrograms into a lower-dimensional space using sequential prediction models. Bidirectional-LSTM trained on the bonafide class generates significant one-dimensional features for both classes. The feature set is then standardized to a fixed set using a novel Bags of Auditory Bites (BoAB) feature standardizing algorithm. The Extreme Learning Machine maps the feature space to predictions that differentiate between genuine and spoofed speeches. The framework is evaluated using the ASVspoof 2021 dataset, a comprehensive collection of audio recordings designed for evaluating the strength of speaker verification systems against spoofing attacks. It achieves favorable results on synthesized DeepFake attacks with an Equal Error Rate (EER) of 1.18% in the most optimal setting. Logical Access attacks were more challenging to detect at an EER of 12.22%. Compared to the state-of-the-arts in the ASVspoof2021 dataset, the proposed method notably improves EER for DeepFake attacks by an improvement rate of 95.16%.

Assuntos

Algoritmos , Humanos , Interface para o Reconhecimento da Fala , Segurança Computacional , Voz , Fala , Inteligência Artificial

16.

The Impact of Trained Conditions on the Generalization of Learning Gains Following Voice Discrimination Training.

Zaltz, Yael.

Trends Hear ; 28: 23312165241275895, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39212078

RESUMO

Auditory training can lead to notable enhancements in specific tasks, but whether these improvements generalize to untrained tasks like speech-in-noise (SIN) recognition remains uncertain. This study examined how training conditions affect generalization. Fifty-five young adults were divided into "Trained-in-Quiet" (n = 15), "Trained-in-Noise" (n = 20), and "Control" (n = 20) groups. Participants completed two sessions. The first session involved an assessment of SIN recognition and voice discrimination (VD) with word or sentence stimuli, employing combined fundamental frequency (F0) + formant frequencies voice cues. Subsequently, only the trained groups proceeded to an interleaved training phase, encompassing six VD blocks with sentence stimuli, utilizing either F0-only or formant-only cues. The second session replicated the interleaved training for the trained groups, followed by a second assessment conducted by all three groups, identical to the first session. Results showed significant improvements in the trained task regardless of training conditions. However, VD training with a single cue did not enhance VD with both cues beyond control group improvements, suggesting limited generalization. Notably, the Trained-in-Noise group exhibited the most significant SIN recognition improvements posttraining, implying generalization across tasks that share similar acoustic conditions. Overall, findings suggest training conditions impact generalization by influencing processing levels associated with the trained task. Training in noisy conditions may prompt higher auditory and/or cognitive processing than training in quiet, potentially extending skills to tasks involving challenging listening conditions, such as SIN recognition. These insights hold significant theoretical and clinical implications, potentially advancing the development of effective auditory training protocols.

Assuntos

Estimulação Acústica , Sinais (Psicologia) , Generalização Psicológica , Ruído , Percepção da Fala , Humanos , Masculino , Feminino , Adulto Jovem , Percepção da Fala/fisiologia , Ruído/efeitos adversos , Adulto , Reconhecimento Psicológico , Mascaramento Perceptivo , Adolescente , Acústica da Fala , Qualidade da Voz , Aprendizagem por Discriminação/fisiologia , Voz/fisiologia

17.

The role of the age and gender, and the complexity of the syntactic unit in the perception of affective emotions in voice.

Trinite, Baiba; Zdanovica, Anita; Kurme, Daiga; Lavrane, Evija; Magazeina, Ilva; Jansone, Anita.

Codas ; 36(5): e20240009, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39046026

RESUMO

PURPOSE: The study aimed to identify (1) whether the age and gender of listeners and the length of vocal stimuli affect emotion discrimination accuracy in voice; and (2) whether the determined level of expression of perceived affective emotions is age and gender-dependent. METHODS: Thirty-two age-matched listeners listened to 270 semantically neutral voice samples produced in neutral, happy, and angry intonation by ten professional actors. The participants were required to categorize the auditory stimulus based on three options and judge the intensity of emotional expression in the sample using a customized tablet web interface. RESULTS: The discrimination accuracy of happy and angry emotions decreased with age, while accuracy in discriminating neutral emotions increased with age. Females rated the intensity level of perceived affective emotions higher than males across all linguistic units. These were: for angry emotions in words (z = -3.599, p < .001), phrases (z = -3.218, p = .001), and texts (z = -2.272, p = .023), for happy emotions in words (z = -5.799, p < .001), phrases (z = -4.706, p < .001), and texts (z = -2.699, p = .007). CONCLUSION: Accuracy in perceiving vocal expressions of emotions varies according to age and gender. Young adults are better at distinguishing happy and angry emotions than middle-aged adults, while middle-aged adults tend to categorize perceived affective emotions as neutral. Gender also plays a role, with females rating expressions of affective emotions in voices higher than males. Additionally, the length of voice stimuli impacts emotion discrimination accuracy.

Assuntos

Emoções , Percepção da Fala , Voz , Humanos , Feminino , Masculino , Adulto , Emoções/fisiologia , Fatores Etários , Adulto Jovem , Fatores Sexuais , Pessoa de Meia-Idade , Percepção da Fala/fisiologia , Voz/fisiologia , Adolescente , Idoso

18.

Voice actors show enhanced neural tracking of pitch, prosody perception, and music perception.

Kachlicka, Magdalena; Tierney, Adam.

Cortex ; 178: 213-222, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-39024939

RESUMO

Experiences with sound that make strong demands on the precision of perception, such as musical training and experience speaking a tone language, can enhance auditory neural encoding. Are high demands on the precision of perception necessary for training to drive auditory neural plasticity? Voice actors are an ideal subject population for answering this question. Voice acting requires exaggerating prosodic cues to convey emotion, character, and linguistic structure, drawing upon attention to sound, memory for sound features, and accurate sound production, but not fine perceptual precision. Here we assessed neural encoding of pitch using the frequency-following response (FFR), as well as prosody, music, and sound perception, in voice actors and a matched group of non-actors. We find that the consistency of neural sound encoding, prosody perception, and musical phrase perception are all enhanced in voice actors, suggesting that a range of neural and behavioural auditory processing enhancements can result from training which lacks fine perceptual precision. However, fine discrimination was not enhanced in voice actors but was linked to degree of musical experience, suggesting that low-level auditory processing can only be enhanced by demanding perceptual training. These findings suggest that training which taxes attention, memory, and production but is not perceptually taxing may be a way to boost neural encoding of sound and auditory pattern detection in individuals with poor auditory skills.

Assuntos

Estimulação Acústica , Percepção Auditiva , Música , Percepção da Altura Sonora , Percepção da Fala , Voz , Humanos , Música/psicologia , Masculino , Feminino , Percepção da Altura Sonora/fisiologia , Adulto , Percepção Auditiva/fisiologia , Voz/fisiologia , Adulto Jovem , Percepção da Fala/fisiologia , Atenção/fisiologia

19.

Idiosyncratic and shared contributions shape impressions from voices and faces.

Lavan, Nadine; Sutherland, Clare A M.

Cognition ; 251: 105881, 2024 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-39029363

RESUMO

Voices elicit rich first impressions of what the person we are hearing might be like. Research stresses that these impressions from voices are shared across different listeners, such that people on average agree which voices sound trustworthy or old and which do not. However, can impressions from voices also be shaped by the 'ear of the beholder'? We investigated whether - and how - listeners' idiosyncratic, personal preferences contribute to first impressions from voices. In two studies (993 participants, 156 voices), we find evidence for substantial idiosyncratic contributions to voice impressions using a variance portioning approach. Overall, idiosyncratic contributions were as important as shared contributions to impressions from voices for inferred person characteristics (e.g., trustworthiness, friendliness). Shared contributions were only more influential for impressions of more directly apparent person characteristics (e.g., gender, age). Both idiosyncratic and shared contributions were reduced when stimuli were limited in their (perceived) variability, suggesting that natural variation in voices is key to understanding this impression formation. When comparing voice impressions to face impressions, we found that idiosyncratic and shared contributions to impressions similarly across modality when stimulus properties are closely matched - although voice impressions were overall less consistent than face impressions. We thus reconceptualise impressions from voices as being formed not only based on shared but also idiosyncratic contributions. We use this new framing to suggest future directions of research, including understanding idiosyncratic mechanisms, development, and malleability of voice impression formation.

Assuntos

Reconhecimento Facial , Percepção Social , Voz , Humanos , Feminino , Masculino , Adulto , Adulto Jovem , Reconhecimento Facial/fisiologia , Percepção Auditiva/fisiologia , Adolescente , Pessoa de Meia-Idade

20.

Mutual eye gaze and vocal pitch in relation to social anxiety and depression: A virtual interaction task.

Howell, Ashley N; Woods, Savannah J; Farmer, William; Zibulsky, Devin A; Srivastav, Akanksha; Randolph, Griffin; Weeks, Justin W.

J Affect Disord ; 363: 282-291, 2024 Oct 15.

Artigo em Inglês | MEDLINE | ID: mdl-39038622

RESUMO

BACKGROUND: Individuals with high social interaction anxiety (SIA) and depression often behave submissively in social settings. Few studies have simultaneously examined the associations between objectively assessed submissive behaviors and SIA or depression, despite their high comorbidity and unknown mechanisms regarding submissiveness. METHODS: A sample of 45 young adults self-reported trait SIA and depression, state positive/negative affect (PA/NA) before and after a virtual social interaction. Participants engaged in a four-minute conversation with a confederate who was trained to behave neutrally. Mutual eye gaze, via eye-tracking, and vocal pitch were assessed throughout the interaction. RESULTS: Depression and SIA were positively correlated with NA, poorer self-rated performance, and vocal pitch. Highly socially anxious women engaged in less mutual eye gaze than highly socially anxious men. Also, vocal pitch was inversely associated with mutual eye gaze and positively related to NA and (nonsignificantly) to self-ratings of poor performance. Finally, our data partially replicated past research on the use of vocal pitch during social stress to detect social anxiety disorder. LIMITATIONS: The current sample is relatively homogenous in educational attainment, age, and race. All research confederates were women. Future research should examine whether these archival data replicate with the latest telecommunication technologies. CONCLUSION: Our findings highlight nuanced relationships among SIA, depression, emotions, self-perceptions, and biobehavioral indicators of submissive behavior-in response to an ambiguously negative/positive social interaction. Sex/gender may interact with these effects, emphasizing considerations for research method designs.

Assuntos

Depressão , Fixação Ocular , Interação Social , Humanos , Feminino , Masculino , Adulto Jovem , Fixação Ocular/fisiologia , Adulto , Depressão/psicologia , Ansiedade/psicologia , Fobia Social/psicologia , Fobia Social/fisiopatologia , Voz/fisiologia , Adolescente

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA