Results 1 - 20 of 807
1.
Sci Rep; 14(1): 15194, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956187

ABSTRACT

After a right hemisphere stroke, more than half of the patients are impaired in their capacity to produce or comprehend speech prosody. Yet, despite its social-cognitive consequences for patients, aprosodia following stroke has received scant attention. In this report, we introduce a novel, simple psychophysical procedure which, by combining systematic digital manipulations of speech stimuli with reverse-correlation analysis, allows us to estimate the internal sensory representations that subtend how individual patients perceive speech prosody, and the level of internal noise that governs behavioral variability in how patients apply these representations. Tested on a sample of N = 22 right-hemisphere stroke survivors and N = 21 age-matched controls, the representation + noise model provides a promising alternative to the clinical gold standard for evaluating aprosodia (MEC): both parameters strongly associate with receptive, and not expressive, aprosodia measured by the MEC within the patient group; they have better sensitivity than the MEC for separating high-functioning patients from controls; and they have good specificity with respect to non-prosody-related impairments of auditory attention and processing. Taken together, individual differences in internal representation, internal noise, or both paint a potent portrait of the variety of sensory/cognitive mechanisms that can explain impairments of prosody processing after stroke.


Subject(s)
Speech Perception; Stroke; Humans; Stroke/physiopathology; Stroke/complications; Speech Perception/physiology; Male; Female; Middle Aged; Aged; Noise; Psychophysics/methods; Adult
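For readers unfamiliar with reverse correlation, a minimal sketch of the logic described in this abstract follows: a listener's choices between randomly pitch-perturbed stimuli are averaged into a "classification image" that approximates the internal template. All names and numbers below are illustrative, not the authors' pipeline, and the double-pass internal-noise estimate is omitted.

```python
# Toy reverse-correlation sketch (illustrative values, not the authors' pipeline)
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_segments = 500, 6                 # pitch perturbed in 6 segments per utterance
true_kernel = np.array([0.2, 0.5, 1.0, 0.4, -0.3, -0.6])  # assumed internal template
internal_noise_sd = 1.0                       # trial-to-trial decision noise

chosen, rejected = [], []
for _ in range(n_trials):
    a = rng.normal(size=n_segments)           # random pitch contour, stimulus A
    b = rng.normal(size=n_segments)           # random pitch contour, stimulus B
    resp_a = a @ true_kernel + rng.normal(scale=internal_noise_sd)
    resp_b = b @ true_kernel + rng.normal(scale=internal_noise_sd)
    (chosen if resp_a > resp_b else rejected).append(a)
    (rejected if resp_a > resp_b else chosen).append(b)

# Classification image: mean chosen contour minus mean rejected contour
kernel_estimate = np.mean(chosen, axis=0) - np.mean(rejected, axis=0)
print(np.round(kernel_estimate, 2))           # recovers the template's shape (up to scale)
```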
2.
Cogn Emot; 1-11, 2024 Jul 07.
Article in English | MEDLINE | ID: mdl-38973172

ABSTRACT

While previous research has found an in-group advantage (IGA) favouring native speakers in emotional prosody perception over non-native speakers, the effects of semantics on emotional prosody perception remain unclear. This study investigated the effects of semantics on emotional prosody perception in Chinese words and sentences for native and non-native Chinese speakers. The critical manipulation was the congruence of prosodic (positive, negative) and semantic (positive, negative, and neutral) valence. Participants listened to a series of audio clips and judged whether the emotional prosody was positive or negative for each utterance. The results revealed an IGA effect: native speakers perceived emotional prosody more accurately and quickly than non-native speakers in Chinese words and sentences. Furthermore, a semantic congruence effect was observed in Chinese words, where both native and non-native speakers recognised emotional prosody more accurately in the semantic-prosody congruent condition than in the incongruent condition. However, in Chinese sentences, this congruence effect was only present for non-native speakers. Additionally, the IGA effect and semantic congruence effect on emotional prosody perception were influenced by prosody valence. These findings illuminate the role of semantics in emotional prosody perception, highlighting perceptual differences between native and non-native Chinese speakers.

3.
Neuropsychol Rehabil; 1-41, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38848458

ABSTRACT

It is unclear whether individuals with agrammatic aphasia have particularly disrupted prosody or, in fact, relatively preserved prosody that they can use in a compensatory way. A targeted literature review was undertaken to examine the evidence regarding the capacity of speakers with agrammatic aphasia to produce prosody. The aim was to answer the question: how much prosody can a speaker "do" with limited syntax? The literature was systematically searched for articles examining the production of grammatical prosody in people with agrammatism; the search yielded 16 studies that were ultimately included in this review. Participant inclusion criteria, spoken language tasks, and analysis procedures vary widely across studies. The evidence indicates that timing aspects of prosody are disrupted in people with agrammatic aphasia, while the use of pitch and amplitude cues is more likely to be preserved in this population. Some, but not all, of these timing differences may be attributable to motor speech programming deficits (apraxia of speech, AOS) rather than aphasia, as these conditions frequently co-occur. Many of the included studies do not address AOS and its possible role in any observed effects. Finally, the available evidence indicates that even speakers with severe aphasia show a degree of preserved prosody in functional communication.

4.
Lang Speech; 238309241258162, 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38877720

ABSTRACT

Human communication is inherently multimodal. Auditory speech, but also visual cues can be used to understand another talker. Most studies of audiovisual speech perception have focused on the perception of speech segments (i.e., speech sounds). However, less is known about the influence of visual information on the perception of suprasegmental aspects of speech like lexical stress. In two experiments, we investigated the influence of different visual cues (e.g., facial articulatory cues and beat gestures) on the audiovisual perception of lexical stress. We presented auditory lexical stress continua of disyllabic Dutch stress pairs together with videos of a speaker producing stress on the first or second syllable (e.g., articulating VOORnaam or voorNAAM). Moreover, we combined and fully crossed the face of the speaker producing lexical stress on either syllable with a gesturing body producing a beat gesture on either the first or second syllable. Results showed that people successfully used visual articulatory cues to stress in muted videos. However, in audiovisual conditions, we were not able to find an effect of visual articulatory cues. In contrast, we found that the temporal alignment of beat gestures with speech robustly influenced participants' perception of lexical stress. These results highlight the importance of considering suprasegmental aspects of language in multimodal contexts.

5.
J Psycholinguist Res; 53(4): 56, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38926243

ABSTRACT

The present paper examines how English native speakers produce scopally ambiguous sentences and how they make use of gestures and prosody for disambiguation. As a case in point, the participants in the present study produced English negative quantifiers, which appear in two different positions, as in (1) The election of no candidate was a surprise (a: 'for those elected, none of them was a surprise'; b: 'no candidate was elected, and that was a surprise') and (2) No candidate's election was a surprise (a: 'for those elected, none of them was a surprise'; b: # 'no candidate was elected, and that was a surprise'). We were thus able to investigate the gesture production and the prosodic patterns of the positional effects (i.e., the a-interpretation is available in two different positions in 1 and 2) and the interpretation effects (i.e., two different interpretations are available in the same position in 1). We discovered that the participants tended to launch more head shakes in the (a) interpretation, despite the different positions, but more head nods/beats in the (b) interpretation. While there is no difference in the prosody of no between the (a) and (b) interpretations in (1), there are pitch and durational differences between the (a) interpretations in (1) and (2). This study points out abstract similarities in gestural movements across languages such as Catalan and Spanish (Prieto et al. in Lingua 131:136-150, 2013. 10.1016/j.lingua.2013.02.008; Tubau et al. in Linguist Rev 32(1):115-142, 2015. 10.1515/tlr-2014-0016), suggesting that meaning is crucial for gesture patterns. We emphasize that gesture patterns can disambiguate ambiguous interpretations when prosody cannot do so.


Subject(s)
Gestures; Psycholinguistics; Humans; Adult; Male; Female; Speech/physiology; Language; Young Adult
6.
JMIR Res Protoc; 13: e54030, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38935945

ABSTRACT

BACKGROUND: Sound therapy methods have seen a surge in popularity, with a predominant focus on music among all types of sound stimulation. There is substantial evidence documenting the integrative impact of music therapy on psycho-emotional and physiological outcomes, rendering it beneficial for addressing stress-related conditions such as pain syndromes, depression, and anxiety. Despite these advancements, the therapeutic aspects of sound, as well as the mechanisms underlying its efficacy, remain incompletely understood. Existing research on music as a holistic cultural phenomenon often overlooks crucial aspects of sound therapy mechanisms, particularly those related to speech acoustics or the so-called "music of speech." OBJECTIVE: This study aims to provide an overview of empirical research on sound interventions to elucidate the mechanism underlying their positive effects. Specifically, we will focus on identifying therapeutic factors and mechanisms of change associated with sound interventions. Our analysis will compare the most prevalent types of sound interventions reported in clinical studies and experiments. Moreover, we will explore the therapeutic effects of sound beyond music, encompassing natural human speech and intermediate forms such as traditional poetry performances. METHODS: This review adheres to the methodological guidance of the Joanna Briggs Institute and follows the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist for reporting review studies, which is adapted from the Arksey and O'Malley framework. Our search strategy encompasses PubMed, Web of Science, Scopus, and PsycINFO or EBSCOhost, covering literature from 1990 to the present. Among the different study types, randomized controlled trials, clinical trials, laboratory experiments, and field experiments were included. RESULTS: Data collection began in October 2022. We found a total of 2027 items. Our initial search uncovered an asymmetry in the distribution of studies, with a larger number focused on music therapy compared with those exploring prosody in spoken interventions such as guided meditation or hypnosis. We extracted and selected papers using Rayyan software (Rayyan) and identified 41 eligible papers after title and abstract screening. The completion of the scoping review is anticipated by October 2024, with key steps comprising the analysis of findings by May 2024, drafting and revising the study by July 2024, and submitting the paper for publication in October 2024. CONCLUSIONS: In the next step, we will conduct a quality evaluation of the papers and then chart and group the therapeutic factors extracted from them. This process aims to unveil conceptual gaps in existing studies. Gray literature sources, such as Google Scholar, ClinicalTrials.gov, nonindexed conferences, and reference list searches of retrieved studies, will be added to our search strategy to increase the number of relevant papers that we cover. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/54030.


Subject(s)
Music Therapy; Stress, Psychological; Humans; Stress, Psychological/therapy; Music Therapy/methods; Adult
7.
Brain Connect; 14(5): 294-303, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38756082

ABSTRACT

Purpose: Rhyming is a phonological skill that typically emerges in the preschool-age range. Prosody/rhythm processing involves right-lateralized temporal cortex, yet the neural basis of rhyming ability in young children is unclear. The study objective was to use functional magnetic resonance imaging (fMRI) to quantify neural correlates of rhyming abilities in preschool-age children. Method: Healthy pre-kindergarten child-parent dyads were recruited for a study visit including MRI and the Preschool and Primary Inventory of Phonological Awareness (PIPA) rhyme subtest. MRI included an fMRI task where the child listened to a rhymed and unrhymed story without visual stimuli. fMRI data were processed using the CONN functional connectivity (FC) toolbox, with FC computed between 132 regions of interest (ROI) across the brain. Associations between PIPA score and FC during the rhymed versus unrhymed story were compared accounting for age, sex, and maternal education. Results: In total, 45 children completed MRI (age 54 ± 8 months, 37-63; 19M 26F). Median maternal education was college graduate. FC between ROIs in posterior default mode (imagery) and right fronto-parietal (executive function) networks was more strongly positively associated with PIPA score during the rhymed compared with the unrhymed story [F(2,39) = 10.95, p-FDR = 0.043], as was FC between ROIs in right-sided language (prosody) and dorsal attention networks [F(2,39) = 9.85, p-FDR = 0.044]. Conclusions: Preschool-age children with better rhyming abilities had stronger FC between ROIs supporting attention and prosody and also between ROIs supporting executive function and imagery, suggesting rhyme as a catalyst for attention, visualization, and comprehension. These represent novel neural biomarkers of nascent phonological skills.


Subject(s)
Brain; Magnetic Resonance Imaging; Humans; Female; Male; Magnetic Resonance Imaging/methods; Child, Preschool; Brain/physiology; Brain/diagnostic imaging; Speech Perception/physiology; Brain Mapping/methods; Phonetics
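As a rough illustration of ROI-to-ROI functional connectivity of the kind computed by the CONN toolbox, the sketch below correlates synthetic ROI time series and applies the usual Fisher z-transform; it is a stand-in under assumed dimensions, not the study's actual preprocessing.

```python
# Hypothetical ROI-to-ROI functional connectivity sketch (synthetic data)
import numpy as np

rng = np.random.default_rng(1)
n_timepoints, n_rois = 200, 132                    # 132 ROIs, as in the study above
ts = rng.normal(size=(n_timepoints, n_rois))       # stand-in for denoised BOLD series

r = np.corrcoef(ts, rowvar=False)                  # 132 x 132 correlation matrix
z = np.arctanh(np.clip(r, -0.999999, 0.999999))    # Fisher z-transform for group stats
np.fill_diagonal(z, 0.0)
print(z.shape)                                     # one (132, 132) FC matrix per subject
```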
8.
J Commun Disord; 110: 106431, 2024.
Article in English | MEDLINE | ID: mdl-38781923

ABSTRACT

INTRODUCTION: Prosody is used to express indexical (identifying the talker), linguistic (e.g., question intonation, lexical stress), pragmatic (e.g., contrastive stress, sarcasm), and emotional/affective functions. It is manifested through changes in fundamental frequency (f0), intensity, and duration. F0 and intensity are degraded when perceived through a cochlear implant (CI). The purpose of this meta-analysis is to compare expressive prosody in speech produced by CI users versus normal hearing peers. METHODS: A systematic search of the literature found 25 articles that met all inclusion criteria. These articles were assessed for quality, and data pertaining to the expression of f0, intensity, and duration, as well as classification accuracy and appropriateness ratings from normal hearing listeners, were extracted and meta-analyzed using random effects models. RESULTS: The articles included in the meta-analysis were generally of acceptable or high quality. Meta-analyses revealed significant differences between individuals with CIs vs. normal hearing on all measures except mean f0, mean intensity, and rhythm. Effect sizes were generally medium to large. There was significant heterogeneity across studies, but little evidence of publication bias. CONCLUSIONS: CI users speak with less variable f0, smaller f0 contours, more variable intensity, a slower speech rate, and reduced final lengthening at syntactic boundaries. These acoustic differences are reflected in significantly poorer ratings of speech produced by CI users compared to their normal hearing peers, as assessed by groups of normal hearing listeners. Because atypical expressive prosody is associated with negative outcomes, clinicians should consider targeting prosody when working with individuals who use CIs.


Subject(s)
Cochlear Implants; Humans; Speech Perception; Speech
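The "random effects models" used in this meta-analysis can be illustrated with a minimal DerSimonian-Laird pooling sketch; the effect sizes and variances below are toy values, not data extracted from the review.

```python
# Minimal DerSimonian-Laird random-effects pooling (toy data)
import numpy as np

g = np.array([0.6, 0.9, 0.4, 1.1, 0.7])       # hypothetical per-study effect sizes
v = np.array([0.05, 0.08, 0.04, 0.10, 0.06])  # their sampling variances

w = 1 / v                                      # fixed-effect weights
g_fixed = np.sum(w * g) / np.sum(w)
Q = np.sum(w * (g - g_fixed) ** 2)             # heterogeneity statistic
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (len(g) - 1)) / C)        # between-study variance (DL estimator)

w_re = 1 / (v + tau2)                          # random-effects weights
pooled = np.sum(w_re * g) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print(f"pooled g = {pooled:.2f}, 95% CI [{pooled - 1.96*se:.2f}, {pooled + 1.96*se:.2f}]")
```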
9.
J Commun Disord; 110: 106430, 2024.
Article in English | MEDLINE | ID: mdl-38754316

ABSTRACT

INTRODUCTION: Parkinson's disease (PD) is a progressive neurodegenerative disorder that affects approximately 1%-2% of individuals aged 60 and above. Communication disorders in PD can significantly impact the overall quality of life. As prosody plays a vital role in verbal communication, the present study examines Persian prosody perception in PD, focusing on linguistic and emotional aspects of prosody. METHODS: This cross-sectional study aimed to compare the perception of linguistic and emotional prosody in three groups: middle-aged adults (n = 22; mean age = 50.40 years), healthy older adults (n = 22; mean age = 68.31 years), and individuals with Parkinson's disease (n = 22; mean age = 65 years). All individuals with PD were classified in stages 1, 1.5, 2, 2.5, or 3 of the disease using the Hoehn and Yahr scale. All participants had an MMSE score of 24 or above. The Florida Affect Battery (FAB) was used to evaluate prosody perception. The battery has been validated in the Persian language, with reported reliability and validity of 94% and 100%, respectively. RESULTS: Participants with PD presented significantly lower scores than the older adults in all subtests of the FAB (p < 0.05), while healthy older adults differed significantly from middle-aged adults only in linguistic discrimination (ß = -2.14; -3.68 to -0.61) and linguistic naming of prosody (ß = 1.25; 0.17 to 2.33). CONCLUSIONS: The present study sheds light on the influence of PD on Persian prosody perception. Given the crucial role of prosody in verbal communication, these findings enhance our understanding of communication disorders in PD and draw attention to the need to consider prosody perception, among other aspects, when assessing individuals affected by PD.


Subject(s)
Parkinson Disease; Speech Perception; Humans; Parkinson Disease/psychology; Parkinson Disease/complications; Male; Middle Aged; Female; Cross-Sectional Studies; Aged; Emotions; Quality of Life/psychology; Iran
10.
Alzheimers Dement (Amst); 16(2): e12594, 2024.
Article in English | MEDLINE | ID: mdl-38721025

ABSTRACT

Dementia with Lewy bodies (DLB) and Alzheimer's disease (AD), the two most common neurodegenerative dementias, both exhibit altered emotional processing. However, how vocal emotional expressions alter in, and differ between, DLB and AD remains uninvestigated. We collected voice data during story reading from 152 older adults comprising DLB, AD, and cognitively unimpaired (CU) groups and compared their emotional prosody in terms of valence and arousal dimensions. Compared with matched AD and CU participants, DLB patients showed reduced overall emotional expressiveness, as well as lower valence (more negative) and lower arousal (calmer), the extent of which was associated with cognitive impairment and insular atrophy. Classification models using vocal features discriminated DLB from AD and CU with an AUC of 0.83 and 0.78, respectively. Our findings may aid in discriminating DLB patients from AD and CU individuals, serving as a surrogate marker for clinical and neuropathological changes in DLB. Highlights: DLB showed a distinctive reduction in vocal expression of emotions. Cognitive impairment was associated with reduced vocal emotional expression in DLB. Insular atrophy was associated with reduced vocal emotional expression in DLB. Emotional expression measures successfully differentiated DLB from AD or controls.
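A hedged sketch of the kind of vocal-feature classification reported here (discriminating groups and scoring by AUC); the features, labels, and model below are synthetic assumptions, not the authors' pipeline.

```python
# Hypothetical vocal-feature classifier scored by cross-validated AUC (synthetic data)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))        # e.g., valence/arousal prosody features
y = rng.integers(0, 2, size=100)     # 0 = AD, 1 = DLB (synthetic labels)
X[y == 1] -= 0.8                     # DLB: reduced expressiveness, lower valence/arousal

clf = make_pipeline(StandardScaler(), LogisticRegression())
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC = {auc.mean():.2f}")
```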

11.
J Autism Dev Disord; 2024 May 04.
Article in English | MEDLINE | ID: mdl-38703251

ABSTRACT

PURPOSE: Autistic individuals often face challenges perceiving and expressing emotions, potentially stemming from differences in speech prosody. Here we explore how autism diagnosis (between groups) and measures of social competence (within groups) may be related, first, to children's speech characteristics (both prosodic features and amount of spontaneous speech) and, second, to these two factors in mothers' speech to their children. METHODS: Autistic (n = 21) and non-autistic (n = 18) children, aged 7-12 years, participated in a Lego-building task with their mothers, while conversational speech was recorded. Mean F0, pitch range, pitch variability, and amount of spontaneous speech were calculated for each child and their mother. RESULTS: The results indicated no differences in speech characteristics across autistic and non-autistic children, or across their mothers, suggesting that conversational context may have large effects on whether differences between autistic and non-autistic populations are found. However, variability in social competence within the group of non-autistic children (but not within autistic children) was predictive of children's mean F0, pitch range, and pitch variability. The amount of spontaneous speech produced by mothers (but not their prosody) predicted their autistic children's social competence, which may suggest a heightened impact of scaffolding for mothers of autistic children. CONCLUSION: Together, the results suggest complex interactions between context, social competence, and adaptive parenting strategies in driving prosodic differences in children's speech.

12.
Risk Anal; 2024 May 14.
Article in English | MEDLINE | ID: mdl-38742599

ABSTRACT

People typically use verbal probability phrases when discussing risks ("It is likely that this treatment will work"), both in written and spoken communication. When speakers are uncertain about risks, they can nonverbally signal this uncertainty by using prosodic cues, such as a rising, question-like intonation or a filled pause ("uh"). We experimentally studied the effects of these two prosodic cues on the listener's perceived speaker certainty and numerical interpretation of spoken verbal probability phrases. Participants (N = 115) listened to various verbal probability phrases that were uttered with a rising or falling global intonation and with or without a filled pause before the probability phrase. For each phrase, they gave a point estimate of their numerical interpretation in percentages and indicated how certain they thought the speaker was about the correctness of the probability phrase. Speakers were perceived as least certain when the verbal probability phrases were spoken with both prosodic uncertainty cues. Interpretation of verbal probability phrases varied widely across participants, especially when rising intonation was produced by the speaker. Overall, high probability phrases (e.g., "very likely") were estimated as lower (and low probability phrases, such as "unlikely," as higher) when they were uttered with a rising intonation. The effects of filled pauses were less pronounced, as were the uncertainty effects for medium probability phrases (e.g., "probable"). These results stress the importance of nonverbal communication when verbally communicating risks and probabilities to people, for example, in the context of doctor-patient communication.

13.
Front Psychol; 15: 1296933, 2024.
Article in English | MEDLINE | ID: mdl-38655212

ABSTRACT

In this paper, we investigate how information status is encoded paradigmatically and syntagmatically via prosodic prominence in German. In addition, we consider individual variability in the production of prominence. To answer our research questions, we collected controlled yet ecologically valid speech by applying an innovative recording paradigm. Participants were asked to perform an interactive reading task in collaboration with an interlocutor remotely via video calls. Results indicate that information status is encoded paradigmatically via the F0 contour, while syntagmatic effects are subtle and depend on the acoustic parameter used. Individual speakers differ primarily in their strength of encoding and secondarily in the type of parameters employed. While the paradigmatic effects we observe are in line with previous findings, our syntagmatic findings support two contradictory ideas, a balancing effect and a radiating effect. Along with the findings at the individual level, this study thus allows for new insights regarding the redundant and relational nature of prosodic prominence.

14.
Front Psychol; 15: 1322482, 2024.
Article in English | MEDLINE | ID: mdl-38633875

ABSTRACT

Echo questions serve two pragmatic functions (recapitulatory and explicatory) and are subdivided into two types (yes-no echo questions and wh-echo questions) in verbal communication. Yet to date, most relevant studies have been conducted in European languages like English and Spanish. It remains unknown whether the different functions of echo questions can be conveyed via prosody in spoken Chinese. Additionally, no comparison has been made of the diverse algorithmic models for predicting these functions from the prosody of Chinese echo questions. This motivated us to use different acoustic cues to predict the different pragmatic functions of Chinese echo questions by means of an acoustic experiment and data modeling. The results showed that for yes-no echo questions, the explicatory function exhibited higher pitch and intensity patterns than the recapitulatory function, whereas for wh-echo questions, the recapitulatory function demonstrated higher pitch and intensity patterns than the explicatory function. With regard to data modeling, the Support Vector Machine (SVM) algorithm performed better than Random Forest (RF) and Logistic Regression (LR) in predicting the different functions from prosodic cues in both yes-no and wh-echo questions. This study adds computational evidence for the prosodic basis of echo questions' pragmatic functions.
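The three-way model comparison described above can be sketched as follows; the prosodic feature set, labels, and effect sizes are assumptions for illustration only.

```python
# Comparing SVM, Random Forest, and Logistic Regression on synthetic prosodic cues
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 4))        # e.g., mean pitch, pitch range, intensity, duration
y = rng.integers(0, 2, size=120)     # 0 = recapitulatory, 1 = explicatory
X[y == 1, :2] += 0.7                 # explicatory: higher pitch/intensity (yes-no type)

for name, model in [("SVM", SVC()), ("RF", RandomForestClassifier()),
                    ("LR", LogisticRegression())]:
    acc = cross_val_score(model, X, y, cv=5).mean()   # mean cross-validated accuracy
    print(f"{name}: {acc:.2f}")
```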

15.
Cogn Sci; 48(4): e13436, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38564245

ABSTRACT

We report the results of one visual-world eye-tracking experiment and two referent selection tasks in which we investigated the effects of information structure in the form of prosody and word order manipulation on the processing of subject pronouns er and der in German. Factors such as subjecthood, focus, and topicality, as well as order of mention have been linked to an increased probability of certain referents being selected as the pronoun's antecedent and described as increasing this referent's prominence, salience, or accessibility. The goal of this study was to find out whether pronoun processing is primarily guided by linguistic factors (e.g., grammatical role) or nonlinguistic factors (e.g., first-mention), and whether pronoun interpretation can be described in terms of referents' "prominence" / "accessibility" / "salience." The results showed an overall subject preference for er, whereas der was affected by the object role and focus marking. While focus increases the attentional load and enhances memory representation for the focused referent making the focused referent more available, ultimately it did not affect the final interpretation of er, suggesting that "prominence" or the related concepts do not explain referent selection preferences. Overall, the results suggest a primacy of linguistic factors in determining pronoun resolution.


Subject(s)
Emotions; Linguistics; Male; Humans; Eye-Tracking Technology; Probability
16.
Autism Res; 17(4): 824-837, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38488319

ABSTRACT

Cumulating evidence suggests that atypical emotion processing in autism may generalize across different stimulus domains. However, this evidence comes from studies examining explicit emotion recognition. It remains unclear whether domain-general atypicality also applies to implicit emotion processing in autism and its implication for real-world social communication. To investigate this, we employed a novel cross-modal emotional priming task to assess implicit emotion processing of spoken/sung words (primes) through their influence on subsequent emotional judgment of faces/face-like objects (targets). We assessed whether implicit emotional priming differed between 38 autistic and 38 neurotypical individuals across age groups as a function of prime and target type. Results indicated no overall group differences across age groups, prime types, and target types. However, differential, domain-specific developmental patterns emerged for the autism and neurotypical groups. For neurotypical individuals, speech but not song primed the emotional judgment of faces across ages. This speech-orienting tendency was not observed across ages in the autism group, as priming of speech on faces was not seen in autistic adults. These results outline the importance of the delicate weighting between speech- versus song-orientation in implicit emotion processing throughout development, providing more nuanced insights into the emotion processing profile of autistic individuals.


Subject(s)
Autism Spectrum Disorder; Autistic Disorder; Adult; Humans; Facial Expression; Emotions; Autistic Disorder/psychology; Judgment
17.
Phonetica; 81(3): 321-349, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38522003

ABSTRACT

This study investigates the variation in phrase-final f0 movements found in dyadic unscripted conversations in Papuan Malay, an Eastern Indonesian language. This is done by a novel combination of exploratory and confirmatory classification techniques. In particular, this study investigates the linguistic factors that potentially drive f0 contour variation in phrase-final words produced in a naturalistic interactive dialogue task. To this end, a cluster analysis, manual labelling, and a random forest analysis are carried out to reveal the main sources of contour variation. These are, taking conversational interaction into account: turn transition, topic continuation, information structure (givenness and contrast), and context-independent properties of words such as word class, syllable structure, voicing, and intrinsic f0. Results indicate that contour variation in Papuan Malay, in particular f0 direction and target level, is best explained by turn transitions between speakers, corroborating similar findings for related languages. The applied methods provide opportunities to further lower the threshold for incorporating intonation and prosody in the early stages of language documentation.


Subject(s)
Language; Phonetics; Humans; Female; Male; Indonesia; Speech Acoustics; Adult; Linguistics; Speech Production Measurement
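A toy version of the exploratory clustering step applied to phrase-final f0 contours (k-means over time-normalized contours); the synthetic rises and falls below stand in for the Papuan Malay data, and the paper's actual pipeline may differ.

```python
# k-means clustering of synthetic time-normalized f0 contours
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 20)                              # time-normalized sample points
rises = 2 * t + rng.normal(scale=0.3, size=(50, 20))   # synthetic rising contours
falls = -2 * t + rng.normal(scale=0.3, size=(50, 20))  # synthetic falling contours
contours = np.vstack([rises, falls])                   # semitones, speaker-normalized

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(contours)
print(np.bincount(km.labels_))                         # cluster sizes
print(np.round(km.cluster_centers_[:, [0, -1]], 1))    # start vs. end f0 per cluster
```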
18.
BMC Med; 22(1): 121, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38486293

ABSTRACT

BACKGROUND: Socio-emotional impairments are among the diagnostic criteria for autism spectrum disorder (ASD), but current knowledge substantiates both altered and intact recognition of emotional prosody. Here, a Bayesian framework of perception is considered, suggesting that the oversampling of sensory evidence would impair perception within highly variable environments, whereas reliable hierarchical structures for spectral and temporal cues would foster emotion discrimination by autistics. METHODS: Event-related spectral perturbations (ERSP) extracted from electroencephalographic (EEG) data indexed the perception of anger, disgust, fear, happiness, neutral, and sadness prosodies while participants listened to speech uttered by (a) human or (b) synthesized voices characterized by reduced volatility and variability of acoustic environments. The assessment of mechanisms for perception was extended to the visual domain by analyzing behavioral accuracy within a non-social task in which the dynamics of precision weighting between bottom-up evidence and top-down inferences were emphasized. Eighty children (mean age 9.7 years; standard deviation 1.8) volunteered, including 40 autistics. Symptomatology was assessed at the time of the study via the Autism Diagnostic Observation Schedule, Second Edition, and parents' responses on the Autism Spectrum Rating Scales. A mixed within-between analysis of variance was conducted to assess the effects of group (autism versus typical development), voice, emotions, and the interactions between factors. A Bayesian analysis was implemented to quantify the evidence in favor of the null hypothesis in case of non-significance. Post hoc comparisons were corrected for multiple testing. RESULTS: Autistic children presented impaired emotion differentiation while listening to speech uttered by human voices, which improved when the acoustic volatility and variability of the voices were reduced. Divergent neural patterns were observed from neurotypicals to autistics, emphasizing different mechanisms for perception. Accordingly, behavioral measurements on the visual task were consistent with the over-precision ascribed to environmental variability (sensory processing) that weakened performance. Unlike autistic children, neurotypicals could differentiate the emotions induced by all voices. CONCLUSIONS: This study outlines behavioral and neurophysiological mechanisms that underpin responses to sensory variability. Neurobiological insights into the processing of emotional prosodies emphasized the potential of acoustically modified emotional prosodies to improve emotion differentiation by autistics. TRIAL REGISTRATION: BioMed Central ISRCTN Registry, ISRCTN18117434. Registered on September 20, 2020.


Subject(s)
Autism Spectrum Disorder; Autistic Disorder; Child; Humans; Autistic Disorder/diagnosis; Speech; Autism Spectrum Disorder/diagnosis; Bayes Theorem; Emotions/physiology; Acoustics
19.
J Voice; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38548505

ABSTRACT

OBJECTIVES: The purpose of this study is to identify the accuracy with which graduate students in a department of communication sciences and disorders identify modal register, vocal fry, and uptalk presented in audio samples of female celebrity speakers, and to report these listeners' perceptual responses to a variety of attributes (eg, trustworthy, competent, educated). STUDY DESIGN: This investigation was an anonymous online survey study. METHODS: As part of an anonymous online survey, graduate students in a department of communicative sciences and disorders listened to training modules and then classified female voice samples according to the three features under investigation (ie, modal register, vocal fry, and uptalk). The listeners then appraised a variety of speaker attributes, including physical attractiveness, trustworthiness, competence, and level of education, based on the audio samples of connected speech. RESULTS: The participants labeled voice samples of vocal fry with 85% accuracy, uptalk with 79% accuracy, and modal register with only 51% accuracy. Cohen's Kappa showed substantial agreement between the repeated ratings of modal register and moderate agreement for repeated ratings of samples of vocal fry and uptalk. A Pearson analysis revealed a strong positive correlation between the modal register samples and positive attributes, including the appeal of the voice, trustworthiness, competence, and level of education, but yielded negative correlations for vocal fry and uptalk with the same traits. The listeners assigned negative attributes (eg, untrustworthy, unappealing voice) to the samples of vocal fry and uptalk. CONCLUSIONS: Training notwithstanding, the listeners had difficulty accurately identifying modal register despite their frequent exposure to that manner of phonation. They did, however, identify vocal fry and uptalk with a high level of accuracy and were consistent in the attributes they assigned to those vocal features. The listeners' responses revealed negative assumptions associated with vocal fry and uptalk and positive assessments of modal register. Our results raise questions rather than conclusions. If, as has been shown in previous studies, female speakers often speak in vocal fry and uptalk, why are listeners' responses to those vocal features so negative? Are cultural, sociolinguistic, and peer pressures so powerful that they override individual judgment?
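For reference, the two statistics named in the results (Cohen's kappa for repeated ratings, and a Pearson correlation between register and attribute ratings) can be computed as in the toy sketch below; all data are invented.

```python
# Toy computation of Cohen's kappa and a Pearson correlation (invented data)
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

first_pass  = ["fry", "uptalk", "modal", "fry", "modal", "uptalk"]
second_pass = ["fry", "uptalk", "fry",   "fry", "modal", "uptalk"]
print(f"kappa = {cohen_kappa_score(first_pass, second_pass):.2f}")

modal_ratio = [0.9, 0.4, 0.7, 0.2, 0.8]   # proportion rated as modal register per sample
trust       = [4.5, 3.0, 4.1, 2.4, 4.4]   # mean trustworthiness rating per sample
r, p = pearsonr(modal_ratio, trust)
print(f"r = {r:.2f}, p = {p:.3f}")
```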

20.
Sensors (Basel); 24(5), 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38475158

ABSTRACT

Since the advent of modern computing, researchers have striven to make the human-computer interface (HCI) as seamless as possible. Progress has been made on various fronts, e.g., the desktop metaphor (interface design) and natural language processing (input). One area receiving attention recently is voice activation and its corollary, computer-generated speech. Despite decades of research and development, most computer-generated voices remain easily identifiable as non-human. Prosody in speech has two primary components, intonation and rhythm, both often lacking in computer-generated voices. This research aims to enhance computer-generated text-to-speech algorithms by incorporating melodic and prosodic elements of human speech. This study explores a novel approach to adding prosody by using machine learning, specifically an LSTM neural network, to add paralinguistic elements to a recorded or generated voice. The aim is to increase the realism of computer-generated text-to-speech algorithms, to enhance electronic reading applications, and to improve artificial voices for those who need artificial assistance to speak. A computer that can also convey meaning with a spoken audible announcement will further improve human-computer interactions. Applications of such an algorithm may include improving high-definition audio codecs for telephony, renewing old recordings, and lowering barriers to the utilization of computing. This research deployed a prototype modular platform for digital speech improvement, analyzing and generalizing algorithms into a modular system through laboratory experiments to optimize combinations and performance in edge cases. The results were encouraging, with the LSTM-based encoder able to produce realistic speech. Further work will involve optimizing the algorithm and comparing its performance against other approaches.


Subject(s)
Speech Perception; Speech; Speech/physiology; Speech Perception/physiology; Computers; Machine Learning
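A minimal PyTorch sketch in the spirit of the LSTM approach described above, mapping frame-level linguistic features to per-frame prosodic targets; the architecture, feature dimensionality, and target choice are assumptions, not the paper's implementation.

```python
# Hypothetical LSTM prosody predictor: frame features in, f0/energy targets out
import torch
import torch.nn as nn

class ProsodyLSTM(nn.Module):
    def __init__(self, n_features=32, hidden=128, n_targets=2):  # targets: f0, energy
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_targets)

    def forward(self, x):                  # x: (batch, frames, features)
        out, _ = self.lstm(x)
        return self.head(out)              # per-frame prosodic targets

model = ProsodyLSTM()
x = torch.randn(4, 100, 32)                # 4 utterances, 100 frames each
pred = model(x)
print(pred.shape)                          # torch.Size([4, 100, 2])
```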