Results 1 - 9 of 9
1.
Science ; 382(6669): 417-423, 2023 10 27.
Article in English | MEDLINE | ID: mdl-37883535

ABSTRACT

Faces and voices are the dominant social signals used to recognize individuals among primates. Yet, it is not known how these signals are integrated into a cross-modal representation of individual identity in the primate brain. We discovered that, although single neurons in the marmoset hippocampus exhibited selective responses when presented with the face or voice of a specific individual, a parallel mechanism for representing the cross-modal identities of multiple individuals was evident within single neurons and at the population level. Manifold projections likewise showed the separability of individuals as well as clustering of others' families, which suggests that multiple learned social categories are encoded as related dimensions of identity in the hippocampus. Neural representations of identity in the hippocampus are thus modality independent and reflect the primate social network.
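
For readers unfamiliar with the manifold-projection idea, the sketch below shows one simple version of it in Python: simulated population firing rates for several identities, presented as faces or voices, projected onto two dimensions with PCA and summarized by between- versus within-identity spread. The neuron counts, noise levels, and use of PCA are illustrative assumptions, not the authors' analysis pipeline.

```python
# Minimal sketch (assumed, simulated data): manifold projection of population
# activity and a crude separability check for cross-modal identity coding.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_neurons, n_trials, n_ids = 80, 20, 4

# Each identity has a preferred direction in neural state space that is
# shared across face and voice trials (the cross-modal assumption).
identity_axes = rng.normal(size=(n_ids, n_neurons))
X, labels = [], []
for i in range(n_ids):
    for modality in ("face", "voice"):
        trials = identity_axes[i] + 0.5 * rng.normal(size=(n_trials, n_neurons))
        X.append(trials)
        labels += [i] * n_trials
X, labels = np.vstack(X), np.array(labels)

# Two-dimensional manifold projection of the population activity.
proj = PCA(n_components=2).fit_transform(X)

# Separability: distance between identity centroids vs. within-identity spread.
centroids = np.array([proj[labels == i].mean(axis=0) for i in range(n_ids)])
within = np.mean([proj[labels == i].std(axis=0).mean() for i in range(n_ids)])
between = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1).mean()
print(f"between-identity spread {between:.2f} vs within-identity spread {within:.2f}")
```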


Subject(s)
Callithrix , Facial Recognition , Hippocampus , Neurons , Social Identification , Voice Recognition , Animals , Hippocampus/cytology , Hippocampus/physiology , Callithrix/physiology , Callithrix/psychology , Facial Recognition/physiology , Voice Recognition/physiology , Neurons/physiology , Social Networking
2.
Neuroimage ; 263: 119647, 2022 11.
Article in English | MEDLINE | ID: mdl-36162634

ABSTRACT

Recognising a speaker's identity by the sound of their voice is important for successful interaction. This skill depends on our ability to discriminate minute variations in the acoustics of the vocal signal. Performance on voice identity assessments varies widely across the population. The neural underpinnings of this ability and its individual differences, however, remain poorly understood. Here we provide critical tests of a theoretical framework for the neural processing stages of voice identity and address how individual differences in identity discrimination mediate activation in this neural network. We scanned 40 individuals on an fMRI adaptation task involving voices drawn from morphed continua between two personally familiar identities. Analyses dissociated neuronal effects induced by repetition of acoustically similar morphs from those induced by a switch in perceived identity. Activation in temporal voice-sensitive areas decreased with acoustic similarity between consecutive stimuli. This repetition suppression effect was mediated by performance on an independent voice assessment, highlighting an important functional role of adaptive coding in voice expertise. Bilateral anterior insulae and medial frontal gyri responded to a switch in perceived voice identity compared with an acoustically equidistant switch within identity. Our results support a multistep model of voice identity perception.
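
To make the two dissociated effects concrete, here is a hedged sketch that regresses a simulated per-trial response on (a) the acoustic distance to the preceding morph and (b) an indicator for a perceived identity switch. All variables, effect sizes, and trial counts are invented for illustration; this is not the study's fMRI analysis.

```python
# Schematic sketch (simulated data): repetition suppression scales with acoustic
# distance, plus an extra release from adaptation when perceived identity switches.
import numpy as np

rng = np.random.default_rng(1)
n_trials = 200
acoustic_distance = rng.uniform(0, 1, n_trials)            # morph-step distance to previous stimulus
identity_switch = (rng.uniform(size=n_trials) < 0.5).astype(float)  # perceived switch vs. no switch

# Simulated response amplitude combining both hypothesized effects plus noise.
response = 0.8 * acoustic_distance + 0.5 * identity_switch + rng.normal(0, 0.3, n_trials)

# Ordinary least squares with an intercept, distance term, and switch term.
design = np.column_stack([np.ones(n_trials), acoustic_distance, identity_switch])
coef, *_ = np.linalg.lstsq(design, response, rcond=None)
print(dict(zip(["intercept", "distance_effect", "switch_effect"], coef.round(3))))
```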


Subject(s)
Acoustics , Auditory Diseases, Central , Cognition , Voice Recognition , Humans , Acoustic Stimulation , Cognition/physiology , Magnetic Resonance Imaging , Prefrontal Cortex/physiology , Voice Recognition/physiology , Auditory Diseases, Central/physiopathology , Male , Female , Adolescent , Young Adult , Adult , Nerve Net/physiology
3.
J Neurosci ; 41(33): 7136-7147, 2021 08 18.
Article in English | MEDLINE | ID: mdl-34244362

ABSTRACT

Recognizing speech in background noise is a strenuous daily activity, yet most humans can master it. An explanation of how the human brain deals with such sensory uncertainty during speech recognition is still missing. Previous work has shown that recognition of speech without background noise involves modulation of the auditory thalamus (medial geniculate body; MGB): responses in the left MGB are higher for speech recognition tasks that require tracking of fast-varying stimulus properties than for tasks involving relatively constant stimulus properties (e.g., speaker identity tasks), despite the same stimulus input. Here, we tested the hypotheses that (1) this task-dependent modulation for speech recognition increases in parallel with the sensory uncertainty in the speech signal, i.e., the amount of background noise; and that (2) this increase is present in the ventral MGB, which corresponds to the primary sensory part of the auditory thalamus. In accordance with our hypotheses, we show, using ultra-high-resolution functional magnetic resonance imaging (fMRI) in male and female human participants, that the task-dependent modulation of the left ventral MGB (vMGB) for speech is particularly strong when recognizing speech in noisy listening conditions, in contrast to situations where the speech signal is clear. The results imply that speech-in-noise recognition is supported by modifications at the level of the subcortical sensory pathway providing driving input to the auditory cortex.

SIGNIFICANCE STATEMENT: Speech recognition in noisy environments is a challenging everyday task. One reason why humans can master it is the recruitment of additional cognitive resources, as reflected in the engagement of non-language cerebral cortex areas. Here, we show that modulation of the primary sensory pathway is also specifically involved in speech-in-noise recognition. We found that the left primary sensory thalamus (ventral medial geniculate body; vMGB) is more involved when recognizing speech signals, as opposed to a control task (speaker identity recognition), when the stimuli are heard in background noise versus when the noise is absent. This finding implies that the brain optimizes sensory processing in subcortical sensory pathway structures in a task-specific manner to deal with speech recognition in noisy environments.
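
The central test here is a task-by-noise interaction. The following sketch shows that 2x2 comparison on per-participant vMGB response estimates; the participant count, effect sizes, and data are simulated placeholders, not the study's results or its fMRI pipeline.

```python
# Rough sketch (simulated values): is the task effect (speech vs. speaker)
# larger in noise than in clear listening conditions?
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 17  # hypothetical number of participants

speech_noise  = 1.2 + rng.normal(0, 0.3, n)
speaker_noise = 0.7 + rng.normal(0, 0.3, n)
speech_clear  = 1.0 + rng.normal(0, 0.3, n)
speaker_clear = 0.9 + rng.normal(0, 0.3, n)

# Interaction contrast: (speech - speaker) in noise minus (speech - speaker) in clear.
interaction = (speech_noise - speaker_noise) - (speech_clear - speaker_clear)
t, p = stats.ttest_1samp(interaction, 0.0)
print(f"task x noise interaction: t({n - 1}) = {t:.2f}, p = {p:.3f}")
```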


Subject(s)
Brain Mapping , Geniculate Bodies/physiology , Inferior Colliculi/physiology , Noise , Speech Perception/physiology , Thalamus/physiology , Adult , Female , Humans , Magnetic Resonance Imaging , Male , Models, Neurological , Phonetics , Pilot Projects , Reaction Time , Signal-To-Noise Ratio , Uncertainty , Voice Recognition/physiology
4.
Neuroreport ; 32(10): 858-863, 2021 07 07.
Article in English | MEDLINE | ID: mdl-34029292

ABSTRACT

Living in a social environment requires multimodal emotional interaction. Several studies using dynamic facial expressions and emotional voices have reported that multimodal emotional incongruency evokes an early sensory component of event-related potentials (ERPs), while others have found a late cognitive component; how these two sets of results fit together remains unclear. We speculate that it is semantic analysis within a multimodal integration framework that evokes the late ERP component. An electrophysiological experiment was conducted using emotionally congruent or incongruent dynamic faces and natural voices in order to promote semantic analysis. To investigate top-down modulation of the ERP components, attention was manipulated via two tasks that directed participants to attend to facial versus vocal expressions. Our results revealed interactions between facial and vocal emotional expressions, manifested as modulations of auditory N400 amplitudes, but not of N1 and P2 amplitudes, for incongruent emotional face-voice combinations only in the face-attentive task. A late occipital positive potential emerged only during the voice-attentive task. Overall, these findings support the idea that semantic analysis is a key factor in evoking the late cognitive component. The task effect for these ERPs suggests that top-down attention alters not only the amplitude of ERP components but also which components are elicited. Our results suggest a principle of emotional face-voice processing in the brain that may underlie complex audiovisual interactions in everyday communication.
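
As an illustration of the kind of windowed ERP comparison described here, the sketch below contrasts mean amplitude in an N400-like window between congruent and incongruent trials at a single channel. The sampling rate, window limits, and all data are assumptions, not the paper's parameters.

```python
# Illustrative sketch (assumed parameters, simulated epochs): mean amplitude
# in an N400-like window for congruent vs. incongruent face-voice trials.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
fs = 500                              # Hz, assumed sampling rate
t = np.arange(-0.2, 0.8, 1 / fs)      # epoch from -200 to 800 ms
window = (t >= 0.35) & (t <= 0.55)    # assumed N400 analysis window

def simulate_trials(n400_amplitude, n_trials=60):
    """Simulate single-trial ERPs with a deflection centered around 450 ms."""
    bump = n400_amplitude * np.exp(-((t - 0.45) ** 2) / (2 * 0.05 ** 2))
    return bump + rng.normal(0, 2.0, size=(n_trials, t.size))

congruent = simulate_trials(n400_amplitude=-1.0)
incongruent = simulate_trials(n400_amplitude=-3.0)  # larger (more negative) N400

mean_con = congruent[:, window].mean(axis=1)
mean_inc = incongruent[:, window].mean(axis=1)
print(stats.ttest_ind(mean_inc, mean_con))
```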


Subject(s)
Emotions/physiology , Evoked Potentials/physiology , Facial Expression , Facial Recognition/physiology , Occipital Lobe/physiology , Voice Recognition/physiology , Acoustic Stimulation/methods , Adolescent , Adult , Electroencephalography/methods , Female , Humans , Male , Photic Stimulation/methods , Psychomotor Performance/physiology , Random Allocation , Video Recording/methods , Young Adult
5.
PLoS One ; 16(4): e0250214, 2021.
Article in English | MEDLINE | ID: mdl-33861789

ABSTRACT

Research has repeatedly shown that familiar and unfamiliar voices elicit different neural responses. It has also been suggested that different neural correlates are associated with the feeling of having heard a voice and with knowing who the voice belongs to. The terminology used to designate these varying responses remains vague, creating a degree of confusion in the literature. Additionally, the terms used for tasks of voice discrimination, voice recognition, and speaker identification are often inconsistent, creating further ambiguities. The present study used event-related potentials (ERPs) to clarify the difference between responses to (1) unknown voices, (2) trained-to-familiar voices, as speech stimuli are repeatedly presented, and (3) intimately familiar voices. In an experiment, 13 participants listened to repeated utterances recorded from 12 speakers. Only one of the 12 voices was intimately familiar to a given participant, whereas the remaining 11 voices were unfamiliar. The frequency of presentation of these 11 unfamiliar voices varied, with only one being frequently presented (the trained-to-familiar voice). ERP analyses revealed different responses for intimately familiar and unfamiliar voices in two distinct time windows (a P2 between 200 and 250 ms and a late positive component, LPC, between 450 and 850 ms post-onset), with the late responses occurring only for intimately familiar voices. The LPC showed sustained shifts, whereas the shorter-latency ERP components appear to reflect an early recognition stage. The trained voice also elicited distinct responses compared with rarely heard voices, but these occurred in a third time window (an N250 between 300 and 350 ms post-onset). Overall, the timing of responses suggests that the processing of intimately familiar voices operates in two distinct steps: voice recognition, marked by a P2 over right centro-frontal sites, and speaker identification, marked by an LPC component. The recognition of frequently heard voices entails an independent recognition process marked by a differential N250. Based on the present results and previous observations, it is proposed that processes of voice "recognition" and "identification" need to be distinguished. The present study also specifies test conditions that reveal this distinction in neural responses, one of which bears on the length of the speech stimuli, given the late responses associated with voice identification.
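
The windowed-amplitude logic behind these comparisons can be sketched as follows, using the three reported time windows but entirely simulated epochs; the sampling rate, trial counts, and condition labels are placeholders rather than the study's data.

```python
# Minimal sketch (simulated epochs): mean amplitude per reported window
# (P2: 200-250 ms, N250: 300-350 ms, LPC: 450-850 ms) and per voice condition.
import numpy as np

fs = 500                                    # Hz, assumed sampling rate
t = np.arange(0.0, 1.0, 1 / fs)             # 0-1000 ms post-onset
windows = {"P2": (0.200, 0.250), "N250": (0.300, 0.350), "LPC": (0.450, 0.850)}

rng = np.random.default_rng(4)
conditions = ["intimately_familiar", "trained", "unfamiliar"]
epochs = {c: rng.normal(0, 1.5, size=(60, t.size)) for c in conditions}  # trials x samples

for name, (start, end) in windows.items():
    mask = (t >= start) & (t < end)
    means = {c: round(float(epochs[c][:, mask].mean()), 3) for c in conditions}
    print(name, means)
```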


Subject(s)
Pattern Recognition, Physiological/physiology , Speech Perception/physiology , Voice Recognition/physiology , Adult , Auditory Perception/physiology , Evoked Potentials , Female , Humans , Male , Quebec , Recognition, Psychology/physiology , Speech/physiology , Voice/physiology
6.
PLoS Biol ; 19(4): e3000751, 2021 04.
Article in English | MEDLINE | ID: mdl-33848299

ABSTRACT

Across many species, scream calls signal the affective significance of events to other agents. Scream calls have often been thought to be of a generically alarming and fearful nature, signaling potential threats and being recognized instantaneously, involuntarily, and accurately by perceivers. However, scream calls are more diverse in their affective signaling than fearful alarming of a threat alone, and the broader sociobiological relevance of various scream types is thus unclear. Here we used 4 different psychoacoustic, perceptual decision-making, and neuroimaging experiments in humans to demonstrate the existence of at least 6 psychoacoustically distinctive types of scream calls of both alarming and non-alarming nature, rather than only screams caused by fear or aggression. Second, based on perceptual and processing sensitivity measures for decision-making during scream recognition, we found that alarm screams (with some exceptions) were overall discriminated the worst, were responded to the slowest, and were associated with lower perceptual sensitivity for their recognition compared with non-alarm screams. Third, the neural processing of alarm compared with non-alarm screams during an implicit processing task elicited only minimal neural signal and connectivity in perceivers, contrary to the frequent assumption of a threat-processing bias in the primate neural system. These findings show that scream calls are more diverse in their signaling and communicative nature in humans than previously assumed, and, in contrast to the commonly observed threat-processing bias in perceptual discrimination and neural processing, non-alarm screams, and positive screams in particular, appear to be processed more efficiently in speeded discriminations and in implicit neural processing.
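
One common way to quantify the perceptual sensitivity mentioned above is d' from signal detection theory; the sketch below shows that computation with hypothetical hit and false-alarm counts, without claiming it is the exact metric or the data used in the study.

```python
# Hedged sketch: d' (signal-detection sensitivity) with a standard log-linear
# correction, applied to hypothetical counts for alarm vs. non-alarm screams.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' with a log-linear correction to avoid infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts, chosen only to illustrate a lower sensitivity for alarm screams.
print("alarm     d' =", round(d_prime(60, 40, 25, 75), 2))
print("non-alarm d' =", round(d_prime(80, 20, 15, 85), 2))
```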


Subject(s)
Auditory Perception/physiology , Discrimination, Psychological/physiology , Fear/psychology , Voice Recognition/physiology , Adult , Auditory Pathways/diagnostic imaging , Auditory Pathways/physiology , Brain/diagnostic imaging , Female , Humans , Magnetic Resonance Imaging , Male , Pattern Recognition, Physiological/physiology , Recognition, Psychology/physiology , Sex Characteristics , Young Adult
7.
Neural Netw ; 133: 40-56, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33125917

ABSTRACT

Conversational sentiment analysis is an emerging, yet challenging, subtask of sentiment analysis. It aims to discover the affective state and sentiment changes of each person in a conversation based on their opinions. There is a wealth of interaction information that affects speaker sentiment in conversations. However, existing sentiment analysis approaches are insufficient for this subtask for two primary reasons: the lack of benchmark conversational sentiment datasets and the inability to model interactions between individuals. To address these issues, in this paper we first present a new, publicly available conversational dataset, named ScenarioSA, to support the development of conversational sentiment analysis models. Then, we investigate how interaction dynamics are associated with conversations and study the multidimensional nature of interactions, namely understandability, credibility, and influence. Finally, we propose an interactive long short-term memory (LSTM) network for conversational sentiment analysis that models interactions between speakers in a conversation by (1) adding a confidence gate before each LSTM hidden unit to estimate the credibility of the previous speakers and (2) combining the output gate with the learned influence scores to incorporate the influences of the previous speakers. Extensive experiments are conducted on ScenarioSA and IEMOCAP, and the results show that our model outperforms a wide range of strong baselines and achieves results competitive with state-of-the-art approaches.
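
A rough PyTorch reading of the two mechanisms named in (1) and (2) is sketched below: a confidence gate applied to the previous speaker's hidden state, and an output scaled by a learned influence score. This is an interpretation of the abstract, not the authors' released implementation; all module names, dimensions, and the toy usage loop are assumptions.

```python
# Sketch of an "interactive" LSTM cell: credibility-gated cross-speaker state
# plus an influence-weighted output. Interpretation of the abstract only.
import torch
import torch.nn as nn

class InteractiveLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        # Confidence gate: estimates credibility of the previous speaker's state.
        self.confidence = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Sigmoid())
        # Influence score: how strongly the previous speaker shapes the output.
        self.influence = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def forward(self, x, own_state, other_hidden):
        h, c = own_state
        # Gate the other speaker's hidden state by its estimated credibility.
        gated_other = self.confidence(other_hidden) * other_hidden
        h, c = self.cell(x, (h + gated_other, c))
        # Scale the output by the learned influence of the previous speaker.
        out = h * self.influence(other_hidden)
        return out, (h, c)

# Toy usage: alternate utterances between two speakers in a conversation.
cell = InteractiveLSTMCell(input_size=16, hidden_size=32)
states = [(torch.zeros(1, 32), torch.zeros(1, 32)) for _ in range(2)]
for turn in range(6):
    speaker, other = turn % 2, (turn + 1) % 2
    utterance = torch.randn(1, 16)           # placeholder utterance features
    out, states[speaker] = cell(utterance, states[speaker], states[other][0])
```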


Subject(s)
Emotions/physiology , Memory, Long-Term/physiology , Memory, Short-Term/physiology , Neural Networks, Computer , Voice Recognition/physiology , Communication , Humans
8.
Rev. Investig. Innov. Cienc. Salud ; 3(2): 98-118, 2021. ilus
Article in Spanish | LILACS, COLNAL | ID: biblio-1392911

ABSTRACT

Forensic acoustics is a discipline of criminalistics that has reached an analytical maturity requiring the voice analysis expert to specialize in acquiring knowledge of phonetics, sound technologies, speech, voice, language, speech and voice pathologies, and sound signal processing. When an expert opinion must be produced by a health professional who is entirely unfamiliar with legal technique, they encounter a lack of protocols, methods, and working procedures that would allow them to deliver a technical, valid, and validated report for conducting an interview and the subsequent comparative analysis of voices. This highlights the need to develop a methodological route or guide, through physical or electronic academic means, for the development of this knowledge and its professional and scientific dissemination.


Subject(s)
Speech Recognition Software , Voice Recognition , Voice , Voice Quality/physiology , Speech Recognition Software/standards , Dysarthria , Voice Recognition/physiology
9.
Sci Rep ; 10(1): 19757, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33184411

ABSTRACT

Developmental prosopagnosia (DP) is a condition characterised by lifelong face recognition difficulties. Recent neuroimaging findings suggest that DP may be associated with aberrant structure and function in multimodal regions of cortex implicated in the processing of both facial and vocal identity. These findings suggest that both facial and vocal recognition may be impaired in DP. To test this possibility, we compared the performance of 22 DPs and a group of typical controls on closely matched tasks that assessed famous face and famous voice recognition ability. As expected, the DPs showed severe impairment on the face recognition task relative to typical controls. In contrast, however, the DPs and controls identified a similar number of voices. Despite evidence of interactions between facial and vocal processing, these findings suggest some degree of dissociation between the two processing pathways, whereby one can be impaired while the other develops typically. A possible explanation for this dissociation in DP could be that the deficit originates in the early perceptual encoding of face structure, rather than at later, post-perceptual stages of face identity processing, which may be more likely to involve interactions with other modalities.


Subject(s)
Facial Recognition/physiology , Pattern Recognition, Visual , Prosopagnosia/physiopathology , Recognition, Psychology , Visual Perception/physiology , Voice Recognition/physiology , Adult , Female , Humans , Male