Results 1-20 of 87
1.
ArXiv ; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-37744463

ABSTRACT

Neurophysiology research has demonstrated that it is possible and valuable to investigate sensory processing in scenarios involving continuous sensory streams, such as speech and music. Over the past 10 years or so, novel analytic frameworks combined with the growing participation in data sharing have led to a surge of publicly available datasets involving continuous sensory experiments. However, open science efforts in this domain of research remain scattered, lacking a cohesive set of guidelines. This paper presents an end-to-end open science framework for the storage, analysis, sharing, and re-analysis of neural data recorded during continuous sensory experiments. The framework has been designed to interface easily with existing toolboxes, such as EelBrain, NapLib, MNE, and the mTRF-Toolbox. We present guidelines by taking both the user view (how to rapidly re-analyse existing data) and the experimenter view (how to store, analyse, and share), making the process as straightforward and accessible as possible for all users. Additionally, we introduce a web-based data browser that enables the effortless replication of published results and data re-analysis.
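
As a flavour of the kind of analysis such a framework is meant to support, here is a minimal forward-model sketch using MNE-Python, one of the toolboxes named above. The lag range, regularization value, and train/test split are illustrative choices, and the data are random placeholders standing in for a shared continuous-sensory dataset.

```python
# Minimal sketch: forward TRF (stimulus envelope -> continuous EEG) with MNE-Python.
# All data below are random placeholders for a shared dataset.
import numpy as np
from mne.decoding import ReceptiveField

fs = 64                                        # assumed post-resampling rate (Hz)
rng = np.random.default_rng(0)
envelope = rng.standard_normal((fs * 300, 1))  # placeholder speech envelope, 5 min
eeg = rng.standard_normal((fs * 300, 32))      # placeholder 32-channel EEG

trf = ReceptiveField(tmin=0.0, tmax=0.4, sfreq=fs, estimator=1e3, scoring="corrcoef")
n_train = fs * 240                             # simple 4 min train / 1 min test split
trf.fit(envelope[:n_train], eeg[:n_train])
print("per-channel r:", np.round(trf.score(envelope[n_train:], eeg[n_train:]), 3))
```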

2.
Neuroimage ; 282: 120391, 2023 11 15.
Article in English | MEDLINE | ID: mdl-37757989

ABSTRACT

There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.


Subjects
Auditory Cortex, Speech Perception, Humans, Lipreading, Speech Perception/physiology, Brain/physiology, Auditory Cortex/physiology, Phonetics, Visual Perception/physiology
3.
bioRxiv ; 2023 Aug 24.
Article in English | MEDLINE | ID: mdl-37662393

ABSTRACT

Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers - an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model - one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.

4.
Neuroimage ; 274: 120143, 2023 07 01.
Article in English | MEDLINE | ID: mdl-37121375

ABSTRACT

In noisy environments, our ability to understand speech benefits greatly from seeing the speaker's face. This is attributed to the brain's ability to integrate audio and visual information, a process known as multisensory integration. In addition, selective attention plays an enormous role in what we understand, the so-called cocktail-party phenomenon. But how attention and multisensory integration interact remains incompletely understood, particularly in the case of natural, continuous speech. Here, we addressed this issue by analyzing EEG data recorded from participants who undertook a multisensory cocktail-party task using natural speech. To assess multisensory integration, we modeled the EEG responses to the speech in two ways. The first assumed that audiovisual speech processing is simply a linear combination of audio speech processing and visual speech processing (i.e., an A + V model), while the second allowed for the possibility of audiovisual interactions (i.e., an AV model). Applying these models to the data revealed that EEG responses to attended audiovisual speech were better explained by an AV model, providing evidence for multisensory integration. In contrast, unattended audiovisual speech responses were best captured using an A + V model, suggesting that multisensory integration is suppressed for unattended speech. Follow-up analyses revealed some limited evidence for early multisensory integration of unattended AV speech, with no integration occurring at later levels of processing. We take these findings as evidence that the integration of natural audio and visual speech occurs at multiple levels of processing in the brain, each of which can be differentially affected by attention.
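
A conceptual sketch of the A+V versus AV comparison follows. It is not the authors' code: it simply shows the logic of fitting time-lagged ridge models to unisensory and audiovisual conditions and comparing their held-out predictions, with random placeholder arrays standing in for the stimulus features and EEG.

```python
# Conceptual sketch of the A+V vs. AV model comparison with placeholder data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

def lag_matrix(x, n_lags):
    """Stack sample delays 0..n_lags-1 of feature matrix x (n_times, n_feats)."""
    t, f = x.shape
    out = np.zeros((t, f * n_lags))
    for k in range(n_lags):
        out[k:, k * f:(k + 1) * f] = x[:t - k]
    return out

rng = np.random.default_rng(1)
t, lags, split = 20000, 32, 15000                       # ~5 min at 64 Hz, lags 0-500 ms
env, lips = rng.standard_normal((t, 1)), rng.standard_normal((t, 1))
eeg_a, eeg_v, eeg_av = [rng.standard_normal((t, 1)) for _ in range(3)]  # placeholder EEG

Xa, Xv = lag_matrix(env, lags), lag_matrix(lips, lags)
trf_a = Ridge(alpha=1e3).fit(Xa[:split], eeg_a[:split])                 # audio-only TRF
trf_v = Ridge(alpha=1e3).fit(Xv[:split], eeg_v[:split])                 # visual-only TRF
trf_av = Ridge(alpha=1e3).fit(np.hstack([Xa, Xv])[:split], eeg_av[:split])  # AV TRF

pred_apv = trf_a.predict(Xa[split:]) + trf_v.predict(Xv[split:])        # additive account
pred_av = trf_av.predict(np.hstack([Xa, Xv])[split:])                   # integrative account
r_apv = pearsonr(eeg_av[split:, 0], pred_apv[:, 0])[0]
r_av = pearsonr(eeg_av[split:, 0], pred_av[:, 0])[0]
print(f"A+V r = {r_apv:.3f}, AV r = {r_av:.3f}")   # AV > A+V is taken as integration
```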


Subjects
Speech Perception, Humans, Speech Perception/physiology, Speech, Attention/physiology, Visual Perception/physiology, Brain/physiology, Acoustic Stimulation, Auditory Perception
5.
Hear Res ; 433: 108767, 2023 06.
Article in English | MEDLINE | ID: mdl-37060895

ABSTRACT

The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech. Examples of the acoustic features used in such modeling include the amplitude envelope and spectrogram of speech. These models implicitly assume a direct mapping from stimulus representation to cortical activity. However, in reality, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. Thus, it could be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC) when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Results showed that using model-IC responses in traditional systems analyses resulted in better predictions of EEG activity than using the envelope or spectrogram of a speech stimulus. Further, it was revealed that model-IC-derived TRFs predict different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provides a more accurate prediction of the EEG response.
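
The comparison logic (though not the auditory-periphery model itself) can be sketched as fitting the same time-lagged ridge TRF to different stimulus representations and comparing held-out prediction accuracy. The simulated IC output below is a random placeholder for precomputed model responses.

```python
# Sketch of the representation comparison only; the IC channels are placeholders
# standing in for precomputed auditory-model output, not a periphery model.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

def lag_matrix(x, n_lags):
    t, f = x.shape
    out = np.zeros((t, f * n_lags))
    for k in range(n_lags):
        out[k:, k * f:(k + 1) * f] = x[:t - k]
    return out

rng = np.random.default_rng(2)
t, lags, split = 20000, 32, 15000
envelope = rng.standard_normal((t, 1))        # placeholder acoustic envelope
ic_output = rng.standard_normal((t, 10))      # placeholder: 10 simulated IC channels
eeg = rng.standard_normal(t)                  # placeholder single EEG channel

def heldout_r(features):
    X = lag_matrix(features, lags)
    model = Ridge(alpha=1e3).fit(X[:split], eeg[:split])
    return pearsonr(eeg[split:], model.predict(X[split:]))[0]

for name, feats in [("envelope", envelope), ("model IC", ic_output),
                    ("combined", np.hstack([envelope, ic_output]))]:
    print(f"{name:9s} r = {heldout_r(feats):+.3f}")
```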


Subjects
Auditory Cortex, Inferior Colliculi, Humans, Speech/physiology, Auditory Pathways/physiology, Electroencephalography, Auditory Cortex/physiology, Inferior Colliculi/physiology, Acoustic Stimulation/methods, Auditory Perception/physiology
6.
Front Hum Neurosci ; 17: 1283206, 2023.
Article in English | MEDLINE | ID: mdl-38162285

ABSTRACT

Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers-an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model - one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.

7.
J Neurosci ; 42(41): 7782-7798, 2022 10 12.
Article in English | MEDLINE | ID: mdl-36041853

ABSTRACT

In recent years, research on natural speech processing has benefited from recognizing that low-frequency cortical activity tracks the amplitude envelope of natural speech. However, it remains unclear to what extent this tracking reflects speech-specific processing beyond the analysis of the stimulus acoustics. In the present study, we aimed to disentangle contributions to cortical envelope tracking that reflect general acoustic processing from those that are functionally related to processing speech. To do so, we recorded EEG from subjects as they listened to auditory chimeras, stimuli composed of the temporal fine structure of one speech stimulus modulated by the amplitude envelope (ENV) of another speech stimulus. By varying the number of frequency bands used in making the chimeras, we obtained some control over which speech stimulus was recognized by the listener. No matter which stimulus was recognized, envelope tracking was always strongest for the ENV stimulus, indicating a dominant contribution from acoustic processing. However, there was also a positive relationship between intelligibility and the tracking of the perceived speech, indicating a contribution from speech-specific processing. These findings were supported by a follow-up analysis that assessed envelope tracking as a function of the (estimated) output of the cochlea rather than the original stimuli used in creating the chimeras. Finally, we sought to isolate the speech-specific contribution to envelope tracking using forward encoding models and found that indices of phonetic feature processing tracked reliably with intelligibility. Together, these results show that cortical speech tracking is dominated by acoustic processing but also reflects speech-specific processing.

SIGNIFICANCE STATEMENT: Activity in auditory cortex is known to dynamically track the energy fluctuations, or amplitude envelope, of speech. Measures of this tracking are now widely used in research on hearing and language and have had a substantial influence on theories of how auditory cortex parses and processes speech. But how much of this speech tracking is actually driven by speech-specific processing rather than general acoustic processing is unclear, limiting its interpretability and its usefulness. Here, by merging two speech stimuli together to form so-called auditory chimeras, we show that EEG tracking of the speech envelope is dominated by acoustic processing but also reflects linguistic analysis. This has important implications for theories of cortical speech tracking and for using measures of that tracking in applied research.
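
For orientation, multi-band auditory chimeras of the kind described here are typically constructed by imposing, within each frequency band, the envelope of one signal on the temporal fine structure of the other and summing across bands. The sketch below shows that construction with placeholder signals and band edges; it is not the authors' exact code.

```python
# Rough sketch of multi-band chimera construction with placeholder inputs.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def chimera(env_source, tfs_source, fs, band_edges):
    """Return a signal with env_source's band envelopes and tfs_source's fine structure."""
    out = np.zeros_like(env_source)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        a, b = sosfiltfilt(sos, env_source), sosfiltfilt(sos, tfs_source)
        env = np.abs(hilbert(a))               # band envelope of signal A
        tfs = np.cos(np.angle(hilbert(b)))     # band fine structure of signal B
        out += env * tfs
    return out

fs = 16000
rng = np.random.default_rng(3)
speech_a, speech_b = rng.standard_normal(fs * 2), rng.standard_normal(fs * 2)  # placeholders
edges = np.geomspace(80, 7000, 9)              # 8 log-spaced bands (illustrative)
chim = chimera(speech_a, speech_b, fs, edges)  # ENV from A, TFS from B
```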


Subjects
Auditory Cortex, Speech Perception, Humans, Speech, Acoustic Stimulation/methods, Phonetics
8.
Eur J Neurosci ; 56(8): 5201-5214, 2022 10.
Article in English | MEDLINE | ID: mdl-35993240

ABSTRACT

Speech comprehension relies on the ability to understand words within a coherent context. Recent studies have attempted to obtain electrophysiological indices of this process by modelling how brain activity is affected by a word's semantic dissimilarity to preceding words. Although the resulting indices appear robust and are strongly modulated by attention, it remains possible that, rather than capturing the contextual understanding of words, they may actually reflect word-to-word changes in semantic content without the need for a narrative-level understanding on the part of the listener. To test this, we recorded electroencephalography from subjects who listened to speech presented in either its original, narrative form, or after scrambling the word order by varying amounts. This manipulation affected the ability of subjects to comprehend the speech narrative but not the ability to recognise individual words. Neural indices of semantic understanding and low-level acoustic processing were derived for each scrambling condition using the temporal response function. Signatures of semantic processing were observed when speech was unscrambled or minimally scrambled and subjects understood the speech. The same markers were absent for higher scrambling levels as speech comprehension dropped. In contrast, word recognition remained high and neural measures related to envelope tracking did not vary significantly across scrambling conditions. This supports the previous claim that electrophysiological indices based on the semantic dissimilarity of words to their context reflect a listener's understanding of those words relative to that context. It also highlights the relative insensitivity of neural measures of low-level speech processing to speech comprehension.
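
For reference, a semantic-dissimilarity regressor of the general kind used in such TRF analyses can be built as an impulse train at word onsets, with each impulse scaled by the word's dissimilarity to its preceding context. The sketch below is a simplified variant, not the authors' exact procedure; the word embeddings, onset times, and sampling rate are placeholders.

```python
# Simplified sketch of a semantic-dissimilarity regressor for TRF analyses.
import numpy as np

def semantic_dissimilarity_regressor(word_vectors, onsets_sec, fs, n_samples):
    reg = np.zeros(n_samples)
    for i, (vec, onset) in enumerate(zip(word_vectors, onsets_sec)):
        if i == 0:
            continue                                   # no context for the first word
        context = word_vectors[:i].mean(axis=0)        # average of preceding words
        cos = vec @ context / (np.linalg.norm(vec) * np.linalg.norm(context) + 1e-12)
        reg[int(round(onset * fs))] = 1.0 - cos        # impulse scaled by dissimilarity
    return reg

fs = 64
rng = np.random.default_rng(4)
word_vectors = rng.standard_normal((50, 300))          # placeholder 300-d embeddings
onsets = np.sort(rng.uniform(0, 30, size=50))          # placeholder word onsets (s)
regressor = semantic_dissimilarity_regressor(word_vectors, onsets, fs, fs * 31)
# `regressor` can then enter a time-lagged (TRF) model alongside acoustic features.
```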


Subjects
Semantics, Speech Perception, Auditory Perception/physiology, Comprehension/physiology, Electroencephalography, Humans, Speech/physiology, Speech Perception/physiology
9.
J Neurosci ; 42(4): 682-691, 2022 01 26.
Article in English | MEDLINE | ID: mdl-34893546

ABSTRACT

Humans have the remarkable ability to selectively focus on a single talker in the midst of other competing talkers. The neural mechanisms that underlie this phenomenon remain incompletely understood. In particular, there has been longstanding debate over whether attention operates at an early or late stage in the speech processing hierarchy. One way to better understand this is to examine how attention might differentially affect neurophysiological indices of hierarchical acoustic and linguistic speech representations. In this study, we do this by using encoding models to identify neural correlates of speech processing at various levels of representation. Specifically, we recorded EEG from fourteen human subjects (nine female and five male) during a "cocktail party" attention experiment. Model comparisons based on these data revealed phonetic feature processing for attended, but not unattended speech. Furthermore, we show that attention specifically enhances isolated indices of phonetic feature processing, but that such attention effects are not apparent for isolated measures of acoustic processing. These results provide new insights into the effects of attention on different prelexical representations of speech, insights that complement recent anatomic accounts of the hierarchical encoding of attended speech. Furthermore, our findings support the notion that, for attended speech, phonetic features are processed as a distinct stage, separate from the processing of the speech acoustics.

SIGNIFICANCE STATEMENT: Humans are very good at paying attention to one speaker in an environment with multiple speakers. However, the details of how attended and unattended speech are processed differently by the brain are not completely clear. Here, we explore how attention affects the processing of the acoustic sounds of speech as well as the mapping of those sounds onto categorical phonetic features. We find evidence of categorical phonetic feature processing for attended, but not unattended speech. Furthermore, we find evidence that categorical phonetic feature processing is enhanced by attention, but acoustic processing is not. These findings add an important new layer in our understanding of how the human brain solves the cocktail party problem.
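
One common way to obtain an "isolated" index of phonetic-feature processing, sketched below with placeholder data, is to first model the EEG from acoustic features alone and then ask how well lagged phonetic features predict the residual. This is a hedged illustration of the general approach, not necessarily the exact procedure used in the study.

```python
# Sketch: isolate a phonetic-feature contribution by modelling the residual EEG
# left unexplained by acoustic features. All data are random placeholders.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

def lag_matrix(x, n_lags):
    t, f = x.shape
    out = np.zeros((t, f * n_lags))
    for k in range(n_lags):
        out[k:, k * f:(k + 1) * f] = x[:t - k]
    return out

rng = np.random.default_rng(5)
t, lags, split = 20000, 32, 15000
spectrogram = rng.standard_normal((t, 16))              # placeholder 16-band spectrogram
phonetics = rng.integers(0, 2, (t, 19)).astype(float)   # placeholder phonetic features
eeg = rng.standard_normal(t)                            # placeholder single channel

Xs, Xp = lag_matrix(spectrogram, lags), lag_matrix(phonetics, lags)
acoustic = Ridge(alpha=1e3).fit(Xs[:split], eeg[:split])
residual = eeg - acoustic.predict(Xs)                   # EEG unexplained by acoustics
phonetic = Ridge(alpha=1e3).fit(Xp[:split], residual[:split])
r_iso = pearsonr(residual[split:], phonetic.predict(Xp[split:]))[0]
print(f"isolated phonetic-feature prediction r = {r_iso:+.3f}")
# Comparing this index between attended and unattended speech probes attention effects.
```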


Assuntos
Estimulação Acústica/métodos , Atenção/fisiologia , Fonética , Percepção da Fala/fisiologia , Fala/fisiologia , Adulto , Eletroencefalografia/métodos , Feminino , Humanos , Masculino , Estimulação Luminosa/métodos , Adulto Jovem
10.
Front Neurosci ; 15: 705621, 2021.
Article in English | MEDLINE | ID: mdl-34880719

ABSTRACT

Cognitive neuroscience, in particular research on speech and language, has seen an increase in the use of linear modeling techniques for studying the processing of natural, environmental stimuli. The availability of such computational tools has prompted similar investigations in many clinical domains, facilitating the study of cognitive and sensory deficits under more naturalistic conditions. However, studying clinical (and often highly heterogeneous) cohorts introduces an added layer of complexity to such modeling procedures, potentially leading to instability of such techniques and, as a result, inconsistent findings. Here, we outline some key methodological considerations for applied research, referring to a hypothetical clinical experiment involving speech processing and worked examples of simulated electrophysiological (EEG) data. In particular, we focus on experimental design, data preprocessing, stimulus feature extraction, model design, model training and evaluation, and interpretation of model weights. Throughout the paper, we demonstrate the implementation of each step in MATLAB using the mTRF-Toolbox and discuss how to address issues that could arise in applied research. In doing so, we hope to provide better intuition on these more technical points and provide a resource for applied and clinical researchers investigating sensory and cognitive processing using ecologically rich stimuli.
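
A compact Python analogue of the workflow that the paper demonstrates in MATLAB with the mTRF-Toolbox might look like the following: normalize the data, build a lagged design matrix, tune the ridge parameter by cross-validation, evaluate on held-out data, and keep the weights for inspection. The data, lag range, and alpha grid below are illustrative placeholders, not the paper's worked examples.

```python
# Sketch of a TRF pipeline: z-score, lag, tune regularization, evaluate, inspect weights.
import numpy as np
from scipy.stats import pearsonr, zscore
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def lag_matrix(x, n_lags):
    t, f = x.shape
    out = np.zeros((t, f * n_lags))
    for k in range(n_lags):
        out[k:, k * f:(k + 1) * f] = x[:t - k]
    return out

fs, lags = 64, 32                                  # 64 Hz EEG, lags 0-500 ms
rng = np.random.default_rng(6)
stim = zscore(rng.standard_normal((fs * 600, 1)))  # placeholder envelope, 10 min
eeg = zscore(rng.standard_normal(fs * 600))        # placeholder single EEG channel
X = lag_matrix(stim, lags)
split = fs * 480                                   # 8 min train / 2 min test

# Tune ridge regularization on the training portion with contiguous K-fold CV.
alphas, cv = np.logspace(0, 6, 7), KFold(n_splits=5)
cv_r = []
for a in alphas:
    rs = []
    for tr, va in cv.split(X[:split]):
        m = Ridge(alpha=a).fit(X[tr], eeg[tr])
        rs.append(pearsonr(eeg[va], m.predict(X[va]))[0])
    cv_r.append(np.mean(rs))
best = alphas[int(np.argmax(cv_r))]

model = Ridge(alpha=best).fit(X[:split], eeg[:split])
r_test = pearsonr(eeg[split:], model.predict(X[split:]))[0]
trf_weights = model.coef_.reshape(lags, -1)        # lags x features, for inspection
print(f"best alpha = {best:g}, held-out r = {r_test:+.3f}")
```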

11.
PLoS Comput Biol ; 17(9): e1009358, 2021 09.
Article in English | MEDLINE | ID: mdl-34534211

ABSTRACT

The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the encoding of higher-order features and one's cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected to see music reconstruction match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music for all frequencies we examined. Additionally, models trained on all stimulus types performed as well or better than the stimulus-specific models at higher modulation frequencies, suggesting a common neural mechanism for tracking speech and music. However, speech envelope tracking at low frequencies, below 1 Hz, was associated with increased weighting over parietal channels, which was not present for the other stimuli. Our results highlight the importance of low-frequency speech tracking and suggest an origin from speech-specific processing in the brain.
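
A frequency-constrained stimulus-reconstruction (backward) model in the spirit of this approach can be sketched as band-pass filtering the envelope to a single modulation band and reconstructing it from time-lagged multichannel EEG with ridge regression. This is an illustrative stand-in, not the authors' method; the data and band edges are placeholders.

```python
# Sketch of a backward (reconstruction) model constrained to one modulation band.
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

def lag_matrix(x, n_lags):
    t, f = x.shape
    out = np.zeros((t, f * n_lags))
    for k in range(n_lags):
        out[k:, k * f:(k + 1) * f] = x[:t - k]
    return out

fs, lags, split = 64, 16, 15000                   # decoder uses EEG from 0-250 ms after t
rng = np.random.default_rng(7)
eeg = rng.standard_normal((20000, 32))            # placeholder 32-channel EEG
envelope = rng.standard_normal(20000)             # placeholder broadband speech envelope

sos = butter(2, [0.5, 1.0], btype="band", fs=fs, output="sos")
env_band = sosfiltfilt(sos, envelope)             # constrain the target to 0.5-1 Hz

X = lag_matrix(eeg, lags)[lags - 1:]              # row t holds EEG from t .. t+250 ms
y = env_band[:X.shape[0]]
decoder = Ridge(alpha=1e4).fit(X[:split], y[:split])
r = pearsonr(y[split:], decoder.predict(X[split:]))[0]
print(f"reconstruction accuracy, 0.5-1 Hz band: r = {r:+.3f}")
```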


Subjects
Auditory Perception/physiology, Brain/physiology, Music, Speech Perception/physiology, Speech/physiology, Acoustic Stimulation/methods, Adolescent, Adult, Computational Biology, Computer Simulation, Electroencephalography/statistics & numerical data, Female, Humans, Linear Models, Male, Neurological Models, Principal Component Analysis, Speech Acoustics, Young Adult
12.
J Neurosci ; 41(23): 4991-5003, 2021 06 09.
Article in English | MEDLINE | ID: mdl-33824190

ABSTRACT

Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid the recognition of specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight into these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.

SIGNIFICANCE STATEMENT: During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
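
The CCA step can be illustrated, in a simplified form, as finding maximally correlated linear combinations of lagged stimulus features and EEG channels and then reading off canonical correlations on held-out data. The sketch below uses scikit-learn's CCA with random placeholder data and is not the authors' pipeline.

```python
# Simplified CCA-based encoding sketch with placeholder data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.cross_decomposition import CCA

def lag_matrix(x, n_lags):
    t, f = x.shape
    out = np.zeros((t, f * n_lags))
    for k in range(n_lags):
        out[k:, k * f:(k + 1) * f] = x[:t - k]
    return out

rng = np.random.default_rng(8)
t, lags, split = 12000, 16, 9000
features = rng.standard_normal((t, 19))          # placeholder phonetic-feature matrix
eeg = rng.standard_normal((t, 32))               # placeholder 32-channel EEG

cca = CCA(n_components=3, max_iter=1000)
cca.fit(lag_matrix(features, lags)[:split], eeg[:split])
Xc, Yc = cca.transform(lag_matrix(features, lags)[split:], eeg[split:])
for i in range(3):
    print(f"held-out canonical correlation {i + 1}: "
          f"{pearsonr(Xc[:, i], Yc[:, i])[0]:+.3f}")
```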


Assuntos
Encéfalo/fisiologia , Compreensão/fisiologia , Sinais (Psicologia) , Percepção da Fala/fisiologia , Percepção Visual/fisiologia , Estimulação Acústica , Mapeamento Encefálico , Eletroencefalografia , Feminino , Humanos , Masculino , Fonética , Estimulação Luminosa
13.
J Neurosci ; 41(18): 4100-4119, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33753548

ABSTRACT

Understanding how and where in the brain sentence-level meaning is constructed from words presents a major scientific challenge. Recent advances have begun to explain brain activation elicited by sentences using vector models of word meaning derived from patterns of word co-occurrence in text corpora. These studies have helped map out semantic representation across a distributed brain network spanning temporal, parietal, and frontal cortex. However, it remains unclear whether activation patterns within regions reflect unified representations of sentence-level meaning, as opposed to superpositions of context-independent component words. This is because models have typically represented sentences as "bags-of-words" that neglect sentence-level structure. To address this issue, we interrogated fMRI activation elicited as 240 sentences were read by 14 participants (9 female, 5 male), using sentences encoded by a recurrent deep artificial neural network trained on a sentence inference task (InferSent). Recurrent connections and nonlinear filters enable InferSent to transform sequences of word vectors into unified "propositional" sentence representations suitable for evaluating intersentence entailment relations. Using voxelwise encoding modeling, we demonstrate that InferSent predicts elements of fMRI activation that cannot be predicted by bag-of-words models and sentence models using grammatical rules to assemble word vectors. This effect occurs throughout a distributed network, which suggests that propositional sentence-level meaning is represented within and across multiple cortical regions rather than at any single site. In follow-up analyses, we place results in the context of other deep network approaches (ELMo and BERT) and estimate the degree of unpredicted neural signal using an "experiential" semantic model and cross-participant encoding.

SIGNIFICANCE STATEMENT: A modern-day scientific challenge is to understand how the human brain transforms word sequences into representations of sentence meaning. A recent approach, emerging from advances in functional neuroimaging, big data, and machine learning, is to computationally model meaning, and use models to predict brain activity. Such models have helped map a cortical semantic information-processing network. However, how unified sentence-level information, as opposed to word-level units, is represented throughout this network remains unclear. This is because models have typically represented sentences as unordered "bags-of-words." Using a deep artificial neural network that recurrently and nonlinearly combines word representations into unified propositional sentence representations, we provide evidence that sentence-level information is encoded throughout a cortical network, rather than in a single region.


Subjects
Cerebral Cortex/diagnostic imaging, Cerebral Cortex/physiology, Comprehension/physiology, Language, Neural Networks (Computer), Semantics, Adult, Computer Simulation, Female, Humans, Magnetic Resonance Imaging, Male, Middle Aged, Reading, Young Adult
14.
Sci Rep ; 11(1): 4963, 2021 03 02.
Article in English | MEDLINE | ID: mdl-33654202

ABSTRACT

Healthy ageing leads to changes in the brain that impact upon sensory and cognitive processing. It is not fully clear how these changes affect the processing of everyday spoken language. Prediction is thought to play an important role in language comprehension, where information about upcoming words is pre-activated across multiple representational levels. However, evidence from electrophysiology suggests differences in how older and younger adults use context-based predictions, particularly at the level of semantic representation. We investigate these differences during natural speech comprehension by presenting older and younger subjects with continuous, narrative speech while recording their electroencephalogram. We use time-lagged linear regression to test how distinct computational measures of (1) semantic dissimilarity and (2) lexical surprisal are processed in the brains of both groups. Our results reveal dissociable neural correlates of these two measures that suggest differences in how younger and older adults successfully comprehend speech. Specifically, our results suggest that, while younger and older subjects both employ context-based lexical predictions, older subjects are significantly less likely to pre-activate the semantic features relating to upcoming words. Furthermore, across our group of older adults, we show that the weaker the neural signature of this semantic pre-activation mechanism, the lower a subject's semantic verbal fluency score. We interpret these findings as evidence that prediction plays a generally reduced role at the semantic level in the brains of older listeners during speech comprehension, and that these changes may be part of an overall strategy to successfully comprehend speech with reduced cognitive resources.


Subjects
Aging/physiology, Brain/physiology, Comprehension/physiology, Electroencephalography, Speech Perception/physiology, Adult, Aged, Female, Humans, Male, Middle Aged
15.
Front Hum Neurosci ; 14: 130, 2020.
Article in English | MEDLINE | ID: mdl-32410969

ABSTRACT

The human auditory system is highly skilled at extracting and processing information from speech in both single-speaker and multi-speaker situations. A commonly studied speech feature is the amplitude envelope, which can also be used to determine which speaker a listener is attending to in those multi-speaker situations. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) has shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such processing is strongly reflected in the power of high frequency neural activity (around 70-150 Hz; known as high gamma). The first aim of this study was to determine whether high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio. Specifically, linear regression was used to investigate speech envelope and attention decoding in low frequency EEG, high gamma power EEG, and in both EEG signals combined. The second aim was to assess whether the information reflected in high gamma power EEG may be complementary to that reflected in well-established low frequency EEG indices of speech processing. Exploratory analyses were also completed to examine how low frequency and high gamma power EEG may be sensitive to different features of the speech envelope. While low frequency speech tracking was evident for almost all subjects as expected, high gamma power also showed robust speech tracking in some subjects. This same pattern was true for attention decoding using a separate group of subjects who participated in a cocktail party attention experiment. For the subjects who showed speech tracking in high gamma power EEG, the spatiotemporal characteristics of that high gamma tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. Our results indicated that high gamma power EEG can carry useful information regarding speech processing and attentional selection in some subjects. Combining high gamma power and low frequency EEG can improve the mapping between natural speech and the resulting neural responses.
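
The signal-extraction step can be sketched as follows: derive a low-frequency EEG signal and a high-gamma (70-150 Hz) power signal from the same recording, after which either, or both concatenated, can serve as inputs to envelope-tracking or attention-decoding models. The raw sampling rate and all data below are assumed placeholders, not the study's recordings.

```python
# Sketch: extract low-frequency EEG and high-gamma power from the same recording.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, decimate

fs_raw = 512                                           # assumed raw sampling rate (Hz)
rng = np.random.default_rng(9)
eeg_raw = rng.standard_normal((fs_raw * 60, 32))       # placeholder 1-min, 32-channel EEG

def bandpass(x, lo, hi, fs, order=4):
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=0)

low_freq = bandpass(eeg_raw, 1.0, 16.0, fs_raw)        # classic low-frequency signal
hg_power = np.abs(hilbert(bandpass(eeg_raw, 70.0, 150.0, fs_raw), axis=0))  # high-gamma envelope

# Downsample both to a common analysis rate (e.g. 64 Hz) before TRF/decoding models.
low_64 = decimate(low_freq, fs_raw // 64, axis=0)
hg_64 = decimate(hg_power, fs_raw // 64, axis=0)
combined = np.concatenate([low_64, hg_64], axis=1)     # 64 columns of decoder input
```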

16.
Neuroimage ; 210: 116558, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31962174

ABSTRACT

Humans can easily distinguish many sounds in the environment, but speech and music are uniquely important. Previous studies, mostly using fMRI, have identified separate regions of the brain that respond selectively for speech and music. Yet there is little evidence that brain responses are larger and more temporally precise for human-specific sounds like speech and music compared to other types of sounds, as has been found for responses to species-specific sounds in other animals. We recorded EEG as healthy, adult subjects listened to various types of two-second-long natural sounds. By classifying each sound based on the EEG response, we found that speech, music, and impact sounds were classified better than other natural sounds. But unlike impact sounds, the classification accuracy for speech and music dropped for synthesized sounds that have identical frequency and modulation statistics based on a subcortical model, indicating a selectivity for higher-order features in these sounds. Lastly, the patterns in average power and phase consistency of the two-second EEG responses to each sound replicated the patterns of speech and music selectivity observed with classification accuracy. Together with the classification results, this suggests that the brain produces temporally individualized responses to speech and music sounds that are stronger than the responses to other natural sounds. In addition to highlighting the importance of speech and music for the human brain, the techniques used here could be a cost-effective, temporally precise, and efficient way to study the human brain's selectivity for speech and music in other populations.
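
One simple way to classify which sound was heard from its EEG response, shown below as a hedged illustration rather than the study's actual classifier, is a correlation-based template matcher with leave-one-trial-out evaluation. All responses are random placeholders.

```python
# Leave-one-trial-out template classification of sound identity from EEG responses.
import numpy as np

rng = np.random.default_rng(10)
n_sounds, n_trials, n_feat = 40, 6, 128 * 32      # placeholder: 2 s x 64 Hz x 32 channels
responses = rng.standard_normal((n_sounds, n_trials, n_feat))  # placeholder EEG responses

def classify_loo(responses):
    """Leave-one-trial-out template classification accuracy."""
    n_sounds, n_trials, _ = responses.shape
    correct = 0
    for trial in range(n_trials):
        templates = np.delete(responses, trial, axis=1).mean(axis=1)   # per-sound mean
        for s in range(n_sounds):
            test = responses[s, trial]
            r = [np.corrcoef(test, tmpl)[0, 1] for tmpl in templates]
            correct += int(np.argmax(r) == s)
    return correct / (n_sounds * n_trials)

print(f"classification accuracy: {classify_loo(responses):.3f} "
      f"(chance = {1 / n_sounds:.3f})")
```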


Subjects
Auditory Perception/physiology, Cerebral Cortex/physiology, Electroencephalography/methods, Functional Neuroimaging/methods, Music, Adult, Female, Humans, Male, Speech Perception/physiology, Young Adult
17.
Neuroimage ; 205: 116283, 2020 01 15.
Article in English | MEDLINE | ID: mdl-31629828

ABSTRACT

Recently, we showed that in a simple acoustic scene with one sound source, auditory cortex tracks the time-varying location of a continuously moving sound. Specifically, we found that both the delta phase and alpha power of the electroencephalogram (EEG) can be used to reconstruct the sound source azimuth. However, in natural settings, we are often presented with a mixture of multiple competing sounds and so we must focus our attention on the relevant source in order to segregate it from the competing sources, the so-called 'cocktail party effect'. While many studies have examined this phenomenon in the context of sound envelope tracking by the cortex, it is unclear how we process and utilize spatial information in complex acoustic scenes with multiple sound sources. To test this, we created an experiment where subjects listened to two concurrent sound stimuli that were moving within the horizontal plane over headphones while we recorded their EEG. Participants were tasked with paying attention to one of the two presented stimuli. The data were analyzed by deriving linear mappings, temporal response functions (TRFs), between the EEG data and the attended as well as unattended sound source trajectories. Next, we used these TRFs to reconstruct both trajectories from previously unseen EEG data. In a first experiment, we used noise stimuli and a task that involved spatially localizing embedded targets. Then, in a second experiment, we employed speech stimuli and a non-spatial speech comprehension task. Results showed that the trajectory of an attended sound source can be reliably reconstructed from both the delta phase and alpha power of the EEG, even in the presence of distracting stimuli. Moreover, the reconstruction was robust to task and stimulus type. The cortical representation of the unattended source position was below detection level for the noise stimuli, but we observed weak tracking of the unattended source location for the speech stimuli by the delta phase of the EEG. In addition, we demonstrated that the trajectory reconstruction method can in principle be used to decode selective attention on a single-trial basis; however, its performance was inferior to that of envelope-based decoders. These results suggest a possible dissociation of delta phase and alpha power of the EEG in the context of sound trajectory tracking. Moreover, the demonstrated ability to localize and determine the attended speaker in complex acoustic environments is particularly relevant for cognitively controlled hearing devices.


Subjects
Alpha Rhythm/physiology, Attention/physiology, Auditory Perception/physiology, Cerebral Cortex/physiology, Delta Rhythm/physiology, Electroencephalography, Space Perception/physiology, Adult, Female, Humans, Male, Sound Localization/physiology, Speech Perception/physiology, Young Adult
18.
J Neurosci ; 39(45): 8969-8987, 2019 11 06.
Article in English | MEDLINE | ID: mdl-31570538

ABSTRACT

The brain is thought to combine linguistic knowledge of words and nonlinguistic knowledge of their referents to encode sentence meaning. However, functional neuroimaging studies aiming at decoding language meaning from neural activity have mostly relied on distributional models of word semantics, which are based on patterns of word co-occurrence in text corpora. Here, we present initial evidence that modeling nonlinguistic "experiential" knowledge contributes to decoding neural representations of sentence meaning. We model attributes of people's sensory, motor, social, emotional, and cognitive experiences with words using behavioral ratings. We demonstrate that fMRI activation elicited in sentence reading is more accurately decoded when this experiential attribute model is integrated with a text-based model than when either model is applied in isolation (participants were 5 males and 9 females). Our decoding approach exploits a representation-similarity-based framework, which benefits from being parameter-free, while performing at accuracy levels comparable with those from parameter-fitting approaches, such as ridge regression. We find that the text-based model contributes particularly to the decoding of sentences containing linguistically oriented "abstract" words and reveal tentative evidence that the experiential model improves decoding of more concrete sentences. Finally, we introduce a cross-participant decoding method to estimate an upper bound on model-based decoding accuracy. We demonstrate that a substantial fraction of neural signal remains unexplained, and leverage this gap to pinpoint characteristics of weakly decoded sentences and hence identify model weaknesses to guide future model development.

SIGNIFICANCE STATEMENT: Language gives humans the unique ability to communicate about historical events, theoretical concepts, and fiction. Although words are learned through language and defined by their relations to other words in dictionaries, our understanding of word meaning presumably draws heavily on our nonlinguistic sensory, motor, interoceptive, and emotional experiences with words and their referents. Behavioral experiments lend support to the intuition that word meaning integrates aspects of linguistic and nonlinguistic "experiential" knowledge. However, behavioral measures do not provide a window on how meaning is represented in the brain and tend to necessitate artificial experimental paradigms. We present a model-based approach that reveals early evidence that experiential and linguistically acquired knowledge can be detected in brain activity elicited in reading natural sentences.
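
A representation-similarity-based decoding scheme of the general kind described here can be sketched as a "2 vs. 2" test: each item's model vector and brain pattern are summarized by their correlations with a common reference set, and a held-out pair is decoded correctly when matched assignments beat mismatched ones. This is a simplified stand-in with placeholder data, not the paper's exact framework.

```python
# Simplified similarity-based "2 vs. 2" decoding sketch with placeholder data.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(11)
n_items, ref_n = 60, 40
model_vecs = rng.standard_normal((n_items, 300))      # placeholder semantic model vectors
brain_pats = rng.standard_normal((n_items, 5000))     # placeholder fMRI activation patterns

ref = np.arange(ref_n)                                # reference (training) items
test = np.arange(ref_n, n_items)                      # held-out items

def profile(x, X_ref):
    """Correlate one item against every reference item."""
    return np.array([np.corrcoef(x, r)[0, 1] for r in X_ref])

m_prof = {i: profile(model_vecs[i], model_vecs[ref]) for i in test}
b_prof = {i: profile(brain_pats[i], brain_pats[ref]) for i in test}
sim = lambda i, j: np.corrcoef(m_prof[i], b_prof[j])[0, 1]

hits = [sim(i, i) + sim(j, j) > sim(i, j) + sim(j, i)
        for i, j in combinations(test, 2)]
print(f"2-vs-2 decoding accuracy: {np.mean(hits):.3f} (chance = 0.5)")
```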


Subjects
Comprehension, Neurological Models, Reading, Adult, Brain/physiology, Female, Humans, Knowledge, Learning, Male, Semantics
19.
J Neurosci ; 39(38): 7564-7575, 2019 09 18.
Article in English | MEDLINE | ID: mdl-31371424

ABSTRACT

Speech perception involves the integration of sensory input with expectations based on the context of that speech. Much debate surrounds the issue of whether prior knowledge feeds back to affect early auditory encoding in the lower levels of the speech processing hierarchy, or whether perception is best explained as a purely feedforward process. Although there has been compelling evidence on both sides of this debate, experiments involving naturalistic speech stimuli to address these questions have been lacking. Here, we use a recently introduced method for quantifying the semantic context of speech and relate it to a commonly used method for indexing low-level auditory encoding of speech. The relationship between these measures is taken to be an indication of how semantic context leading up to a word influences how its low-level acoustic and phonetic features are processed. We record EEG from human participants (both male and female) listening to continuous natural speech and find that the early cortical tracking of a word's speech envelope is enhanced by its semantic similarity to its sentential context. Using a forward modeling approach, we find that prediction accuracy of the EEG signal also shows the same effect. Furthermore, this effect shows distinct temporal patterns of correlation depending on the type of speech input representation (acoustic or phonological) used for the model, implicating a top-down propagation of information through the processing hierarchy. These results suggest a mechanism that links top-down prior information with the early cortical entrainment of words in natural, continuous speech.

SIGNIFICANCE STATEMENT: During natural speech comprehension, we use semantic context when processing information about new incoming words. However, precisely how the neural processing of bottom-up sensory information is affected by top-down context-based predictions remains controversial. We address this discussion using a novel approach that indexes a word's similarity to context and how well a word's acoustic and phonetic features are processed by the brain at the time of its utterance. We relate these two measures and show that lower-level auditory tracking of speech improves for words that are more related to their preceding context. These results suggest a mechanism that links top-down prior information with bottom-up sensory processing in the context of natural, narrative speech listening.


Subjects
Brain/physiology, Comprehension/physiology, Neurological Models, Semantics, Speech Perception/physiology, Adult, Electroencephalography, Female, Humans, Male, Young Adult
20.
Eur J Neurosci ; 50(11): 3831-3842, 2019 12.
Article in English | MEDLINE | ID: mdl-31287601

ABSTRACT

Speech is central to communication among humans. Meaning is largely conveyed by the selection of linguistic units such as words, phrases and sentences. However, prosody, that is the variation of acoustic cues that tie linguistic segments together, adds another layer of meaning. There are various features underlying prosody, one of the most important being pitch and how it is modulated. Recent fMRI and ECoG studies have suggested that there are cortical regions for pitch which respond primarily to resolved harmonics and that high-gamma cortical activity encodes intonation as represented by relative pitch. Importantly, this latter result was shown to be independent of the cortical tracking of the acoustic energy of speech, a commonly used measure. Here, we investigate whether we can isolate low-frequency EEG indices of pitch processing of continuous narrative speech from those reflecting the tracking of other acoustic and phonetic features. Harmonic resolvability was found to contain unique predictive power in delta and theta phase, but it was highly correlated with the envelope and tracked even when stimuli were pitch-impoverished. As such, we are circumspect about whether its contribution is truly pitch-specific. Crucially however, we found a unique contribution of relative pitch to EEG delta-phase prediction, and this tracking was absent when subjects listened to pitch-impoverished stimuli. This finding suggests the possibility of a separate processing stream for prosody that might operate in parallel to acoustic-linguistic processing. Furthermore, it provides a novel neural index that could be useful for testing prosodic encoding in populations with speech processing deficits and for improving cognitively controlled hearing aids.


Subjects
Auditory Cortex/physiology, Delta Rhythm/physiology, Phonetics, Pitch Perception/physiology, Speech Perception/physiology, Acoustic Stimulation/methods, Electroencephalography/methods, Female, Humans, Magnetoencephalography/methods, Male