Results 1 - 8 of 8
1.
J Neural Eng ; 21(1)2024 01 11.
Article in English | MEDLINE | ID: mdl-38205849

ABSTRACT

Objective. To investigate how the auditory system processes natural speech, models have been created to relate the electroencephalography (EEG) signal of a person listening to speech to various representations of that speech. Mainly the speech envelope has been used, but phonetic representations have been used as well. We investigated the degree of granularity at which phonetic representations can be related to the EEG signal. Approach. We used recorded EEG signals from 105 subjects while they listened to fairy-tale stories. We utilized speech representations, including the onset of any phone, vowel-consonant onsets, broad phonetic class (BPC) onsets, and narrow phonetic class onsets, and related them to EEG using forward modeling and match-mismatch tasks. In forward modeling, we used a linear model to predict EEG from the speech representations. In the match-mismatch task, we trained a long short-term memory (LSTM)-based model to determine which of two candidate speech segments matches a given EEG segment. Main results. Our results show that vowel-consonant onsets outperform onsets of any phone in both tasks, which suggests that neural tracking of the vowel vs. consonant distinction exists in the EEG to some degree. We also observed that vowel (syllable nucleus) onsets exhibit a more consistent representation in EEG compared to syllable onsets. Significance. Our findings suggest that neural tracking previously thought to be associated with BPCs might actually originate from vowel-consonant onsets rather than from the differentiation between different phonetic classes.


Subject(s)
Electroencephalography; Speech; Humans; Linear Models
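
The forward-modelling approach described in the abstract above can be illustrated with a minimal sketch: a ridge-regularised linear model predicts each EEG channel from time-lagged stimulus onset features, and prediction quality is measured as the correlation between predicted and recorded EEG. The feature layout, lag range, regularisation strength, and toy data below are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def lagged_design(features, n_lags):
    """Stack time-lagged copies of the stimulus features (T x F) into (T x F*n_lags)."""
    T, F = features.shape
    X = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        # Column block 'lag' holds the stimulus delayed by 'lag' samples (causal forward model).
        X[lag:, lag * F:(lag + 1) * F] = features[:T - lag]
    return X

def fit_forward_model(features, eeg, n_lags=40, alpha=1.0):
    """Ridge regression from lagged stimulus features to EEG (T x channels)."""
    X = lagged_design(features, n_lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ eeg)

def evaluate(features, eeg, W, n_lags=40):
    """Per-channel Pearson correlation between predicted and measured EEG."""
    pred = lagged_design(features, n_lags) @ W
    return np.array([np.corrcoef(pred[:, c], eeg[:, c])[0, 1] for c in range(eeg.shape[1])])

# Toy example: 60 s at 64 Hz, two onset features (e.g. vowel and consonant onsets), 8 EEG channels.
rng = np.random.default_rng(0)
onsets = (rng.random((3840, 2)) > 0.97).astype(float)
eeg = rng.standard_normal((3840, 8))
W = fit_forward_model(onsets, eeg, n_lags=40, alpha=10.0)
print(evaluate(onsets, eeg, W, n_lags=40))
```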
2.
J Neural Eng ; 20(4)2023 08 30.
Article in English | MEDLINE | ID: mdl-37595606

ABSTRACT

Objective. When listening to continuous speech, populations of neurons in the brain track different features of the signal. Neural tracking can be measured by relating the electroencephalography (EEG) signal to the speech signal. Recent studies using linear models have shown a significant contribution of linguistic features over and above acoustic neural tracking. However, linear models cannot capture the nonlinear dynamics of the brain. To overcome this, we use a convolutional neural network (CNN) that relates EEG to linguistic features, uses phoneme or word onsets as a control, and has the capacity to model nonlinear relations. Approach. We integrate phoneme- and word-based linguistic features (phoneme surprisal, cohort entropy (CE), word surprisal (WS) and word frequency (WF)) in our nonlinear CNN model and investigate whether they carry additional information on top of lexical features (phoneme and word onsets). We then compare the performance of our nonlinear CNN with that of a linear encoder and a linearized CNN. Main results. For the nonlinear CNN, we found a significant contribution of CE over phoneme onsets and of WS and WF over word onsets. Moreover, the nonlinear CNN outperformed the linear baselines. Significance. Measuring the encoding of linguistic features in the brain is important for auditory neuroscience research and for applications that objectively measure speech understanding. With linear models this is measurable, but the effects are very small. The proposed nonlinear CNN model yields larger differences between linguistic and lexical models and could therefore reveal effects that would otherwise be unmeasurable, which may in the future lead to improved within-subject measures and shorter recordings.


Subject(s)
Neurons; Speech; Humans; Cochlear Nerve; Linguistics; Neural Networks, Computer
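
As a rough illustration of the kind of nonlinear CNN described in the abstract above, the sketch below maps a stimulus feature time series (onsets plus linguistic values such as surprisal or entropy) to multichannel EEG and trains it with a cosine-similarity objective. Layer sizes, kernel widths, and the loss are placeholder assumptions, not the published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StimulusToEEG(nn.Module):
    """Toy nonlinear CNN: stimulus features (batch x F x T) -> predicted EEG (batch x C x T)."""
    def __init__(self, n_features=4, n_channels=64, hidden=32, kernel=9):
        super().__init__()
        pad = kernel // 2
        self.net = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel, padding=pad), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel, padding=pad), nn.ReLU(),
            nn.Conv1d(hidden, n_channels, kernel, padding=pad),
        )

    def forward(self, x):
        return self.net(x)

# One toy training step; the features could be phoneme onsets plus surprisal/entropy values at those onsets.
model = StimulusToEEG(n_features=4, n_channels=64)
features = torch.randn(8, 4, 640)   # 8 segments, 4 features, 640 samples (10 s at 64 Hz)
eeg = torch.randn(8, 64, 640)       # matching EEG segments
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = -F.cosine_similarity(model(features), eeg, dim=-1).mean()
loss.backward()
optimizer.step()
print("training loss:", float(loss))
```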
3.
J Neural Eng ; 20(4)2023 08 03.
Article in English | MEDLINE | ID: mdl-37442115

ABSTRACT

Objective. When a person listens to continuous speech, a corresponding response is elicited in the brain and can be recorded using electroencephalography (EEG). Linear models are presently used to relate the EEG recording to the corresponding speech signal. The ability of linear models to find a mapping between these two signals is used as a measure of neural tracking of speech. Such models are limited, as they assume linearity in the EEG-speech relationship and thus ignore the nonlinear dynamics of the brain. As an alternative, deep learning models have recently been used to relate EEG to continuous speech. Approach. This paper reviews and comments on deep-learning-based studies that relate EEG to continuous speech in single- or multiple-speaker paradigms. We point out recurrent methodological pitfalls and the need for a standard benchmark of model analysis. Main results. We gathered 29 studies. The main methodological issues we found are biased cross-validation, data leakage leading to over-fitted models, and data sizes that are disproportionately small relative to model complexity. In addition, we address requirements for a standard benchmark of model analysis, such as public datasets, common evaluation metrics, and good practices for the match-mismatch task. Significance. We present a review summarizing the main deep-learning-based studies that relate EEG to speech, while addressing methodological pitfalls and important considerations for this newly expanding field. Our study is particularly relevant given the growing application of deep learning in EEG-speech decoding.


Subject(s)
Electroencephalography; Speech; Humans; Speech/physiology; Electroencephalography/methods; Neural Networks, Computer; Brain/physiology; Auditory Perception/physiology
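
One pitfall highlighted in the review above, data leakage caused by splitting segments of the same recording across training and test sets, can be avoided by grouping segments per recording before cross-validating. A minimal sketch using scikit-learn's GroupKFold follows; the segment counts, features, and labels are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical setup: 6 recordings (subject/story pairs), each cut into 50 segments.
rng = np.random.default_rng(0)
n_recordings, segments_per_recording, n_features = 6, 50, 16
X = rng.standard_normal((n_recordings * segments_per_recording, n_features))
y = rng.integers(0, 2, size=len(X))                       # e.g. match vs. mismatch labels
groups = np.repeat(np.arange(n_recordings), segments_per_recording)

# GroupKFold keeps all segments of a recording in the same fold, so no segment of a
# test recording is ever seen during training (the leakage scenario described above).
for fold, (train_idx, test_idx) in enumerate(GroupKFold(n_splits=3).split(X, y, groups)):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
    print(f"fold {fold}: test recordings {np.unique(groups[test_idx])}")
```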
4.
J Neural Eng ; 18(6)2021 11 15.
Article in English | MEDLINE | ID: mdl-34706347

ABSTRACT

Objective. Currently, only behavioral speech understanding tests are available, which require active participation of the person being tested. As this is infeasible for certain populations, an objective measure of speech intelligibility is required. Recently, brain imaging data have been used to establish a relationship between stimulus and brain response. Linear models have been successfully linked to speech intelligibility but require per-subject training. We present a deep-learning-based model incorporating dilated convolutions that operates in a match/mismatch paradigm. The accuracy of the model's match/mismatch predictions can be used as a proxy for speech intelligibility without subject-specific (re)training. Approach. We evaluated the performance of the model as a function of input segment length, electroencephalography (EEG) frequency band, and receptive field size, comparing it to multiple baseline models. Next, we evaluated performance on held-out data and the effect of fine-tuning. Finally, we established a link between the accuracy of our model and the state-of-the-art behavioral MATRIX test. Main results. The dilated convolutional model significantly outperformed the baseline models for every input segment length, for all EEG frequency bands except the delta and theta bands, and for receptive field sizes between 250 and 500 ms. Additionally, fine-tuning significantly increased accuracy on a held-out dataset. Finally, a significant correlation (r = 0.59, p = 0.0154) was found between the speech reception threshold (SRT) estimated using the behavioral MATRIX test and our objective method. Significance. Our method is the first to predict the SRT from EEG for unseen subjects, contributing to objective measures of speech intelligibility.


Subject(s)
Speech Intelligibility; Speech Perception; Acoustic Stimulation; Brain; Electroencephalography/methods; Hearing/physiology; Humans; Speech Intelligibility/physiology; Speech Perception/physiology
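
A minimal sketch of a dilated-convolution match/mismatch model in the spirit of the abstract above: EEG and two candidate speech-envelope segments are embedded with stacked dilated convolutions, and the candidate whose embedding is most similar to the EEG embedding is taken as the match. Layer counts, channel sizes, and dilation factors are arbitrary placeholders, not the published model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dilated_stack(in_ch, hidden, n_layers=3, kernel=3):
    """Stack of 1-D convolutions with exponentially growing dilation."""
    layers, ch = [], in_ch
    for i in range(n_layers):
        layers += [nn.Conv1d(ch, hidden, kernel, dilation=2 ** i), nn.ReLU()]
        ch = hidden
    return nn.Sequential(*layers)

class MatchMismatch(nn.Module):
    def __init__(self, eeg_channels=64, hidden=16):
        super().__init__()
        self.eeg_net = dilated_stack(eeg_channels, hidden)
        self.env_net = dilated_stack(1, hidden)

    def score(self, eeg, env):
        e = self.eeg_net(eeg).mean(dim=-1)   # (batch, hidden) embedding per EEG segment
        s = self.env_net(env).mean(dim=-1)   # (batch, hidden) embedding per envelope segment
        return F.cosine_similarity(e, s, dim=-1)

    def forward(self, eeg, env_match, env_mismatch):
        # Positive output means the first candidate is predicted to be the matching segment.
        return self.score(eeg, env_match) - self.score(eeg, env_mismatch)

model = MatchMismatch()
eeg = torch.randn(4, 64, 320)                               # 4 segments, 64 channels, 5 s at 64 Hz
env_a, env_b = torch.randn(4, 1, 320), torch.randn(4, 1, 320)
print(model(eeg, env_a, env_b) > 0)
```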
5.
J Neural Eng ; 17(4): 046039, 2020 08 19.
Article in English | MEDLINE | ID: mdl-32679578

ABSTRACT

OBJECTIVE: A hearing aid's noise reduction algorithm cannot infer which speaker the user intends to listen to. Auditory attention decoding (AAD) algorithms make it possible to infer this information from neural signals, which leads to the concept of neuro-steered hearing aids. We aim to evaluate and demonstrate the feasibility of AAD-supported speech enhancement in challenging noisy conditions based on electroencephalography recordings. APPROACH: AAD performance with linear versus deep neural network (DNN) based speaker separation was evaluated for same-gender speaker mixtures using three different speaker positions and three different noise conditions. MAIN RESULTS: AAD results based on the linear approach were found to be at least on par with, and sometimes better than, purely DNN-based approaches in terms of AAD accuracy in all tested conditions. However, when the DNN was used to support a linear data-driven beamformer, a performance improvement over the purely linear approach was obtained in the most challenging scenarios. The use of multiple microphones was also found to improve speaker separation and AAD performance over single-microphone systems. SIGNIFICANCE: Recent proof-of-concept studies in this context each focus on a different method in a different experimental setting, which makes them hard to compare. Furthermore, they are tested in highly idealized experimental conditions that are still far from a realistic hearing aid setting. This work provides a systematic comparison of linear and nonlinear neuro-steered speech enhancement models, as well as a more realistic validation in challenging conditions.


Subject(s)
Deep Learning; Speech Perception; Acoustic Stimulation; Attention; Electroencephalography; Speech
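
The auditory attention decoding step described above can be sketched with a simple linear backward decoder: it reconstructs the attended speech envelope from time-lagged EEG, and attention is assigned to the candidate speaker whose envelope correlates best with the reconstruction. The lag range, regularisation, and toy data are assumptions for illustration only, not the pipeline evaluated in the paper.

```python
import numpy as np

def lagged_eeg(eeg, n_lags):
    """Anti-causal time-lagged copies of EEG (T x C) -> (T x C*n_lags); EEG follows the stimulus."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for lag in range(n_lags):
        X[:T - lag, lag * C:(lag + 1) * C] = eeg[lag:]
    return X

def train_backward_decoder(eeg, attended_env, n_lags=32, alpha=1.0):
    """Ridge regression from lagged EEG to the attended speech envelope (T,)."""
    X = lagged_eeg(eeg, n_lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ attended_env)

def decode_attention(eeg, candidate_envs, W, n_lags=32):
    """Pick the candidate envelope that correlates best with the EEG-based reconstruction."""
    recon = lagged_eeg(eeg, n_lags) @ W
    corrs = [np.corrcoef(recon, env)[0, 1] for env in candidate_envs]
    return int(np.argmax(corrs))

# Toy example with random data standing in for preprocessed EEG and two separated speaker envelopes.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((1920, 24))            # 30 s at 64 Hz, 24 channels
env_1, env_2 = rng.standard_normal(1920), rng.standard_normal(1920)
W = train_backward_decoder(eeg, env_1, alpha=10.0)
print("attended speaker index:", decode_attention(eeg, [env_1, env_2], W))
```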
7.
Article in English | MEDLINE | ID: mdl-26737406

ABSTRACT

This work examines the use of a wireless acoustic sensor network (WASN) for the classification of clinically relevant activities of daily living (ADL) of elderly people. The aim of this research is to automatically compile a summary report of the performed ADLs that can be easily interpreted by caregivers. The classification performance of the WASN is evaluated in both clean and noisy conditions. Results indicate that the classification accuracy of the WASN is 75.3 ± 4.3% on clean acoustic data selected from the node receiving the highest SNR. By incorporating spatial information extracted by the WASN, the classification accuracy further increases to 78.6 ± 1.4%. In addition, in noisy conditions the WASN is on average 8.1% to 9.0% (absolute) more accurate than the best single-microphone results.


Subject(s)
Acoustics/instrumentation; Activities of Daily Living; Monitoring, Ambulatory/methods; Wireless Technology; Aged; Caregivers; Humans; Monitoring, Ambulatory/instrumentation; Signal Processing, Computer-Assisted; Signal-To-Noise Ratio
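
A minimal sketch of the node-selection idea in the last entry above: estimate an SNR per sensor node, keep the node with the highest estimate, extract simple band-energy features, and classify with an off-the-shelf classifier. The SNR estimate, features, classifier, and data are placeholders, not the system evaluated in the paper.

```python
import numpy as np
from sklearn.svm import SVC

def estimate_snr_db(signal, eps=1e-12):
    """Crude SNR estimate: ratio of frame energy above vs. below the median frame energy."""
    frames = signal[:len(signal) // 256 * 256].reshape(-1, 256)
    energy = (frames ** 2).mean(axis=1)
    active, idle = energy[energy > np.median(energy)], energy[energy <= np.median(energy)]
    return 10 * np.log10((active.mean() + eps) / (idle.mean() + eps))

def band_energy_features(signal, n_bands=16):
    """Average log power in linearly spaced frequency bands."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    return np.log([band.mean() + 1e-12 for band in np.array_split(spectrum, n_bands)])

# Hypothetical data: 4 nodes x 20 recordings x 1 s at 16 kHz, with alternating binary activity labels.
rng = np.random.default_rng(0)
recordings = rng.standard_normal((4, 20, 16000))
labels = np.arange(20) % 2

best_node = np.argmax([np.mean([estimate_snr_db(r) for r in node]) for node in recordings])
X = np.array([band_energy_features(r) for r in recordings[best_node]])
clf = SVC(kernel="rbf").fit(X[:15], labels[:15])
print("best node:", best_node, "held-out accuracy:", clf.score(X[15:], labels[15:]))
```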