Results 1 - 4 of 4
1.
J Neural Eng; 20(4), 2023 Aug 30.
Article in English | MEDLINE | ID: mdl-37595606

ABSTRACT

Objective. When listening to continuous speech, populations of neurons in the brain track different features of the signal. Neural tracking can be measured by relating the electroencephalography (EEG) signal to the speech signal. Recent studies using linear models have shown a significant contribution of linguistic features over and above acoustic neural tracking. However, linear models cannot capture the nonlinear dynamics of the brain. To overcome this, we use a convolutional neural network (CNN) that relates EEG to linguistic features, uses phoneme or word onsets as a control, and has the capacity to model non-linear relations. Approach. We integrate phoneme- and word-based linguistic features (phoneme surprisal, cohort entropy (CE), word surprisal (WS) and word frequency (WF)) in our nonlinear CNN model and investigate whether they carry additional information on top of lexical features (phoneme and word onsets). We then compare the performance of our nonlinear CNN with that of a linear encoder and a linearized CNN. Main results. For the non-linear CNN, we found a significant contribution of CE over phoneme onsets and of WS and WF over word onsets. Moreover, the non-linear CNN outperformed the linear baselines. Significance. Measuring the encoding of linguistic features in the brain is important for auditory neuroscience research and for applications that involve objectively measuring speech understanding. With linear models this is measurable, but the effects are very small. The proposed non-linear CNN model yields larger differences between linguistic and lexical models and could therefore reveal effects that would otherwise be unmeasurable, which may in the future lead to improved within-subject measures and shorter recordings.
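A minimal sketch of the kind of non-linear encoder this abstract describes, not the authors' published architecture: a small 1-D CNN maps a stimulus feature channel (e.g. word-surprisal pulses at word onsets) to multichannel EEG, so that its prediction accuracy can be compared against a linear encoder. Layer sizes, sampling rate and names are illustrative assumptions.

```python
# Hedged sketch: a tiny non-linear forward model from stimulus features to EEG.
import torch
import torch.nn as nn

class NonlinearEncoder(nn.Module):
    def __init__(self, n_features: int = 1, n_eeg_channels: int = 64, kernel: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 16, kernel_size=kernel, padding="same"),
            nn.ReLU(),  # the non-linearity a linear encoder lacks
            nn.Conv1d(16, n_eeg_channels, kernel_size=kernel, padding="same"),
        )

    def forward(self, stimulus):      # stimulus: (batch, n_features, time)
        return self.net(stimulus)     # predicted EEG: (batch, n_eeg_channels, time)

# Usage sketch: predicted EEG would be compared to the recording, e.g. via
# per-channel Pearson correlation, for linguistic vs. lexical feature sets.
model = NonlinearEncoder()
stim = torch.randn(1, 1, 640)         # e.g. 10 s of one feature at 64 Hz (assumed)
pred = model(stim)                    # (1, 64, 640)
```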


Subject(s)
Neurons; Speech; Humans; Cochlear Nerve; Linguistics; Neural Networks, Computer
2.
J Neural Eng; 20(4), 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37442115

ABSTRACT

Objective. When a person listens to continuous speech, a corresponding response is elicited in the brain and can be recorded using electroencephalography (EEG). Linear models are presently used to relate the EEG recording to the corresponding speech signal, and their ability to find a mapping between the two signals is used as a measure of neural tracking of speech. Such models are limited because they assume linearity in the EEG-speech relationship, which ignores the nonlinear dynamics of the brain. As an alternative, deep learning models have recently been used to relate EEG to continuous speech. Approach. This paper reviews and comments on deep-learning-based studies that relate EEG to continuous speech in single- or multiple-speaker paradigms. We point out recurrent methodological pitfalls and the need for a standard benchmark of model analysis. Main results. We gathered 29 studies. The main methodological issues we found are biased cross-validation, data leakage leading to over-fitted models, and data sizes disproportionate to the model's complexity. In addition, we outline requirements for a standard benchmark of model analysis, such as public datasets, common evaluation metrics, and good practices for the match-mismatch task. Significance. We present a review summarizing the main deep-learning-based studies that relate EEG to speech, while addressing methodological pitfalls and important considerations for this newly expanding field. Our study is particularly relevant given the growing application of deep learning in EEG-speech decoding.
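A hedged illustration of one pitfall this review flags: random sample-level splits leak temporally adjacent, highly correlated EEG segments between train and test sets. A group-aware split keeps all segments from one recording (or story/subject) together. The segmenting scheme and variable names below are assumptions for illustration, not the review's protocol.

```python
# Hedged sketch: leakage-free cross-validation by grouping segments per recording.
import numpy as np
from sklearn.model_selection import GroupKFold

n_segments, n_features = 1000, 64
X = np.random.randn(n_segments, n_features)   # placeholder EEG segment features
y = np.random.randn(n_segments)               # placeholder speech targets
recording_id = np.repeat(np.arange(10), 100)  # 10 recordings, 100 segments each (assumed)

# All segments of a recording land in the same fold, avoiding within-recording leakage.
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=recording_id):
    assert set(recording_id[train_idx]).isdisjoint(recording_id[test_idx])
```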


Subject(s)
Electroencephalography; Speech; Humans; Speech/physiology; Electroencephalography/methods; Neural Networks, Computer; Brain/physiology; Auditory Perception/physiology
3.
Sci Rep; 13(1): 812, 2023 Jan 16.
Article in English | MEDLINE | ID: mdl-36646740

ABSTRACT

To investigate the processing of speech in the brain, simple linear models are commonly used to establish a relationship between brain signals and speech features. However, these linear models are ill-equipped to capture a highly dynamic, complex non-linear system like the brain, and they often require a substantial amount of subject-specific training data. This work introduces a novel speech decoder architecture: the Very Large Augmented Auditory Inference (VLAAI) network. The VLAAI network outperformed state-of-the-art subject-independent models (median Pearson correlation of 0.19, p < 0.001), a 52% increase over the well-established linear model. Using ablation techniques, we identified the relative importance of each part of the VLAAI network and found that the non-linear components and the output context module influenced model performance the most (10% relative performance increase). Subsequently, the VLAAI network was evaluated on a holdout dataset of 26 subjects and on a publicly available unseen dataset to test generalization to unseen subjects and stimuli. No significant difference was found between the default test set and the holdout subjects, nor between the default test set and the public dataset, and the VLAAI network significantly outperformed all baseline models on the public dataset. We evaluated the effect of training-set size by training the VLAAI network on data from 1 up to 80 subjects and evaluating on 26 holdout subjects, revealing a relationship between the number of subjects in the training set and the performance on unseen subjects that follows a hyperbolic tangent function. Finally, the subject-independent VLAAI network was fine-tuned for the 26 holdout subjects to obtain subject-specific VLAAI models. With 5 minutes of data or more, a significant performance improvement was found, up to 34% (from 0.18 to 0.25 median Pearson correlation) relative to the subject-independent VLAAI network.
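A minimal sketch of the evaluation style reported here, not the VLAAI code: a backward (decoding) model reconstructs the speech envelope from EEG, and performance is the Pearson correlation between reconstructed and actual envelopes. The ridge baseline, data shapes and random data are assumptions for illustration only.

```python
# Hedged sketch: linear backward-model baseline scored with Pearson correlation.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
eeg_train, eeg_test = rng.standard_normal((6400, 64)), rng.standard_normal((1600, 64))
env_train, env_test = rng.standard_normal(6400), rng.standard_normal(1600)

baseline = Ridge(alpha=1.0).fit(eeg_train, env_train)   # linear decoder baseline
reconstruction = baseline.predict(eeg_test)
r, _ = pearsonr(reconstruction, env_test)               # metric compared across models
print(f"reconstruction correlation r = {r:.3f}")
```

A non-linear decoder such as the one described above would be scored with the same correlation metric, which is what makes the subject-independent comparison possible.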


Subject(s)
Electroencephalography; Speech; Humans; Electroencephalography/methods; Neural Networks, Computer; Brain; Head
4.
J Neural Eng; 18(6), 2021 Nov 15.
Article in English | MEDLINE | ID: mdl-34706347

ABSTRACT

Objective. Currently, only behavioral speech understanding tests are available, which require active participation of the person being tested. As this is infeasible for certain populations, an objective measure of speech intelligibility is required. Recently, brain imaging data have been used to establish a relationship between stimulus and brain response. Linear models have been successfully linked to speech intelligibility but require per-subject training. We present a deep-learning-based model incorporating dilated convolutions that operates in a match/mismatch paradigm. The accuracy of the model's match/mismatch predictions can be used as a proxy for speech intelligibility without subject-specific (re)training. Approach. We evaluated the performance of the model as a function of input segment length, electroencephalography (EEG) frequency band and receptive field size, while comparing it to multiple baseline models. Next, we evaluated performance on held-out data and the effect of fine-tuning. Finally, we established a link between the accuracy of our model and the state-of-the-art behavioral MATRIX test. Main results. The dilated convolutional model significantly outperformed the baseline models for every input segment length, for all EEG frequency bands except the delta and theta bands, and for receptive field sizes between 250 and 500 ms. Additionally, fine-tuning significantly increased the accuracy on a held-out dataset. Finally, a significant correlation (r = 0.59, p = 0.0154) was found between the speech reception threshold (SRT) estimated using the behavioral MATRIX test and our objective method. Significance. Our method is the first to predict the SRT from EEG for unseen subjects, contributing to objective measures of speech intelligibility.
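A minimal sketch of a match/mismatch setup with dilated convolutions, illustrative rather than the published architecture: EEG and a speech-envelope segment each pass through a stack of dilated 1-D convolutions, and their similarity decides whether the pair is matched or mismatched. Layer sizes, dilation factors and the cosine-similarity readout are assumptions.

```python
# Hedged sketch: dilated-convolution branches scored for match/mismatch classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

def dilated_stack(in_ch: int, hidden: int = 16) -> nn.Sequential:
    # Stacked dilated convolutions enlarge the receptive field without pooling.
    return nn.Sequential(
        nn.Conv1d(in_ch, hidden, kernel_size=3, dilation=1, padding="same"), nn.ReLU(),
        nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2, padding="same"), nn.ReLU(),
        nn.Conv1d(hidden, hidden, kernel_size=3, dilation=4, padding="same"), nn.ReLU(),
    )

class MatchMismatch(nn.Module):
    def __init__(self, eeg_channels: int = 64):
        super().__init__()
        self.eeg_branch = dilated_stack(eeg_channels)
        self.speech_branch = dilated_stack(1)

    def score(self, eeg, speech):
        e = self.eeg_branch(eeg).mean(dim=-1)         # (batch, hidden)
        s = self.speech_branch(speech).mean(dim=-1)   # (batch, hidden)
        return F.cosine_similarity(e, s, dim=-1)      # higher = more likely matched

# Usage sketch: the candidate segment with the higher score is chosen; classification
# accuracy over many segments serves as the objective neural-tracking measure.
model = MatchMismatch()
eeg = torch.randn(8, 64, 320)                         # e.g. 5 s at 64 Hz (assumed)
matched, mismatched = torch.randn(8, 1, 320), torch.randn(8, 1, 320)
accuracy = (model.score(eeg, matched) > model.score(eeg, mismatched)).float().mean()
```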


Subject(s)
Speech Intelligibility; Speech Perception; Acoustic Stimulation; Brain; Electroencephalography/methods; Hearing/physiology; Humans; Speech Intelligibility/physiology; Speech Perception/physiology