Results 1 - 6 of 6
1.
Article in English | MEDLINE | ID: mdl-37624721

ABSTRACT

Speech emotion recognition (SER) plays an important role in human-computer interaction, where it can improve interactivity and enhance the user experience. Existing approaches tend to apply deep learning networks directly to distinguish emotions. Among them, the convolutional neural network (CNN) is the most commonly used method for learning emotional representations from spectrograms. However, a CNN does not explicitly model the associations among features along the spectral, temporal, and channel axes, or their relative relevance, which limits representation learning. In this article, we propose a deep spectro-temporal-channel network (DSTCNet) to improve the representational ability for speech emotion. The proposed DSTCNet integrates several spectro-temporal-channel (STC) attention modules into a general CNN. Specifically, the STC module infers a 3-D attention map along the dimensions of time, frequency, and channel, so that the network can focus on crucial time frames, frequency ranges, and feature channels. Experiments were conducted on the Berlin emotional database (EmoDB) and the interactive emotional dyadic motion capture (IEMOCAP) database. The results show that DSTCNet outperforms traditional CNN-based methods and several state-of-the-art methods.
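The abstract does not say how the STC module computes its 3-D attention map, so the following is only an illustrative sketch and not the authors' method. It assumes a feature map indexed as [channel][frequency][time], builds one gate per axis from mean-pooled activations passed through a sigmoid, and multiplies the three gates into a 3-D map. The function name `stc_attention` and the parameter-free gating are hypothetical; a real module would use learned weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stc_attention(x):
    """Toy 3-D attention over a feature map x[channel][freq][time].

    Assumption: each axis gets a gate from mean pooling over the other
    two axes followed by a sigmoid. No learned parameters are involved.
    Returns (attention_map, reweighted_features), both shaped like x.
    """
    C, F, T = len(x), len(x[0]), len(x[0][0])
    # One descriptor per channel, per frequency bin, and per time frame.
    ch = [sum(x[c][f][t] for f in range(F) for t in range(T)) / (F * T)
          for c in range(C)]
    fr = [sum(x[c][f][t] for c in range(C) for t in range(T)) / (C * T)
          for f in range(F)]
    tm = [sum(x[c][f][t] for c in range(C) for f in range(F)) / (C * F)
          for t in range(T)]
    a_c = [sigmoid(v) for v in ch]
    a_f = [sigmoid(v) for v in fr]
    a_t = [sigmoid(v) for v in tm]
    # The outer product of the three gates gives the 3-D attention map.
    att = [[[a_c[c] * a_f[f] * a_t[t] for t in range(T)]
            for f in range(F)] for c in range(C)]
    out = [[[x[c][f][t] * att[c][f][t] for t in range(T)]
            for f in range(F)] for c in range(C)]
    return att, out
```

In this sketch, a time frame, frequency bin, or channel with a higher average activation receives a weight closer to 1, which is one simple way to "focus" on regions along all three axes at once.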

2.
Cereb Cortex ; 33(13): 8620-8632, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37118893

ABSTRACT

Sentence oral reading not only requires coordinated visual, articulatory, and cognitive processes but also presupposes a top-down influence of linguistic knowledge on visual-motor behavior. Despite gradual recognition of a predictive coding effect in this process, there is currently no comprehensive demonstration of the time-varying brain dynamics that underlie the oral reading strategy. To address this, our study used a multimodal approach, combining real-time recording of electroencephalography, eye movements, and speech with a comprehensive examination of regional, inter-regional, sub-network, and whole-brain responses. Our study identified the top-down predictive effect with a phrase-grouping phenomenon in the fixation interval and eye-voice span. This effect was associated with delta and theta band synchronization in the prefrontal, anterior temporal, and inferior frontal lobes. We also observed early activation of the cognitive control network and its recurrent interactions with the visual-motor networks structurally at the phrase rate. Finally, our study emphasizes the importance of cross-frequency coupling as a promising neural realization of hierarchical sentence structuring and calls for further investigation.


Subjects
Language, Reading, Electroencephalography, Brain/physiology, Linguistics
3.
J Neural Eng ; 20(1), 2023 02 14.
Article in English | MEDLINE | ID: mdl-36720164

ABSTRACT

Objective. Constructing an efficient human emotion recognition model based on electroencephalogram (EEG) signals is significant for realizing emotional brain-computer interaction and improving machine intelligence. Approach. In this paper, we present a spatial-temporal feature fused convolutional graph attention network (STFCGAT) model based on multi-channel EEG signals for human emotion recognition. First, we combined the single-channel differential entropy (DE) feature with the cross-channel functional connectivity (FC) feature to extract both the temporal variation and the spatial topological information of EEG. After that, a novel convolutional graph attention network was used to fuse the DE and FC features and further extract higher-level graph structural information with sufficient expressive power for emotion recognition. Furthermore, we introduced a multi-headed attention mechanism in graph neural networks to improve the generalization ability of the model. Main results. We evaluated the emotion recognition performance of our proposed model on the public SEED and DEAP datasets. It achieved classification accuracies of 99.11% ± 0.83% and 94.83% ± 3.41% in the subject-dependent and subject-independent experiments, respectively, on the SEED dataset, and accuracies of 91.19% ± 1.24% and 92.03% ± 4.57% for discrimination of arousal and valence, respectively, in subject-independent experiments on the DEAP dataset. Notably, our model achieved state-of-the-art performance on cross-subject emotion recognition tasks for both datasets. In addition, we gained insight into the proposed framework through both the ablation experiments and the analysis of spatial patterns of FC and DE features. Significance. All these results prove the effectiveness of the STFCGAT architecture for emotion recognition and also indicate that there are significant differences in the spatial-temporal characteristics of the brain under different emotional states.
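As a rough illustration of the two input features named in the abstract, the sketch below computes a differential entropy value per channel and a channel-by-channel connectivity matrix. It rests on two assumptions that the abstract does not confirm: DE uses the closed form for a Gaussian signal, 0.5 * ln(2 * pi * e * variance), which is common in EEG work, and FC is approximated by Pearson correlation, which is only one of several possible connectivity measures. Band-pass filtering, which usually precedes DE extraction, is omitted. The function names are hypothetical.

```python
import math


def differential_entropy(signal):
    """DE of one channel, assuming the samples are Gaussian.

    Uses 0.5 * ln(2 * pi * e * variance). The signal must not be
    constant, otherwise the variance is zero and the log is undefined.
    """
    n = len(signal)
    mean = sum(signal) / n
    var = sum((s - mean) ** 2 for s in signal) / n
    return 0.5 * math.log(2 * math.pi * math.e * var)


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def functional_connectivity(channels):
    """Symmetric channel-by-channel matrix (Pearson is an assumption)."""
    return [[pearson(a, b) for b in channels] for a in channels]


# Toy usage: three short "channels" of made-up samples.
eeg = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
node_features = [differential_entropy(ch) for ch in eeg]  # one DE per node
adjacency = functional_connectivity(eeg)                  # edge weights
```

In a graph-based model, the DE values would play the role of node features and the FC matrix the role of edge weights; how STFCGAT actually fuses them is not described in the abstract.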


Subjects
Emotions, Recognition (Psychology), Humans, Brain, Electroencephalography, Artificial Intelligence
4.
IEEE Trans Cybern ; 53(7): 4306-4319, 2023 Jul.
Article in English | MEDLINE | ID: mdl-35486568

ABSTRACT

Network embedding, which aims to learn a low-dimensional representation of nodes, is a powerful technique for network analysis. While network embedding for networks with complete attributes has been widely investigated, in many real-world applications the attributes of some nodes are unobserved (i.e., missing) due to privacy concerns or resource limits. Very recently, several network embedding methods have been proposed for attribute-missing networks. They first complete the missing attributes and then use the completed network to learn the network embedding. The parameters of these two processes cannot be adjusted by each other, resulting in compromised results. To address this problem, we propose a unified model in which the process of completing missing attributes and the process of learning the embedding are not separated but closely intertwined. Specifically, the completion of missing attributes is guided by network representation learning via mutual information maximization, and the completed attributes directly enter the network representation module, which generates further feedback for completing the missing attributes. We further impose an attribute-structure relationship constraint on the completion of missing attributes by designing a new generative adversarial network (GAN) model. To the best of our knowledge, this is the first unified model for attribute-missing network embedding. Empirical results on real-world datasets show the superiority of our new method over other state-of-the-art methods on four network analysis tasks: node classification, node clustering, link prediction, and network visualization.


Subjects
Learning, Cluster Analysis
5.
IEEE Trans Cybern ; 52(3): 1364-1376, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32356771

ABSTRACT

Spikes are the currency in central nervous systems for information transmission and processing. They are also believed to play an essential role in the low power consumption of biological systems, whose efficiency attracts increasing attention to the field of neuromorphic computing. However, efficient processing and learning of discrete spikes remains a challenging problem. In this article, we make contributions in this direction. A simplified spiking neuron model is first introduced, in which the effects of both synaptic input and firing output on the membrane potential are modeled with an impulse function. An event-driven scheme is then presented to further improve processing efficiency. Based on the neuron model, we propose two new multispike learning rules that outperform other baselines on various tasks, including association, classification, and feature detection. In addition to efficiency, our learning rules demonstrate high robustness against strong noise of different types. They can also be generalized to different spike coding schemes for the classification task, and notably, a single neuron is capable of solving multicategory classification with our learning rules. In the feature detection task, we re-examine the ability of unsupervised spike-timing-dependent plasticity, present its limitations, and find a new phenomenon of losing selectivity. In contrast, our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied. Moreover, our rules can not only detect features but also discriminate them. The improved performance of our methods would contribute to neuromorphic computing as a preferable choice.
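To make the idea of impulse-based, event-driven processing concrete, here is a minimal sketch of a generic leaky neuron that is updated only when an input spike arrives. It is an assumption-laden illustration, not the neuron model or learning rules from the article: the exponential leak, the subtractive reset, and the names `run_neuron`, `tau`, and `threshold` are all hypothetical choices.

```python
import math


def run_neuron(events, weights, tau=10.0, threshold=1.0):
    """Event-driven simulation of a simple leaky spiking neuron.

    events:  list of (time, synapse_index) pairs, sorted by time.
    weights: synaptic weight per synapse index.
    The membrane potential is touched only at input events. Between
    events it decays exponentially with time constant tau (assumed).
    Returns the list of output spike times.
    """
    v = 0.0
    t_last = 0.0
    out = []
    for t, i in events:
        # Apply the decay accumulated since the previous event.
        v *= math.exp(-(t - t_last) / tau)
        t_last = t
        # Synaptic input acts as an impulse: an instantaneous jump.
        v += weights[i]
        if v >= threshold:
            out.append(t)
            # Firing also acts as an impulse: a subtractive reset.
            v -= threshold
    return out


# Two closely spaced inputs sum past the threshold; a late one does not.
spikes = run_neuron([(1.0, 0), (2.0, 0), (50.0, 0)], weights=[0.6])
```

Because the state is updated only at input events, the cost scales with the number of spikes rather than with the number of simulation time steps, which is the usual motivation for event-driven schemes.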


Subjects
Neural Networks (Computer), Neurons, Learning, Neurons/physiology
6.
IEEE Trans Neural Netw Learn Syst ; 32(2): 625-638, 2021 02.
Article in English | MEDLINE | ID: mdl-32203038

ABSTRACT

The capability for environmental sound recognition (ESR) can determine the fitness of individuals by helping them avoid dangers or pursue opportunities when critical sound events occur. The fundamental principles of biological systems that result in such a remarkable ability remain mysterious. Additionally, the practical importance of ESR has attracted an increasing amount of research attention, but the chaotic and nonstationary nature of environmental sounds continues to make it a challenging task. In this article, we propose a spike-based framework from a more brain-like perspective for the ESR task. Our framework is a unified system that consistently integrates three major functional parts: sparse encoding, efficient learning, and robust readout. We first introduce a simple sparse encoding, in which key points are used for feature representation, and demonstrate its generalization to both spike-based and nonspike-based systems. Then, we evaluate the learning properties of different learning rules in detail and add our own contributions to improve them. Our results highlight the advantages of multispike learning, providing a selection reference for various spike-based developments. Finally, we combine the multispike readout with the other parts to form a system for ESR. Experimental results show that our framework performs best compared with other baseline approaches. In addition, we show that our spike-based framework has several advantageous characteristics, including early decision making, small dataset requirements, and ongoing dynamic processing. Our framework is the first attempt to apply the multispike characteristic of neurons to ESR. The outstanding performance of our approach could help draw more research effort to push the boundaries of the spike-based paradigm.
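The abstract does not define how key points are selected, so the sketch below shows just one plausible reading of a key-point sparse encoding: keep only the local maxima of a spectrogram and treat each one as a spike at its time index on the afferent given by its frequency index. The 4-neighbor comparison, the `min_value` floor, and the function name `key_points` are assumptions for illustration.

```python
def key_points(spec, min_value=0.0):
    """Find local maxima in a spectrogram spec[freq][time].

    A cell is kept if it exceeds min_value and is strictly greater
    than its up/down/left/right neighbors (an assumed criterion).
    Returns a list of (freq_index, time_index) pairs.
    """
    F, T = len(spec), len(spec[0])
    points = []
    for f in range(F):
        for t in range(T):
            v = spec[f][t]
            if v <= min_value:
                continue
            neighbors = [spec[ff][tt]
                         for ff, tt in ((f - 1, t), (f + 1, t),
                                        (f, t - 1), (f, t + 1))
                         if 0 <= ff < F and 0 <= tt < T]
            if all(v > n for n in neighbors):
                points.append((f, t))
    return points


# Toy spectrogram with a single energy peak in the middle.
toy = [[0.0, 1.0, 0.0],
       [1.0, 5.0, 1.0],
       [0.0, 1.0, 0.0]]
encoded = key_points(toy)  # each (f, t) can be read as a spike on afferent f at time t
```

The encoding is sparse because most time-frequency cells are discarded, and it can feed either a spiking learner (as spike times) or a conventional classifier (as a binary map).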


Subjects
Environment, Machine Learning, Neural Networks (Computer), Pattern Recognition (Automated)/methods, Sound, Algorithms, Brain/physiology, Humans, Models (Neurological), Neurons