LGCCT: A Light Gated and Crossed Complementation Transformer for Multimodal Speech Emotion Recognition.

Liu, Feng; Shen, Si-Yuan; Fu, Zi-Wang; Wang, Han-Yang; Zhou, Ai-Min; Qi, Jia-Yin.

Entropy (Basel) ; 24(7), 2022 Jul 21.

Article in English | MEDLINE | ID: mdl-35885233

ABSTRACT

Semantic-rich speech emotion recognition is widely used across a range of areas. Speech emotion recognition (SER) aims to recognize human emotional states from utterances containing both acoustic and linguistic information. Since both textual and audio patterns play essential roles in SER tasks, various works have proposed novel modality-fusing methods to exploit text and audio signals effectively. However, the high performance of most existing models depends on a large number of learnable parameters, and they only work well on data of fixed length. Therefore, minimizing computational overhead and improving generalization to unseen data of various lengths, while maintaining a certain level of recognition accuracy, is an urgent application problem. In this paper, we propose LGCCT, a light gated and crossed complementation transformer for multimodal speech emotion recognition. First, our model is capable of fusing modality information efficiently. Specifically, the acoustic features are extracted by a CNN-BiLSTM while the textual features are extracted by a BiLSTM. The modality-fused representation is then generated by the cross-attention module. We apply a gate-control mechanism to achieve a balanced integration of the original modality representation and the modality-fused representation. Second, the degree of attention focus is taken into account, as the uncertainty and the entropy of the same token should converge to the same value independent of the length. To improve the generalization of the model to various testing-sequence lengths, we adopt the length-scaled dot product to calculate the attention score, which can be interpreted from a theoretical view of entropy. The length-scaled dot product is cheap but effective. Experiments are conducted on the benchmark dataset CMU-MOSEI. Compared to the baseline models, our model achieves an 81.0% F1 score with only 0.432 M parameters, showing an improved balance between performance and the number of parameters. Moreover, the ablation study demonstrates the effectiveness of our model and its scalability to various input-sequence lengths, wherein the relative improvement is almost 20% over the baseline without a length-scaled dot product.
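
The idea of a length-scaled dot product can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the scaling used here (multiplying the usual 1/sqrt(d) factor by log n, where n is the number of keys) is an assumption chosen for illustration, as are the function names and the toy data. The sketch compares how much the entropy of the attention weights drifts when the sequence length grows, with and without the length-dependent factor.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def attention_weights(query, keys, length_scaled=True):
    """Attention weights of one query over a list of keys.

    Assumption (not taken from the abstract): the length-scaled variant
    multiplies the standard 1/sqrt(d) factor by log(n), n = len(keys).
    """
    d = len(query)
    n = len(keys)
    scale = 1.0 / math.sqrt(d)
    if length_scaled:
        scale *= math.log(n)
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


def toy_entropy(n, length_scaled):
    """Entropy for a toy sequence: one key matches the query, n - 1 do not."""
    query = [2.0, 0.0, 0.0, 0.0]
    keys = [[2.0, 0.0, 0.0, 0.0]] + [[0.0, 0.0, 0.0, 0.0]] * (n - 1)
    return entropy(attention_weights(query, keys, length_scaled))


if __name__ == "__main__":
    for flag in (False, True):
        drift = abs(toy_entropy(512, flag) - toy_entropy(8, flag))
        print(f"length_scaled={flag}: entropy drift from n=8 to n=512 = {drift:.3f}")
```

In this toy setting the unscaled attention spreads out as more non-matching keys are added, while the scaled variant stays focused on the matching key, so its entropy changes far less between the short and the long sequence. The extra cost is a single scalar multiplication per attention call.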

SBML2TikZ: supporting the SBML render extension in LaTeX.

Shen, Si Yuan; Bergmann, Frank; Sauro, Herbert M.

Bioinformatics ; 26(21): 2794-5, 2010 Nov 01.

Article in English | MEDLINE | ID: mdl-20829443

ABSTRACT

MOTIVATION: The SBML Render Extension enables coloring and shape information of biochemical models to be stored in the Systems Biology Markup Language (SBML). Rendering this stored graphical information in a portable and well-supported system such as TeX would be useful for researchers preparing documentation and presentations. In addition, since the Render Extension is not yet supported by many applications, it is helpful for such rendering functionality to be extended to the more popular CellDesigner annotation as well. RESULTS: SBML2TikZ supports automatic generation of graphics for biochemical models in the popular TeX typesetting system. The library generates a script of TeX macro commands for the vector graphics languages PGF/TikZ that can be compiled into the scalable vector graphics described in a model. AVAILABILITY: Source code, documentation and compiled binaries for the SBML2TikZ library can be found at http://www.sbml2tikz.org. In addition, a web application is available at http://www.sys-bio.org/layout

Subject(s)

Models, Biological , Software , Computer Graphics , Databases, Factual , Information Storage and Retrieval , Internet , Systems Biology
