1.
Article in English | MEDLINE | ID: mdl-39012749

ABSTRACT

The identification of entities from biomedical corpora is one of the primary tasks in the early stages of data mining. Traditional approaches depend on extensive feature engineering, while data-driven models such as deep learning architectures struggle to learn effectively from the available annotated and unannotated data. Even with large corpora and advanced deep learning models, domain generalization remains an issue. Attention mechanisms are effective at capturing long-range sentence dependencies and at extracting semantic and syntactic information from limited annotated datasets. To address out-of-vocabulary challenges in biomedical text, the PCA-CLS (Position and Contextual Attention with CNN-LSTM-Softmax) model combines global self-attention with a character-level convolutional neural network. Its performance is evaluated on eight biomedical domain datasets covering entities such as genes, drugs, diseases, and species. PCA-CLS outperforms several state-of-the-art models, achieving notable F1-scores, including 88.19% on BC2GM, 85.44% on JNLPBA, 90.80% on BC5CDR-chemical, 87.07% on BC5CDR-disease, 89.18% on BC4CHEMD, 88.81% on NCBI, and 91.59% on the s800 dataset.
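The abstract names the main architectural ingredients: global self-attention, a character-level CNN for out-of-vocabulary handling, an LSTM encoder, and a softmax tag classifier. The following is a minimal PyTorch sketch of that combination, not the authors' code; all dimensions, the class names (CharCNN, AttnBiLSTMTagger), and the toy vocabulary and tag sizes are illustrative assumptions.

```python
# Hypothetical sketch of a char-CNN + BiLSTM + self-attention NER tagger,
# loosely following the components described in the abstract. Every size
# and name here is an assumption, not the published PCA-CLS configuration.
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Builds a fixed-size character-level representation per token,
    which helps with out-of-vocabulary biomedical terms."""
    def __init__(self, n_chars, char_dim=25, n_filters=30, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel, padding=kernel // 2)

    def forward(self, chars):                 # (batch, seq, max_word_len)
        b, s, w = chars.shape
        x = self.embed(chars.view(b * s, w))  # (b*s, w, char_dim)
        x = self.conv(x.transpose(1, 2))      # (b*s, n_filters, w)
        x = torch.max(x, dim=2).values        # max-pool over characters
        return x.view(b, s, -1)               # (batch, seq, n_filters)

class AttnBiLSTMTagger(nn.Module):
    def __init__(self, vocab, n_chars, n_tags, word_dim=100, hidden=128):
        super().__init__()
        self.word_embed = nn.Embedding(vocab, word_dim, padding_idx=0)
        self.char_cnn = CharCNN(n_chars)
        self.lstm = nn.LSTM(word_dim + 30, hidden, batch_first=True,
                            bidirectional=True)
        # Global self-attention over the BiLSTM states captures the
        # long-range sentence dependencies mentioned in the abstract.
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                          batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)  # softmax applied in the loss

    def forward(self, words, chars):
        x = torch.cat([self.word_embed(words), self.char_cnn(chars)], dim=-1)
        h, _ = self.lstm(x)
        a, _ = self.attn(h, h, h)             # self-attention: Q = K = V
        return self.out(a)                    # (batch, seq, n_tags) logits

# Toy usage: 2 sentences, 7 tokens each, words of up to 12 characters.
model = AttnBiLSTMTagger(vocab=5000, n_chars=80, n_tags=5)
words = torch.randint(1, 5000, (2, 7))
chars = torch.randint(1, 80, (2, 7, 12))
print(model(words, chars).shape)              # torch.Size([2, 7, 5])
```

In this sketch the character-level features are concatenated with word embeddings before the recurrent encoder, one common way to mitigate out-of-vocabulary tokens; the abstract does not specify how PCA-CLS fuses the two streams.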
