A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles.
Lin, Sheng-Jie; Yeh, Wen-Chao; Chiu, Yu-Wen; Chang, Yung-Chun; Hsu, Min-Huei; Chen, Yi-Shin; Hsu, Wen-Lian.
  • Lin SJ; Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, Da'an District, Taipei City 106, Taiwan.
  • Yeh WC; Institute of Information Systems and Applications, National Tsing Hua University, No. 101, Section 2, Guangfu Rd, East District, Hsinchu City 300, Taiwan.
  • Chiu YW; Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, Da'an District, Taipei City 106, Taiwan.
  • Chang YC; Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, Da'an District, Taipei City 106, Taiwan.
  • Hsu MH; Clinical Big Data Research Center, Taipei Medical University Hospital, No. 172-1, Section 2, Keelung Rd, Da'an District, Taipei City 106, Taiwan.
  • Chen YS; Pervasive AI Research Labs, Ministry of Science and Technology, No. 1001, Daxue Rd, East District, Hsinchu City 300, Taiwan.
  • Hsu WL; Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, Da'an District, Taipei City 106, Taiwan.
Database (Oxford); 2022; 2022 Jul 15.
Article in English | MEDLINE | ID: covidwho-1948247
ABSTRACT
In this research, we explored various state-of-the-art biomedical-specific pre-trained Bidirectional Encoder Representations from Transformers (BERT) models for the National Library of Medicine Chemistry (NLM-CHEM) and LitCovid tracks of the BioCreative VII Challenge, and proposed a BERT-based ensemble learning approach that integrates the strengths of the individual models to improve overall system performance. On the NLM-CHEM track, our method achieved F1-scores of 85% and 91.8% in the strict and approximate evaluations, respectively. Moreover, the proposed Medical Subject Headings identifier (MeSH ID) normalization algorithm proved effective for entity normalization, achieving an F1-score of about 80% in both the strict and approximate evaluations. On the LitCovid track, the proposed method was also effective at detecting topics in the coronavirus disease 2019 (COVID-19) literature, outperforming the compared methods and achieving state-of-the-art performance on the LitCovid corpus. Database URL: https://www.ncbi.nlm.nih.gov/research/coronavirus/
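The abstract does not say how the ensemble combines the outputs of the individual BERT models. The sketch below assumes two common strategies: token-level majority voting over BIO tags for chemical identification (NLM-CHEM), and averaging of per-label probabilities with a fixed threshold for multi-label topic classification (LitCovid). The function names, the tie-breaking rule (earliest model wins), and the 0.5 threshold are all hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter
from typing import List, Sequence


def repair_bio(tags: List[str]) -> List[str]:
    """Turn an I-X tag that does not continue an entity of type X into B-X."""
    fixed = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


def vote_bio_tags(model_tags: Sequence[Sequence[str]]) -> List[str]:
    """Hypothetical NER ensemble: per-token majority vote over BIO tag sequences.

    Assumption: ties go to the earliest model in `model_tags`
    (e.g. the strongest single model placed first).
    """
    n_tokens = len(model_tags[0])
    if any(len(tags) != n_tokens for tags in model_tags):
        raise ValueError("all models must tag the same token sequence")
    voted = []
    for i in range(n_tokens):
        column = [tags[i] for tags in model_tags]
        counts = Counter(column)
        best = max(counts.values())
        # First tag, in model order, that reaches the highest vote count.
        voted.append(next(t for t in column if counts[t] == best))
    # Voting can produce orphan I- tags, so repair the sequence afterwards.
    return repair_bio(voted)


def ensemble_multilabel(
    model_probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    threshold: float = 0.5,  # assumed decision threshold
) -> List[str]:
    """Hypothetical multi-label ensemble: average probabilities, then threshold."""
    n_models = len(model_probs)
    avg = [sum(p[j] for p in model_probs) / n_models for j in range(len(labels))]
    return [label for label, p in zip(labels, avg) if p >= threshold]


if __name__ == "__main__":
    # Three models tag the same four tokens; the second misses the B- tag.
    models = [
        ["O", "B-Chemical", "I-Chemical", "O"],
        ["O", "O", "I-Chemical", "O"],
        ["O", "B-Chemical", "I-Chemical", "O"],
    ]
    print(vote_bio_tags(models))  # ['O', 'B-Chemical', 'I-Chemical', 'O']

    topics = ["Treatment", "Diagnosis", "Prevention"]
    probs = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.3]]
    print(ensemble_multilabel(probs, topics))  # ['Treatment']
```

Majority voting only needs each model's hard predictions, whereas probability averaging lets a confident model outweigh an uncertain one. Which of these strategies (or a weighted variant) the authors actually used cannot be determined from the abstract.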
Subject(s)

Full text: Available Collection: International databases Database: MEDLINE Main subject: Data Mining / COVID-19 Type of study: Experimental Studies / Prognostic study / Reviews Limits: Humans Language: English Year: 2022 Document Type: Article Affiliation country: Database
