RENET2: high-performance full-text gene-disease relation extraction with iterative training data expansion.
Su, Junhao; Wu, Ye; Ting, Hing-Fung; Lam, Tak-Wah; Luo, Ruibang.
  • Su J; Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China.
  • Wu Y; Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China.
  • Ting HF; Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China.
  • Lam TW; Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China.
  • Luo R; Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China.
NAR Genom Bioinform ; 3(3): lqab062, 2021 Sep.
Article in English | MEDLINE | ID: covidwho-1301371
ABSTRACT
Relation extraction (RE) is a fundamental task for extracting gene-disease associations from biomedical text. Many state-of-the-art tools are limited in that they can extract gene-disease associations only from single sentences or abstracts. A few studies have explored extracting gene-disease associations from full-text articles, but there is substantial room for improvement. In this work, we propose RENET2, a deep learning-based RE method that implements Section Filtering and ambiguous relations modeling to extract gene-disease associations from full-text articles. To address the scarcity of labels on full-text articles, we designed a novel iterative training data expansion strategy to build an annotated full-text dataset. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene-disease associations from an annotated full-text dataset, which was 27.22, 30.30, 29.24 and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ∼1.89M full-text articles from PubMed Central, finding ∼3.72M gene-disease associations; and (ii) the LitCovid articles, ranking the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene-disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available on GitHub.
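
For readers unfamiliar with the F1-score reported in the abstract, it is the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition. The function name and the counts are hypothetical illustrations; they are not taken from the paper or from the RENET2 code.

```python
def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-score for extracted relations given confusion counts.

    Precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = 2 * precision * recall / (precision + recall).
    Returns 0.0 when precision and recall are both undefined or zero.
    """
    predicted = true_pos + false_pos   # relations the extractor reported
    actual = true_pos + false_neg      # relations present in the gold annotation
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_pos / predicted
    recall = true_pos / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
example_f1 = f1_from_counts(true_pos=8, false_pos=2, false_neg=4)
print(f"F1 = {example_f1:.4f}")
```

Because the harmonic mean penalizes imbalance, a method with high precision but low recall (or the reverse) scores lower than one that balances the two.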

Full text: Available | Collection: International databases | Database: MEDLINE | Type of study: Prognostic study / Reviews | Language: English | Journal: NAR Genom Bioinform | Year: 2021 | Document Type: Article | Affiliation country: NARGAB
