RENET2: high-performance full-text gene-disease relation extraction with iterative training data expansion.
NAR Genom Bioinform; 3(3): lqab062, 2021 Sep.
Article in English | MEDLINE | ID: covidwho-1301371
ABSTRACT
Relation extraction (RE) is a fundamental task for extracting gene-disease associations from biomedical text. Many state-of-the-art tools have limited capacity, as they can extract gene-disease associations only from single sentences or abstracts. A few studies have explored extracting gene-disease associations from full-text articles, but substantial room for improvement remains. In this work, we propose RENET2, a deep learning-based RE method that implements Section Filtering and ambiguous relations modeling to extract gene-disease associations from full-text articles. To address the scarcity of labels on full-text articles, we designed a novel iterative training data expansion strategy to build an annotated full-text dataset. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene-disease associations from an annotated full-text dataset, which was 27.22, 30.30, 29.24 and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ~1.89M full-text articles from PubMed Central and found ~3.72M gene-disease associations; and (ii) the LitCovid articles and ranked the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene-disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available on GitHub.
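For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score is computed from extraction counts. The function name `f1_score` and the counts in the usage line are hypothetical and are not taken from the RENET2 paper or its code; the sketch shows only the standard definition (harmonic mean of precision and recall).

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp: predicted associations that are correct (true positives)
    fp: predicted associations that are wrong (false positives)
    fn: true associations the extractor missed (false negatives)
    """
    if tp == 0:
        # No correct predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; not results from the paper.
print(f"F1 = {f1_score(tp=70, fp=30, fn=25):.2%}")
```

Because F1 combines precision and recall, a tool that extracts many associations but with many false positives can score lower than a more selective one.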
Collection: International databases
Database: MEDLINE
Type of study: Prognostic study / Reviews
Language: English
Journal: NAR Genom Bioinform
Year: 2021
Document Type: Article
Affiliation country: NARGAB