1.
Interdiscip Sci ; 2024 Feb 10.
Article in English | MEDLINE | ID: mdl-38340264

ABSTRACT

We report a study combining manual annotation with deep-learning natural language processing to extract entities accurately from biomedical literature on hereditary disease. A total of 400 full articles were manually annotated, following published guidelines, by experienced genetic interpreters at Beijing Genomics Institute (BGI). The quality of our manual annotations was assessed by comparing our re-annotated results with publicly available annotations. The overall Jaccard index was 0.866 across the four entity types: gene, variant, disease and species. A BERT-based large named entity recognition (NER) model and a DistilBERT-based simplified NER model were each trained, validated and tested. Because the manually annotated corpus was limited, these NER models were fine-tuned in two phases. The F1-scores of the BERT-based NER for gene, variant, disease and species are 97.28%, 93.52%, 92.54% and 95.76%, respectively, while those of the DistilBERT-based NER are 95.14%, 86.26%, 91.37% and 89.92%, respectively. Most importantly, the variant entity type has been extracted by a large language model for the first time, with an F1-score comparable to that of the state-of-the-art variant extraction model tmVar.
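
The two metrics named in this abstract can be illustrated with a minimal Python sketch. It assumes each annotation is represented as a (start, end, entity type) tuple and that only exact matches count; the abstract does not say how the authors matched annotations, so this representation, the function names and the toy data are all hypothetical.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two annotation sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets agree fully
    return len(a & b) / len(union)


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall under exact matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations: (start offset, end offset, entity type).
ours = {(0, 5, "gene"), (10, 18, "variant"), (30, 42, "disease")}
public = {(0, 5, "gene"), (10, 18, "variant"), (50, 55, "species")}

print(jaccard_index(ours, public))  # 2 shared out of 4 distinct annotations
print(f1_score(ours, public))       # precision and recall are both 2/3
```

Exact matching is the strictest choice; a study could equally use overlap-based matching, which would yield higher scores on the same data.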

2.
Hum Mutat ; 42(4): 359-372, 2021 04.
Article in English | MEDLINE | ID: mdl-33565189

ABSTRACT

Cancer is one of the most important health issues globally, and accurate interpretation of cancer-related variants is critical for the clinical management of hereditary cancer. ClinGen Sequence Variant Interpretation Working Groups have developed many adaptations of the American College of Medical Genetics and Genomics and Association for Molecular Pathology guidelines to improve the consistency of interpretation. We combined the most recent adaptations to expand the number of criteria from 28 to 48 and developed a tool called Cancer SIGVAR to help genetic counselors interpret the clinical significance of cancer germline variants. Our tool accepts VCF files as input and performs fully automated interpretation based on 21 criteria and semiautomated interpretation based on 48 criteria. We validated the performance of our tool with the ClinVar and CLINVITAE benchmark databases, achieving an average consistency of up to 93.71% and 79.38% for pathogenic and benign assessment, respectively. We compared Cancer SIGVAR with two similar tools, InterVar and PathoMAN, and analyzed the main differences in criteria and implementation. Furthermore, we selected 911 variants from two additional in-house benchmark databases, and semiautomated interpretation reached an average classification consistency of 98.35%. Our findings highlight the need to optimize automated interpretation tools based on constantly updated guidelines. Cancer SIGVAR is publicly available at http://cancersigvar.bgi.com/.


Subject(s)
Genetic Predisposition to Disease , Neoplasms , Genetic Testing , Genetic Variation , Genome, Human , Germ Cells , Humans , Neoplasms/genetics , Software , United States
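
The consistency figures reported for Cancer SIGVAR can be read as agreement rates against a benchmark. The sketch below is one plausible reading, not the authors' method: it assumes consistency for a class is the fraction of benchmark variants with that label to which the tool assigns the same label. The function name, variant identifiers and labels are hypothetical.

```python
def class_consistency(tool_calls, benchmark, label):
    """Fraction of benchmark variants with `label` that the tool also calls `label`.

    Both arguments map a variant identifier to a classification string.
    Returns None when the benchmark holds no variant with that label.
    """
    relevant = [vid for vid, lab in benchmark.items() if lab == label]
    if not relevant:
        return None
    agreeing = sum(1 for vid in relevant if tool_calls.get(vid) == label)
    return agreeing / len(relevant)


# Hypothetical toy data, not taken from ClinVar or CLINVITAE.
benchmark = {
    "var1": "pathogenic",
    "var2": "pathogenic",
    "var3": "benign",
    "var4": "benign",
}
tool_calls = {
    "var1": "pathogenic",
    "var2": "uncertain",
    "var3": "benign",
    "var4": "benign",
}

print(class_consistency(tool_calls, benchmark, "pathogenic"))  # 1 of 2 agree
print(class_consistency(tool_calls, benchmark, "benign"))      # 2 of 2 agree
```

Reporting consistency per class, as the abstract does, keeps a tool that is strong on pathogenic calls from masking weaker agreement on benign ones.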
3.
Front Microbiol ; 9: 2658, 2018.
Article in English | MEDLINE | ID: mdl-30467498

ABSTRACT

Coniothyrium minitans is a sclerotial parasite that has been investigated for commercial control of crop diseases caused by Sclerotinia sclerotiorum. Previously, we obtained a T-DNA insertional mutant, ZS-1TN24363, which did not produce melanin during conidiation. To understand the function of melanin in C. minitans, we cloned the gene that was disrupted by the T-DNA insertion and found that this gene, called CmMR1, encodes a putative protein of 1,011 amino acids that is a homolog of the transcription factor MR. Full-length CmMR1 contains 3,167 bp, with three exons and two introns. To confirm that the disrupted gene is responsible for the melanin deficiency of the mutant, CmMR1 was disrupted and three targeted knockout mutants were obtained. Biological assays showed that the phenotype of the targeted knockout mutants was similar to that of the T-DNA insertional mutant. Furthermore, gene complementation confirmed that CmMR1 is responsible for the mutant phenotype. CmMR1 disruption did not affect hyphal growth, conidiation, or parasitization by C. minitans; however, ROS accumulation increased and tolerance to UV light decreased significantly in the mutants. Our results may enhance the understanding of the role of melanin in the ecology of C. minitans at the molecular level.
