1.
Article in English | MEDLINE | ID: mdl-25183856

ABSTRACT

This article describes our participation in the Gene Ontology Curation task (GO task) in BioCreative IV where we participated in both subtasks: A) identification of GO evidence sentences (GOESs) for relevant genes in full-text articles and B) prediction of GO terms for relevant genes in full-text articles. For subtask A, we trained a logistic regression model to detect GOESs based on annotations in the training data supplemented with more noisy negatives from an external resource. Then, a greedy approach was applied to associate genes with sentences. For subtask B, we designed two types of systems: (i) search-based systems, which predict GO terms based on existing annotations for GOESs that are of different textual granularities (i.e., full-text articles, abstracts, and sentences) using state-of-the-art information retrieval techniques (i.e., a novel application of the idea of distant supervision) and (ii) a similarity-based system, which assigns GO terms based on the distance between words in sentences and GO terms/synonyms. Our best performing system for subtask A achieves an F1 score of 0.27 based on exact match and 0.387 allowing relaxed overlap match. Our best performing system for subtask B, a search-based system, achieves an F1 score of 0.075 based on exact match and 0.301 considering hierarchical matches. Our search-based systems for subtask B significantly outperformed the similarity-based system. DATABASE URL: https://github.com/noname2020/Bioc.
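For readers unfamiliar with the exact-match F1 score reported above, the following is a minimal sketch of how such a score can be computed over sets of (gene, GO term) pairs. The function name `f1_exact` and the sample identifiers are hypothetical and are not taken from the article or from the official BioCreative IV evaluation scripts; the relaxed and hierarchical variants are not shown.

```python
def f1_exact(predicted, gold):
    """Return (precision, recall, F1) counting only exact matches.

    `predicted` and `gold` are iterables of hashable items, for example
    (gene_id, go_term) tuples. This is an illustrative helper, not the
    scorer used in the shared task.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up identifiers, for illustration only.
predicted_pairs = {("g1", "GO:1"), ("g1", "GO:2"), ("g2", "GO:3")}
gold_pairs = {("g1", "GO:1"), ("g2", "GO:4")}
precision, recall, f1 = f1_exact(predicted_pairs, gold_pairs)
```

In this toy case one of three predictions is correct and one of two gold pairs is recovered, so F1 is the harmonic mean of 1/3 and 1/2.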


Subject(s)
Computational Biology/methods , Databases, Genetic , Gene Ontology , Information Storage and Retrieval/methods , Molecular Sequence Annotation/methods
2.
J Biomed Inform ; 49: 275-81, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24680983

ABSTRACT

In light of the heightened problems of polysemy, synonymy, and hyponymy in clinical text, we hypothesize that patient cohort identification can be improved by using a large, in-domain clinical corpus for query expansion. We evaluate the utility of four auxiliary collections for the Text REtrieval Conference task of IR-based cohort retrieval, considering the effects of collection size, the inherent difficulty of a query, and the interaction between the collections. Each collection was applied to aid in cohort retrieval from the Pittsburgh NLP Repository by using a mixture of relevance models. Measured by mean average precision, performance using any auxiliary resource (MAP=0.386 and above) is shown to improve over the baseline query likelihood model (MAP=0.373). Considering subsets of the Mayo Clinic collection, we found that after including 2.5 billion term instances, retrieval is not improved by adding more instances. However, adding the Mayo Clinic collection did improve performance significantly over any existing setup, with a system using all four auxiliary collections obtaining the best results (MAP=0.4223). Because optimal results in the mixture of relevance models would require selective sampling of the collections, the common sense approach of "use all available data" is inappropriate. However, we found that it was still beneficial to add the Mayo corpus to any mixture of relevance models. On the task of IR-based cohort identification, query expansion with the Mayo Clinic corpus resulted in consistent and significant improvements. As such, any IR query expansion with access to a large clinical corpus could benefit from the additional resource. Additionally, we have shown that more data is not necessarily better, implying that there is value in collection curation.
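The abstract above reports results as mean average precision (MAP). As a minimal sketch of the standard definition, assuming binary relevance judgments and one ranked list per query, MAP can be computed as below. The function names and the toy document identifiers are hypothetical, and this is not the evaluation code used in the study.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query under binary relevance."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of per-query average precision.

    `runs` is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two toy queries with made-up document identifiers.
toy_runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
map_score = mean_average_precision(toy_runs)
```

Averaging per query, rather than pooling all documents, gives each query equal weight regardless of how many relevant documents it has.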


Subject(s)
Electronic Health Records , Information Storage and Retrieval , Cohort Studies , Likelihood Functions
3.
Database (Oxford) ; 2013: bas056, 2013.
Article in English | MEDLINE | ID: mdl-23327936

ABSTRACT

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.


Subject(s)
Data Mining , Education , Databases as Topic , Documentation , Humans , Software , Time Factors
4.
Obstet Gynecol ; 121(1): 115-21, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23262935

ABSTRACT

OBJECTIVE: To examine the independent contribution of risk factors developing during pregnancy to subsequent risk of obesity in young children. METHODS: We conducted a historical cohort study using data from electronic medical records of mothers and their 3,302 singleton offspring born between 2004 and 2007 at a community-based obstetric facility who attended a 4-year well visit at a pediatric practice network. The child's body mass index (BMI) z score at age 4 years was studied in relation to the mother's gestational weight gain, gestational diabetes mellitus, gestational hypertension or preeclampsia, and prenatal tobacco use. Institute of Medicine categories defined excess and inadequate gestational weight gain at term. Analysis of variance and multiple linear regression were used to test their independent relation to BMI. RESULTS: Mothers were white (39%), African American (46%), and of Hispanic ethnicity (11%); 46% were privately insured. The association of net gestational weight gain with the child's BMI z score was significant after adjustment for prepregnancy maternal factors (P<.001); gestational diabetes mellitus, gestational hypertension, and tobacco use were not significant in adjusted models. Children of mothers with excess gestational weight gain had a higher mean BMI z score (P<.001) but a significant association was observed only for inadequate gestational weight gain after adjusting for prepregnancy BMI and other covariates. Prepregnancy BMI (P<.001), Hispanic ethnicity (P<.001), and being married (P<.05) were independently associated with increasing BMI z score of the offspring. CONCLUSIONS: Preconception maternal factors had a greater influence on child obesity than prenatal factors. The gestational weight gain category was independently related to BMI z score of 4 year olds, but this association was significant only for mothers with inadequate gestational weight gain. LEVEL OF EVIDENCE: II.


Subject(s)
Diabetes, Gestational/epidemiology , Hypertension, Pregnancy-Induced/epidemiology , Obesity/epidemiology , Pre-Eclampsia/epidemiology , Prenatal Exposure Delayed Effects/epidemiology , Adult , Black People/statistics & numerical data , Body Mass Index , Child, Preschool , Cohort Studies , Diabetes, Gestational/ethnology , Electronic Health Records , Female , Hispanic or Latino/statistics & numerical data , Humans , Hypertension, Pregnancy-Induced/ethnology , Male , Models, Biological , Obesity/ethnology , Pre-Eclampsia/ethnology , Pregnancy , Prevalence , Smoking/adverse effects , Smoking/epidemiology , Smoking/ethnology , Weight Gain , White People/statistics & numerical data , Young Adult