1.
Hum Mutat ; 27(9): 957-64, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16865690

ABSTRACT

The proliferation of biomedical literature makes it increasingly difficult for researchers to find and manage relevant information. However, identifying research articles containing mutation data, a requisite first step in integrating large and complex mutation data sets, is currently tedious, time-consuming and imprecise. More effective mechanisms for identifying articles containing mutation information would be beneficial both for the curation of mutation databases and for individual researchers. We developed an automated method that uses information extraction, classifier, and relevance ranking techniques to determine the likelihood of MEDLINE abstracts containing information regarding genomic variation data suitable for inclusion in mutation databases. We targeted the CDKN2A (p16) gene and the procedure for document identification currently used by CDKN2A Database curators as a measure of feasibility. A set of abstracts was manually identified from a MEDLINE search as potentially containing specific CDKN2A mutation events. A subset of these abstracts was used as a training set for a maximum entropy classifier to identify text features distinguishing "relevant" from "not relevant" abstracts. Each document was represented as a set of indicative word, word pair, and entity tagger-derived genomic variation features. When applied to a test set of 200 candidate abstracts, the classifier predicted 88 articles as being relevant; of these, 29 of 32 manuscripts in which manual curation found CDKN2A sequence variants were positively predicted. Thus, the set of potentially useful articles that a manual curator would have to review was reduced by 56%, maintaining 91% recall (sensitivity) and more than doubling precision (positive predictive value). Subsequent expansion of the training set to 494 articles yielded similar precision and recall rates, and comparison of the original and expanded trials demonstrated that the average precision improved with the larger data set. 
Our results show that automated systems can effectively identify article subsets relevant to a given task and may prove to be powerful tools for the broader research community. This procedure can be readily adapted to any or all genes, organisms, or sets of documents.
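The figures reported in the abstract above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code. The function and key names are hypothetical, and the baseline precision is an assumption: it is taken to be the share of relevant articles a curator would find by reviewing all 200 candidates (32 of 200).

```python
def screening_metrics(total, predicted_relevant, true_relevant, true_positives):
    """Compute screening statistics from the counts quoted in the abstract.

    All parameter names are illustrative; they do not come from the paper.
    """
    # Recall (sensitivity): share of truly relevant articles the classifier kept.
    recall = true_positives / true_relevant
    # Precision (positive predictive value) among articles predicted relevant.
    precision = true_positives / predicted_relevant
    # Assumed baseline: precision when a curator reviews every candidate.
    baseline_precision = true_relevant / total
    # Share of candidates the curator no longer needs to review.
    workload_reduction = 1 - predicted_relevant / total
    return {
        "recall": recall,
        "precision": precision,
        "baseline_precision": baseline_precision,
        "workload_reduction": workload_reduction,
    }


# Counts from the abstract: 200 candidates, 88 predicted relevant,
# 32 truly relevant, 29 of those 32 positively predicted.
m = screening_metrics(total=200, predicted_relevant=88,
                      true_relevant=32, true_positives=29)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

Under these assumptions the computed recall and workload reduction round to the 91% and 56% stated in the abstract, and the precision on the 88 predicted articles is a little more than twice the assumed baseline, consistent with "more than doubling precision".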


Subject(s)
Genes, Neoplasm; Genes, p16; Information Storage and Retrieval/methods; MEDLINE; Mutation; Computational Biology/methods; Databases, Genetic; Neoplasms/genetics; Periodicals as Topic
2.
Hum Mutat ; 24(4): 296-304, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15365986

ABSTRACT

In this report, we introduce the CDKN2A Database, an online database of germline and somatic variants of the CDKN2A tumor suppressor gene recorded in human disease through the year 2002, annotated with evolutionary, structural, and functional information. The CDKN2A Database improves upon existing resources by: 1) including both somatic mutations and germline variants, thereby adding the perspective of somatic cell carcinogenesis to that of hereditary cancer predisposition; 2) including information that assists with the interpretation of allelic variants, such as other primary data (sequences, structures, alignments, functional measurements, and literature references) and annotations (extensive text, figures, and a tree-based phylogenetic classification); and 3) providing the information in a format that allows a user to either download the database or to easily manipulate it online. We describe the database structure, content, current uses, and potential implications (http://biodesktop.uvm.edu/perl/p16).


Subject(s)
Cyclin-Dependent Kinase Inhibitor p16/physiology; Databases, Genetic; Genes, p16; Mutation; Alleles; Cell Cycle/genetics; Cell Cycle/physiology; Cell Transformation, Neoplastic/genetics; Cyclin-Dependent Kinase Inhibitor p16/chemistry; Cyclin-Dependent Kinase Inhibitor p16/deficiency; Evolution, Molecular; Forecasting; Genes, p53; Germ-Line Mutation; Humans; Neoplasms/genetics; Structure-Activity Relationship; Tumor Suppressor Protein p14ARF/chemistry; Tumor Suppressor Protein p14ARF/genetics