Search | VHL Regional Portal

1.

Biochemical pathways represented by Gene Ontology-Causal Activity Models identify distinct phenotypes resulting from mutations in pathways.

Hill, David P; Drabkin, Harold J; Smith, Cynthia L; Van Auken, Kimberly M; D'Eustachio, Peter.

Genetics ; 225(2), 2023 Oct 04.

Article in English | MEDLINE | ID: mdl-37579192

ABSTRACT

Gene inactivation can affect the process(es) in which that gene acts and causally downstream ones, yielding diverse mutant phenotypes. Identifying the genetic pathways resulting in a given phenotype helps us understand how individual genes interact in a functional network. Computable representations of biological pathways include detailed process descriptions in the Reactome Knowledgebase and causal activity flows between molecular functions in Gene Ontology-Causal Activity Models (GO-CAMs). A computational process has been developed to convert Reactome pathways to GO-CAMs. Laboratory mice are widely used models of normal and pathological human processes. We have converted human Reactome GO-CAMs to orthologous mouse GO-CAMs, as a resource to transfer pathway knowledge between humans and model organisms. These mouse GO-CAMs allowed us to define sets of genes that function in a causally connected way. To demonstrate that individual variant genes from connected pathways result in similar but distinguishable phenotypes, we used the genes in our pathway models to cross-query mouse phenotype annotations in the Mouse Genome Database (MGD). Using GO-CAM representations of 2 related but distinct pathways, gluconeogenesis and glycolysis, we show that individual causal paths in gene networks give rise to discrete phenotypic outcomes resulting from perturbations of glycolytic and gluconeogenic genes. The accurate and detailed descriptions of gene interactions recovered in this analysis of well-studied processes suggest that this strategy can be applied to less well-understood processes in less well-studied model systems to predict phenotypic outcomes of novel gene variants and to identify potential gene targets in altered processes.

Subject(s)

Computational Biology , Databases, Genetic , Mice , Humans , Animals , Gene Ontology , Mutation , Phenotype , Computational Biology/methods

2.

Biochemical Pathways Represented by Gene Ontology Causal Activity Models Identify Distinct Phenotypes Resulting from Mutations in Pathways.

Hill, David P; Drabkin, Harold J; Smith, Cynthia L; Van Auken, Kimberly M; D'Eustachio, Peter.

bioRxiv ; 2023 Jul 13.

Article in English | MEDLINE | ID: mdl-37293039

ABSTRACT

Gene inactivation can affect the process(es) in which that gene acts and causally downstream ones, yielding diverse mutant phenotypes. Identifying the genetic pathways resulting in a given phenotype helps us understand how individual genes interact in a functional network. Computable representations of biological pathways include detailed process descriptions in the Reactome Knowledgebase, and causal activity flows between molecular functions in Gene Ontology-Causal Activity Models (GO-CAMs). A computational process has been developed to convert Reactome pathways to GO-CAMs. Laboratory mice are widely used models of normal and pathological human processes. We have converted human Reactome GO-CAMs to orthologous mouse GO-CAMs, as a resource to transfer pathway knowledge between humans and model organisms. These mouse GO-CAMs allowed us to define sets of genes that function in a connected and well-defined way. To test whether individual genes from well-defined pathways result in similar and distinguishable phenotypes, we used the genes in our pathway models to cross-query mouse phenotype annotations in the Mouse Genome Database (MGD). Using GO-CAM representations of two related but distinct pathways, gluconeogenesis and glycolysis, we can identify causal paths in gene networks that give rise to discrete phenotypic outcomes for perturbations of glycolysis and gluconeogenesis. The accurate and detailed descriptions of gene interactions recovered in this analysis of well-studied processes suggest that this strategy can be applied to less well-understood processes in less well-studied model systems to predict phenotypic outcomes of novel gene variants and to identify potential gene targets in altered processes.

3.

The Gene Ontology knowledgebase in 2023.

Aleksander, Suzi A; Balhoff, James; Carbon, Seth; Cherry, J Michael; Drabkin, Harold J; Ebert, Dustin; Feuermann, Marc; Gaudet, Pascale; Harris, Nomi L; Hill, David P; Lee, Raymond; Mi, Huaiyu; Moxon, Sierra; Mungall, Christopher J; Muruganujan, Anushya; Mushayamaha, Tremayne; Sternberg, Paul W; Thomas, Paul D; Van Auken, Kimberly; Ramsey, Jolene; Siegele, Deborah A; Chisholm, Rex L; Fey, Petra; Aspromonte, Maria Cristina; Nugnes, Maria Victoria; Quaglia, Federica; Tosatto, Silvio; Giglio, Michelle; Nadendla, Suvarna; Antonazzo, Giulia; Attrill, Helen; Dos Santos, Gil; Marygold, Steven; Strelets, Victor; Tabone, Christopher J; Thurmond, Jim; Zhou, Pinglei; Ahmed, Saadullah H; Asanitthong, Praoparn; Luna Buitrago, Diana; Erdol, Meltem N; Gage, Matthew C; Ali Kadhum, Mohamed; Li, Kan Yan Chloe; Long, Miao; Michalak, Aleksandra; Pesala, Angeline; Pritazahra, Armalya; Saverimuttu, Shirin C C; Su, Renzhi.

Genetics ; 224(1), 2023 05 04.

Article in English | MEDLINE | ID: mdl-36866529

ABSTRACT

The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO, a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations, evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs), mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.

Subject(s)

Databases, Genetic , Proteins , Gene Ontology , Proteins/genetics , Molecular Sequence Annotation , Computational Biology

4.

Annotation of gene product function from high-throughput studies using the Gene Ontology.

Attrill, Helen; Gaudet, Pascale; Huntley, Rachael P; Lovering, Ruth C; Engel, Stacia R; Poux, Sylvain; Van Auken, Kimberly M; Georghiou, George; Chibucos, Marcus C; Berardini, Tanya Z; Wood, Valerie; Drabkin, Harold; Fey, Petra; Garmiri, Penelope; Harris, Midori A; Sawford, Tony; Reiser, Leonore; Tauber, Rebecca; Toro, Sabrina.

Database (Oxford) ; 2019, 2019 01 01.

Article in English | MEDLINE | ID: mdl-30715275

ABSTRACT

High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, when the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community.

Subject(s)

Databases, Genetic , Gene Ontology , Genomics/methods , Molecular Sequence Annotation/methods , Animals , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA

5.

Integrative annotation and knowledge discovery of kinase post-translational modifications and cancer-associated mutations through federated protein ontologies and resources.

Huang, Liang-Chin; Ross, Karen E; Baffi, Timothy R; Drabkin, Harold; Kochut, Krzysztof J; Ruan, Zheng; D'Eustachio, Peter; McSkimming, Daniel; Arighi, Cecilia; Chen, Chuming; Natale, Darren A; Smith, Cynthia; Gaudet, Pascale; Newton, Alexandra C; Wu, Cathy; Kannan, Natarajan.

Sci Rep ; 8(1): 6518, 2018 04 25.

Article in English | MEDLINE | ID: mdl-29695735

ABSTRACT

Many bioinformatics resources with unique perspectives on the protein landscape are currently available. However, generating new knowledge from these resources requires interoperable workflows that support cross-resource queries. In this study, we employ federated queries linking information from the Protein Kinase Ontology, iPTMnet, Protein Ontology, neXtProt, and the Mouse Genome Informatics to identify key knowledge gaps in the functional coverage of the human kinome and prioritize understudied kinases, cancer variants and post-translational modifications (PTMs) for functional studies. We identify 32 functional domains enriched in cancer variants and PTMs and generate mechanistic hypotheses on overlapping variant and PTM sites by aggregating information at the residue, protein, pathway and species level from these resources. We experimentally test the hypothesis that S768 phosphorylation in the C-helix of EGFR is inhibitory by showing that oncogenic variants altering S768 phosphorylation increase basal EGFR activity. In contrast, oncogenic variants altering conserved phosphorylation sites in the 'hydrophobic motif' of PKCβII (S660F and S660C) are loss-of-function in that they reduce kinase activity and enhance membrane translocation. Our studies provide a framework for integrative, consistent, and reproducible annotation of the cancer kinomes.

Subject(s)

Mutation/genetics , Neoplasms/genetics , Protein Kinases/genetics , Protein Processing, Post-Translational/genetics , Proteins/genetics , Animals , CHO Cells , COS Cells , Cell Line , Chlorocebus aethiops , Computational Biology/methods , Cricetulus , Gene Ontology , Genetic Variation/genetics , Humans , Mice , Phosphorylation/genetics

6.

Tutorial on Protein Ontology Resources.

Arighi, Cecilia N; Drabkin, Harold; Christie, Karen R; Ross, Karen E; Natale, Darren A.

Methods Mol Biol ; 1558: 57-78, 2017.

Article in English | MEDLINE | ID: mdl-28150233

ABSTRACT

The Protein Ontology (PRO) is the reference ontology for proteins in the Open Biomedical Ontologies (OBO) foundry and consists of three sub-ontologies representing protein classes of homologous genes, proteoforms (e.g., splice isoforms, sequence variants, and post-translationally modified forms), and protein complexes. PRO defines classes of proteins and protein complexes, both species-specific and species nonspecific, and indicates their relationships in a hierarchical framework, supporting accurate protein annotation at the appropriate level of granularity, analyses of protein conservation across species, and semantic reasoning. In the first section of this chapter, we describe the PRO framework including categories of PRO terms and the relationship of PRO to other ontologies and protein resources. Next, we provide a tutorial about the PRO website (proconsortium.org) where users can browse and search the PRO hierarchy, view reports on individual PRO terms, and visualize relationships among PRO terms in a hierarchical table view, a multiple sequence alignment view, and a Cytoscape network view. Finally, we describe several examples illustrating the unique and rich information available in PRO.

Subject(s)

Biological Ontologies , Computational Biology/methods , Databases, Genetic , Proteins/genetics , Proteins/metabolism , Software , Web Browser , Animals , Humans , Molecular Sequence Annotation , Proteins/chemistry , User-Computer Interface

7.

Protein Ontology (PRO): enhancing and scaling up the representation of protein entities.

Natale, Darren A; Arighi, Cecilia N; Blake, Judith A; Bona, Jonathan; Chen, Chuming; Chen, Sheng-Chih; Christie, Karen R; Cowart, Julie; D'Eustachio, Peter; Diehl, Alexander D; Drabkin, Harold J; Duncan, William D; Huang, Hongzhan; Ren, Jia; Ross, Karen; Ruttenberg, Alan; Shamovsky, Veronica; Smith, Barry; Wang, Qinghua; Zhang, Jian; El-Sayed, Abdelrahman; Wu, Cathy H.

Nucleic Acids Res ; 45(D1): D339-D346, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899649

ABSTRACT

The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely-related terms, including for example an interactive multiple sequence alignment. Finally, we describe recent improvement in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities.

Subject(s)

Computational Biology/methods , Databases, Genetic , Proteins , Animals , Humans , Proteins/chemistry , Proteins/genetics , Web Browser

8.

Application of comparative biology in GO functional annotation: the mouse model.

Drabkin, Harold J; Christie, Karen R; Dolan, Mary E; Hill, David P; Ni, Li; Sitnikov, Dmitry; Blake, Judith A.

Mamm Genome ; 26(9-10): 574-83, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26141960

ABSTRACT

The Gene Ontology (GO) is an important component of modern biological knowledge representation with great utility for computational analysis of genomic and genetic data. The Gene Ontology Consortium (GOC) consists of a large team of contributors including curation teams from most model organism database groups as well as curation teams focused on representation of data relevant to specific human diseases. Key to the generation of consistent and comprehensive annotations is the development and use of shared standards and measures of curation quality. The GOC engages all contributors to work to a defined standard of curation that is presented here in the context of annotation of genes in the laboratory mouse. Comprehensive understanding of the origin, epistemology, and coverage of GO annotations is essential for most effective use of GO resources. Here the application of comparative approaches to capturing functional data in the mouse system is described.

Subject(s)

Databases, Genetic , Gene Ontology , Molecular Sequence Annotation , Animals , Computational Biology , Genomics , Humans , Mice , Sequence Analysis, DNA

9.

DFLAT: functional annotation for human development.

Wick, Heather C; Drabkin, Harold; Ngu, Huy; Sackman, Michael; Fournier, Craig; Haggett, Jessica; Blake, Judith A; Bianchi, Diana W; Slonim, Donna K.

BMC Bioinformatics ; 15: 45, 2014 Feb 07.

Article in English | MEDLINE | ID: mdl-24507166

ABSTRACT

BACKGROUND: Recent increases in genomic studies of the developing human fetus and neonate have led to a need for widespread characterization of the functional roles of genes at different developmental stages. The Gene Ontology (GO), a valuable and widely-used resource for characterizing gene function, offers perhaps the most suitable functional annotation system for this purpose. However, due in part to the difficulty of studying molecular genetic effects in humans, even the current collection of comprehensive GO annotations for human genes and gene products often lacks adequate developmental context for scientists wishing to study gene function in the human fetus. DESCRIPTION: The Developmental FunctionaL Annotation at Tufts (DFLAT) project aims to improve the quality of analyses of fetal gene expression and regulation by curating human fetal gene functions using both manual and semi-automated GO procedures. Eligible annotations are then contributed to the GO database and included in GO releases of human data. DFLAT has produced a considerable body of functional annotation that we demonstrate provides valuable information about developmental genomics. A collection of gene sets (genes implicated in the same function or biological process), made by combining existing GO annotations with the 13,344 new DFLAT annotations, is available for use in novel analyses. Gene set analyses of expression in several data sets, including amniotic fluid RNA from fetuses with trisomies 21 and 18, umbilical cord blood, and blood from newborns with bronchopulmonary dysplasia, were conducted both with and without the DFLAT annotation. CONCLUSIONS: Functional analysis of expression data using the DFLAT annotation increases the number of implicated gene sets, reflecting the DFLAT's improved representation of current knowledge. Blinded literature review supports the validity of newly significant findings obtained with the DFLAT annotations. Newly implicated significant gene sets also suggest specific hypotheses for future research. Overall, the DFLAT project contributes new functional annotation and gene sets likely to enhance our ability to interpret genomic studies of human fetal and neonatal development.

Subject(s)

Databases, Genetic , Fetal Development/genetics , Genomics/methods , Human Development , Molecular Sequence Annotation/methods , Amniotic Fluid , Fetal Diseases/genetics , Genes/genetics , Genes/physiology , Humans , Infant, Newborn , Vocabulary, Controlled

10.

Protein Ontology: a controlled structured network of protein entities.

Natale, Darren A; Arighi, Cecilia N; Blake, Judith A; Bult, Carol J; Christie, Karen R; Cowart, Julie; D'Eustachio, Peter; Diehl, Alexander D; Drabkin, Harold J; Helfer, Olivia; Huang, Hongzhan; Masci, Anna Maria; Ren, Jia; Roberts, Natalia V; Ross, Karen; Ruttenberg, Alan; Shamovsky, Veronica; Smith, Barry; Yerramalla, Meher Shruti; Zhang, Jian; AlJanahi, Aisha; Çelen, Irem; Gan, Cynthia; Lv, Mengxi; Schuster-Lezell, Emily; Wu, Cathy H.

Nucleic Acids Res ; 42(Database issue): D415-21, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24270789

ABSTRACT

The Protein Ontology (PRO; http://proconsortium.org) formally defines protein entities and explicitly represents their major forms and interrelations. Protein entities represented in PRO corresponding to single amino acid chains are categorized by level of specificity into family, gene, sequence and modification metaclasses, and there is a separate metaclass for protein complexes. All metaclasses also have organism-specific derivatives. PRO complements established sequence databases such as UniProtKB, and interoperates with other biomedical and biological ontologies such as the Gene Ontology (GO). PRO relates to UniProtKB in that PRO's organism-specific classes of proteins encoded by a specific gene correspond to entities documented in UniProtKB entries. PRO relates to the GO in that PRO's representations of organism-specific protein complexes are subclasses of the organism-agnostic protein complex terms in the GO Cellular Component Ontology. The past few years have seen growth and changes to the PRO, as well as new points of access to the data and new applications of PRO in immunology and proteomics. Here we describe some of these developments.

Subject(s)

Biological Ontologies , Databases, Protein , Proteins/classification , Animals , Humans , Internet , Mice , Proteins/chemistry

11.

The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments.

Roncaglia, Paola; Martone, Maryann E; Hill, David P; Berardini, Tanya Z; Foulger, Rebecca E; Imam, Fahim T; Drabkin, Harold; Mungall, Christopher J; Lomax, Jane.

J Biomed Semantics ; 4(1): 20, 2013 Oct 07.

Article in English | MEDLINE | ID: mdl-24093723

ABSTRACT

BACKGROUND: The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. DESCRIPTION: Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. CONCLUSIONS: In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community.

12.

Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology.

Hill, David P; Adams, Nico; Bada, Mike; Batchelor, Colin; Berardini, Tanya Z; Dietze, Heiko; Drabkin, Harold J; Ennis, Marcus; Foulger, Rebecca E; Harris, Midori A; Hastings, Janna; Kale, Namrata S; de Matos, Paula; Mungall, Christopher J; Owen, Gareth; Roncaglia, Paola; Steinbeck, Christoph; Turner, Steve; Lomax, Jane.

BMC Genomics ; 14: 513, 2013 Jul 29.

Article in English | MEDLINE | ID: mdl-23895341

ABSTRACT

BACKGROUND: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. RESULTS: We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. CONCLUSIONS: The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.

Subject(s)

Biology , Chemistry , Genes , Vocabulary, Controlled

13.

An overview of the BioCreative 2012 Workshop Track III: interactive text mining task.

Arighi, Cecilia N; Carterette, Ben; Cohen, K Bretonnel; Krallinger, Martin; Wilbur, W John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy.

Database (Oxford) ; 2013: bas056, 2013.

Article in English | MEDLINE | ID: mdl-23327936

ABSTRACT

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.

Subject(s)

Data Mining , Education , Databases as Topic , Documentation , Humans , Software , Time Factors

14.

Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.

Drabkin, Harold J; Blake, Judith A.

Database (Oxford) ; 2012: bas045, 2012.

Article in English | MEDLINE | ID: mdl-23110975

ABSTRACT

The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented, particularly with regard to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported with statements of evidence as well as access to source publications.

Subject(s)

Databases, Genetic , Genome/genetics , Informatics , Molecular Sequence Annotation/methods , Workflow , Access to Information , Animals , Genomics , Humans , Mice , Natural Language Processing , Quality Control

15.

A Resource of Quantitative Functional Annotation for Homo sapiens Genes.

Tasan, Murat; Drabkin, Harold J; Beaver, John E; Chua, Hon Nian; Dunham, Julie; Tian, Weidong; Blake, Judith A; Roth, Frederick P.

G3 (Bethesda) ; 2(2): 223-33, 2012 Feb.

Article in English | MEDLINE | ID: mdl-22384401

ABSTRACT

The body of human genomic and proteomic evidence continues to grow at ever-increasing rates, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes have clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and make quantitative associations of each to 4333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to the data types. Performance was evaluated by cross-validation, prospective validation, and by manual evaluation with the biological literature. Functional-linkage networks were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies. Our annotations are presented, alongside existing validated annotations, in a publicly accessible and searchable web interface.

16.

The representation of protein complexes in the Protein Ontology (PRO).

Bult, Carol J; Drabkin, Harold J; Evsikov, Alexei; Natale, Darren; Arighi, Cecilia; Roberts, Natalia; Ruttenberg, Alan; D'Eustachio, Peter; Smith, Barry; Blake, Judith A; Wu, Cathy.

BMC Bioinformatics ; 12: 371, 2011 Sep 19.

Article in English | MEDLINE | ID: mdl-21929785

ABSTRACT

BACKGROUND: Representing species-specific proteins and protein complexes in ontologies that are both human- and machine-readable facilitates the retrieval, analysis, and interpretation of genome-scale data sets. Although existing protein-centric informatics resources provide the biomedical research community with well-curated compendia of protein sequence and structure, these resources lack formal ontological representations of the relationships among the proteins themselves. The Protein Ontology (PRO) Consortium is filling this informatics resource gap by developing ontological representations and relationships among proteins and their variants and modified forms. Because proteins are often functional only as members of stable protein complexes, the PRO Consortium, in collaboration with existing protein and pathway databases, has launched a new initiative to implement logical and consistent representation of protein complexes. DESCRIPTION: We describe here how the PRO Consortium is meeting the challenge of representing species-specific protein complexes, how protein complex representation in PRO supports annotation of protein complexes and comparative biology, and how PRO is being integrated into existing community bioinformatics resources. The PRO resource is accessible at http://pir.georgetown.edu/pro/. CONCLUSION: PRO is a unique database resource for species-specific protein complexes. PRO facilitates robust annotation of variations in composition and function contexts for protein complexes within and between species.

Subject(s)

Databases, Protein , Multiprotein Complexes , Proteins/chemistry , Animals , Computational Biology , Humans , Internet , Multienzyme Complexes , Proteins/metabolism

17.

The Protein Ontology: a structured representation of protein forms and complexes.

Natale, Darren A; Arighi, Cecilia N; Barker, Winona C; Blake, Judith A; Bult, Carol J; Caudy, Michael; Drabkin, Harold J; D'Eustachio, Peter; Evsikov, Alexei V; Huang, Hongzhan; Nchoutmboube, Jules; Roberts, Natalia V; Smith, Barry; Zhang, Jian; Wu, Cathy H.

Nucleic Acids Res ; 39(Database issue): D539-45, 2011 Jan.

Article in English | MEDLINE | ID: mdl-20935045

ABSTRACT

The Protein Ontology (PRO) provides a formal, logically-based classification of specific protein classes including structured representations of protein isoforms, variants and modified forms. Initially focused on proteins found in human, mouse and Escherichia coli, PRO now includes representations of protein complexes. The PRO Consortium works in concert with the developers of other biomedical ontologies and protein knowledge bases to provide the ability to formally organize and integrate representations of precise protein forms so as to enhance accessibility to results of protein research. PRO (http://pir.georgetown.edu/pro) is part of the Open Biomedical Ontology Foundry.

Subject(s)

Databases, Protein , Proteins/classification , Animals , Escherichia coli Proteins/chemistry , Humans , Mice , Multiprotein Complexes/chemistry , Multiprotein Complexes/classification , Protein Isoforms/chemistry , Protein Isoforms/classification , Proteins/chemistry , Proteins/genetics , User-Computer Interface , Vocabulary, Controlled

18.

A MOD(ern) perspective on literature curation.

Hirschman, Jodi; Berardini, Tanya Z; Drabkin, Harold J; Howe, Doug.

Mol Genet Genomics ; 283(5): 415-25, 2010 May.

Article in English | MEDLINE | ID: mdl-20221640

ABSTRACT

Curation of biological data is a multi-faceted task whose goal is to create a structured, comprehensive, integrated, and accurate resource of current biological knowledge. These structured data facilitate the work of the scientific community by providing knowledge about genes or genomes and by generating validated connections between the data that yield new information and stimulate new research approaches. For the model organism databases (MODs), an important source of data is research publications. Every published paper containing experimental information about a particular model organism is a candidate for curation. All such papers are examined carefully by curators for relevant information. Here, four curators from different MODs describe the literature curation process and highlight approaches taken by the four MODs to address: (1) the decision process by which papers are selected, and (2) the identification and prioritization of the data contained in the paper. We will highlight some of the challenges that MOD biocurators face, and point to ways in which researchers and publishers can support the work of biocurators and the value of such support.

Subject(s)

Databases, Genetic , Models, Biological , Animals , Bibliographies as Topic , Genes , Internet , Statistics as Topic , Terminology as Topic

19.

TGF-beta signaling proteins and the Protein Ontology.

Arighi, Cecilia N; Liu, Hongfang; Natale, Darren A; Barker, Winona C; Drabkin, Harold; Blake, Judith A; Smith, Barry; Wu, Cathy H.

BMC Bioinformatics ; 10 Suppl 5: S3, 2009 May 06.

Article in English | MEDLINE | ID: mdl-19426460

ABSTRACT

BACKGROUND: The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein forms of a gene, including those resulting from alternative splicing, cleavage and/or post-translational modifications. Focusing specifically on the TGF-beta signaling proteins, we describe the building, curation, usage and dissemination of PRO. RESULTS: PRO is manually curated on the basis of PrePRO, an automatically generated file with content derived from standard protein data sources. Manual curation ensures that the treatment of the protein classes and the internal and external relationships conform to the PRO framework. The current release of PRO is based upon experimental data from mouse and human proteins, in which equivalent protein forms are represented by single terms. In addition to the PRO ontology, the annotation of PRO terms is released as a separate PRO association file, which contains, for each given PRO term, an annotation from the experimentally characterized sub-types as well as the corresponding database identifiers and sequence coordinates. The annotations are added in the form of relationships to other ontologies. Whenever possible, equivalent forms in other species are listed to facilitate cross-species comparison. Splice and allelic variants, gene fusion products and modified protein forms are all represented as entities in the ontology. PRO therefore supports the representation of protein entities and provides a resource for describing the associated data. This makes PRO useful both for proteomics studies, where isoforms and modified forms must be differentiated, and for studies of biological pathways, where representations need to take account of the different ways in which the cascade of events may depend on specific protein modifications. CONCLUSION: PRO provides a framework for the formal representation of protein classes and protein forms in the OBO Foundry. It is designed to enable data retrieval and integration and machine reasoning at the molecular level of proteins, thereby facilitating cross-species comparisons, pathway analysis, disease modeling and the generation of new hypotheses.

Subject(s)

Information Storage and Retrieval/methods , Intracellular Signaling Peptides and Proteins/classification , Transforming Growth Factor beta/chemistry , Computational Biology/methods , Databases, Genetic , Databases, Protein , Humans , Intracellular Signaling Peptides and Proteins/genetics , Transforming Growth Factor beta/classification , User-Computer Interface

20.

Ontological visualization of protein-protein interactions.

Drabkin, Harold J; Hollenbeck, Christopher; Hill, David P; Blake, Judith A.

BMC Bioinformatics ; 6: 29, 2005 Feb 11.

Article in English | MEDLINE | ID: mdl-15707487

ABSTRACT

BACKGROUND: Cellular processes require the interaction of many proteins across several cellular compartments. Determining the collective network of such interactions is an important aspect of understanding the role and regulation of individual proteins. The Gene Ontology (GO) is used by model organism databases and other bioinformatics resources to provide functional annotation of proteins. The annotation process provides a mechanism to document the binding of one protein with another. We have constructed protein interaction networks for mouse proteins utilizing the information encoded in the GO annotations. The work reported here presents a methodology for integrating and visualizing information on protein-protein interactions. RESULTS: GO annotation at Mouse Genome Informatics (MGI) captures 1318 curated, documented interactions. These include 129 binary interactions and 125 interactions involving three or more gene products. Three networks involve over 30 partners, the largest involving 109 proteins. Several tools are available at MGI to visualize and analyze these data. CONCLUSIONS: Curators at the MGI database annotate protein-protein interaction data from experimental reports in the literature. Integration of these data with the other types of data curated at MGI places protein binding data into the larger context of mouse biology and facilitates the generation of new biological hypotheses based on physical interactions among gene products.

Subject(s)

Computational Biology/methods , Protein Interaction Mapping/methods , Animals , Binding Sites , Cloning, Molecular , Databases, Genetic , Databases, Protein , Genes , Genome , Genomics , Humans , Information Storage and Retrieval , Mice , Models, Theoretical , Molecular Biology/methods , Molecular Conformation , Phosphorylation , Protein Binding , Protein Folding , Proteins/chemistry , Proteomics , Software
