Search | VHL Regional Portal

1.

The Arabidopsis Information Resource in 2024.

Reiser, Leonore; Bakker, Erica; Subramaniam, Sabarinath; Chen, Xingguo; Sawant, Swapnil; Khosa, Kartik; Prithvi, Trilok; Berardini, Tanya Z.

Genetics ; 227(1), 2024 05 07.

Article in English | MEDLINE | ID: mdl-38457127

ABSTRACT

Since 1999, The Arabidopsis Information Resource (www.arabidopsis.org) has been curating data about the Arabidopsis thaliana genome. Its primary focus is integrating experimental gene function information from the peer-reviewed literature and codifying it as controlled vocabulary annotations. Our goal is to produce a "gold standard" functional annotation set that reflects the current state of knowledge about the Arabidopsis genome. At the same time, the resource serves as a nexus for community-based collaborations aimed at improving data quality, access, and reuse. For the past decade, our work has been made possible by subscriptions from our global user base. This update covers our ongoing biocuration work, some of our modernization efforts that contribute to the first major infrastructure overhaul since 2011, the introduction of JBrowse2, and the resource's role in community activities such as organizing the structural reannotation of the genome. For gene function assessment, we used gene ontology annotations as a metric to evaluate: (1) what is currently known about Arabidopsis gene function and (2) the set of "unknown" genes. Currently, 74% of the proteome has been annotated to at least one gene ontology term. Of those loci, half have experimental support for at least one of the following aspects: molecular function, biological process, or cellular component. Our work sheds light on the genes for which we have not yet identified any published experimental data and have no functional annotation. Drawing attention to these unknown genes highlights knowledge gaps and potential sources of novel discoveries.

Subject(s)

Arabidopsis , Databases, Genetic , Molecular Sequence Annotation , Arabidopsis/genetics , Genome, Plant , Gene Ontology , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism
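The coverage metric described in the abstract above (the share of loci annotated to at least one gene ontology term, and the share of those with experimental support) can be illustrated with a minimal sketch. It assumes annotations are available as (locus, aspect, evidence code) records, and that "ND" (no biological data available) annotations do not count as functional annotation. The function name and the toy records are hypothetical and are not taken from TAIR; only the experimental evidence codes are the standard Gene Ontology ones.

```python
from collections import defaultdict

# Standard Gene Ontology experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def coverage(all_loci, annotations):
    """Return (fraction of loci annotated,
               fraction of annotated loci with experimental support).

    Assumption: "ND" records mark a gene as unknown, so they are ignored.
    """
    codes_by_locus = defaultdict(set)
    for locus, _aspect, code in annotations:
        if code != "ND":
            codes_by_locus[locus].add(code)

    annotated = {locus for locus in all_loci if codes_by_locus.get(locus)}
    experimental = {
        locus for locus in annotated
        if codes_by_locus[locus] & EXPERIMENTAL_CODES
    }
    frac_annotated = len(annotated) / len(all_loci) if all_loci else 0.0
    frac_experimental = len(experimental) / len(annotated) if annotated else 0.0
    return frac_annotated, frac_experimental


# Toy input for illustration only (made-up annotations).
loci = ["AT1G01010", "AT1G01020", "AT1G01030", "AT1G01040"]
annotations = [
    ("AT1G01010", "P", "IMP"),  # experimental, biological process
    ("AT1G01010", "C", "ISS"),  # sequence similarity, cellular component
    ("AT1G01020", "F", "IEA"),  # electronic annotation, molecular function
    ("AT1G01030", "P", "ND"),   # "unknown" placeholder, ignored
]

print(coverage(loci, annotations))  # (0.5, 0.5)
```

In this toy set two of four loci carry a non-ND annotation, and one of those two has an experimental code, which mirrors the shape (not the values) of the 74% and "half" figures reported in the abstract.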

2.

Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the Agbiodata Consortium.

Clarke, Jennifer L; Cooper, Laurel D; Poelchau, Monica F; Berardini, Tanya Z; Elser, Justin; Farmer, Andrew D; Ficklin, Stephen; Kumari, Sunita; Laporte, Marie-Angélique; Nelson, Rex T; Sadohara, Rie; Selby, Peter; Thessen, Anne E; Whitehead, Brandon; Sen, Taner Z.

Database (Oxford) ; 2023, 2023 11 15.

Article in English | MEDLINE | ID: mdl-37971715

ABSTRACT

Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases.

Subject(s)

Data Management , Plant Breeding , Animals , Genomics/methods , Databases, Factual , Information Dissemination

3.

Using the Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes.

Reiser, Leonore; Subramaniam, Shabari; Zhang, Peifen; Berardini, Tanya.

Curr Protoc ; 2(10): e574, 2022 Oct.

Article in English | MEDLINE | ID: mdl-36200836

ABSTRACT

The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) is a comprehensive web resource of Arabidopsis biology for plant scientists. TAIR curates and integrates information about genes, proteins, gene function, orthologs, gene expression, mutant phenotypes, biological materials such as clones and seed stocks, genetic markers, genetic and physical maps, genome organization, images of mutant plants, protein sub-cellular localizations, publications, and the research community. The various data types are extensively interconnected and can be accessed through a variety of web-based search and display tools. This article primarily focuses on some basic methods for searching, browsing, visualizing, and analyzing information about Arabidopsis genes and genomes. Additionally, we describe how members of the community can share data via JBrowse and the Generic Online Annotation Submission Tool (GOAT) in order to make their published research more accessible and visible. © 2022 Wiley Periodicals LLC.

Basic Protocol 1: TAIR homepage, sitemap, and navigation
Basic Protocol 2: Finding comprehensive information about Arabidopsis genes
Basic Protocol 3: Using the Arabidopsis genome browser: JBrowse
Basic Protocol 4: Using the Gene Ontology annotations for gene discovery and gene function analysis
Basic Protocol 5: Using gene lists to download bulk datasets
Basic Protocol 6: Using TAIR's analysis tools to find short sequences and motifs
Basic Protocol 7: Using the TAIR generic online annotation tool (GOAT) to submit functional annotations for Arabidopsis (or any other species) genes
Basic Protocol 8: Using PhyloGenes to visualize gene families and predict functions
Basic Protocol 9: Using TAIR to browse Arabidopsis literature
Basic Protocol 10: Using the synteny viewer to find and display syntenic regions from Arabidopsis and other plant species

Subject(s)

Arabidopsis Proteins , Arabidopsis , Animals , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Databases, Genetic , Genetic Markers , Genome, Plant/genetics , Goats/genetics

4.

Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO).

Ramsey, Jolene; McIntosh, Brenley; Renfro, Daniel; Aleksander, Suzanne A; LaBonte, Sandra; Ross, Curtis; Zweifel, Adrienne E; Liles, Nathan; Farrar, Shabnam; Gill, Jason J; Erill, Ivan; Ades, Sarah; Berardini, Tanya Z; Bennett, Jennifer A; Brady, Siobhan; Britton, Robert; Carbon, Seth; Caruso, Steven M; Clements, Dave; Dalia, Ritu; Defelice, Meredith; Doyle, Erin L; Friedberg, Iddo; Gurney, Susan M R; Hughes, Lee; Johnson, Allison; Kowalski, Jason M; Li, Donghui; Lovering, Ruth C; Mans, Tamara L; McCarthy, Fiona; Moore, Sean D; Murphy, Rebecca; Paustian, Timothy D; Perdue, Sarah; Peterson, Celeste N; Prüß, Birgit M; Saha, Margaret S; Sheehy, Robert R; Tansey, John T; Temple, Louise; Thorman, Alexander William; Trevino, Saul; Vollmer, Amy Cheng; Walbot, Virginia; Willey, Joanne; Siegele, Deborah A; Hu, James C.

PLoS Comput Biol ; 17(10): e1009463, 2021 10.

Article in English | MEDLINE | ID: mdl-34710081

ABSTRACT

Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.

Subject(s)

Crowdsourcing/methods , Gene Ontology , Molecular Sequence Annotation/methods , Computational Biology , Databases, Genetic , Humans , Proteins/genetics , Proteins/physiology

5.

Araport Lives: An Updated Framework for Arabidopsis Bioinformatics.

Pasha, Asher; Subramaniam, Shabari; Cleary, Alan; Chen, Xingguo; Berardini, Tanya; Farmer, Andrew; Town, Christopher; Provart, Nicholas.

Plant Cell ; 32(9): 2683-2686, 2020 09.

Article in English | MEDLINE | ID: mdl-32699173

Subject(s)

Arabidopsis , Computational Biology , Databases, Factual , Arabidopsis/genetics , Databases, Factual/economics , Genetic Research/economics , Genome, Plant , Internet

6.

PhyloGenes: An online phylogenetics and functional genomics resource for plant gene function inference.

Zhang, Peifen; Berardini, Tanya Z; Ebert, Dustin; Li, Qian; Mi, Huaiyu; Muruganujan, Anushya; Prithvi, Trilok; Reiser, Leonore; Sawant, Swapnil; Thomas, Paul D; Huala, Eva.

Plant Direct ; 4(12): e00293, 2020 Dec.

Article in English | MEDLINE | ID: mdl-33392435

ABSTRACT

We aim to enable the accurate and efficient transfer of knowledge about gene function gained from Arabidopsis thaliana and other model organisms to other plant species. This knowledge transfer is frequently challenging in plants due to duplications of individual genes and whole genomes in plant lineages. Such duplications result in complex evolutionary relationships between related genes, which may have similar sequences but highly divergent functions. In such cases, functional inference requires more than a simple sequence similarity calculation. We have developed an online resource, PhyloGenes (phylogenes.org), that displays precomputed phylogenetic trees for plant gene families along with experimentally validated function information for individual genes within the families. A total of 40 plant genomes and 10 non-plant model organisms are represented in over 8,000 gene families. Evolutionary events such as speciation and duplication are clearly labeled on gene trees to distinguish orthologs from paralogs. Nearly 6,000 families have at least one member with an experimentally supported annotation to a Gene Ontology (GO) molecular function or biological process term. By displaying experimentally validated gene functions associated to individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes facilitates the use of evolutionary history to determine the most likely function of genes that have not been experimentally characterized. Future work will enrich the resource by incorporating additional gene function datasets such as plant gene expression atlas data.
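The abstract above notes that labeling speciation and duplication events on a gene tree distinguishes orthologs from paralogs. A minimal sketch of that standard idea follows: two genes are orthologs if their most recent common ancestor node is a speciation, and paralogs if it is a duplication. The `Node` class, the function names, and the gene names are hypothetical illustrations and do not reflect PhyloGenes' own data model or code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" on internal nodes
    children: List["Node"] = field(default_factory=list)


def path_to(root, leaf_name):
    """Return the nodes from root down to the named leaf, or None if absent."""
    if not root.children:
        return [root] if root.name == leaf_name else None
    for child in root.children:
        sub = path_to(child, leaf_name)
        if sub is not None:
            return [root] + sub
    return None


def relationship(root, gene_a, gene_b):
    """Classify two distinct leaf genes by the event at their last common ancestor."""
    if gene_a == gene_b:
        raise ValueError("need two distinct genes")
    path_a, path_b = path_to(root, gene_a), path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise KeyError("gene not found in tree")
    lca = None
    for x, y in zip(path_a, path_b):
        if x is y:
            lca = x  # still on the shared part of both paths
        else:
            break
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Toy family: an ancient duplication followed by speciation in each copy.
tree = Node("root", "duplication", [
    Node("cladeA", "speciation", [Node("At_geneA"), Node("Os_geneA")]),
    Node("cladeB", "speciation", [Node("At_geneB"), Node("Os_geneB")]),
])

print(relationship(tree, "At_geneA", "Os_geneA"))  # orthologs
print(relationship(tree, "At_geneA", "At_geneB"))  # paralogs
print(relationship(tree, "At_geneA", "Os_geneB"))  # paralogs
```

The third case shows why sequence similarity alone can mislead: genes from different species may still be paralogs when their lineages split at a duplication rather than at a speciation.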

7.

Annotation of gene product function from high-throughput studies using the Gene Ontology.

Attrill, Helen; Gaudet, Pascale; Huntley, Rachael P; Lovering, Ruth C; Engel, Stacia R; Poux, Sylvain; Van Auken, Kimberly M; Georghiou, George; Chibucos, Marcus C; Berardini, Tanya Z; Wood, Valerie; Drabkin, Harold; Fey, Petra; Garmiri, Penelope; Harris, Midori A; Sawford, Tony; Reiser, Leonore; Tauber, Rebecca; Toro, Sabrina.

Database (Oxford) ; 2019, 2019 01 01.

Article in English | MEDLINE | ID: mdl-30715275

ABSTRACT

High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, when the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community.

Subject(s)

Databases, Genetic , Gene Ontology , Genomics/methods , Molecular Sequence Annotation/methods , Animals , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA

8.

AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture.

Harper, Lisa; Campbell, Jacqueline; Cannon, Ethalinda K S; Jung, Sook; Poelchau, Monica; Walls, Ramona; Andorf, Carson; Arnaud, Elizabeth; Berardini, Tanya Z; Birkett, Clayton; Cannon, Steve; Carson, James; Condon, Bradford; Cooper, Laurel; Dunn, Nathan; Elsik, Christine G; Farmer, Andrew; Ficklin, Stephen P; Grant, David; Grau, Emily; Herndon, Nic; Hu, Zhi-Liang; Humann, Jodi; Jaiswal, Pankaj; Jonquet, Clement; Laporte, Marie-Angélique; Larmande, Pierre; Lazo, Gerard; McCarthy, Fiona; Menda, Naama; Mungall, Christopher J; Munoz-Torres, Monica C; Naithani, Sushma; Nelson, Rex; Nesdill, Daureen; Park, Carissa; Reecy, James; Reiser, Leonore; Sanderson, Lacey-Anne; Sen, Taner Z; Staton, Margaret; Subramaniam, Sabarinath; Tello-Ruiz, Marcela Karey; Unda, Victor; Unni, Deepak; Wang, Liya; Ware, Doreen; Wegrzyn, Jill; Williams, Jason; Woodhouse, Margaret.

Database (Oxford) ; 2018, 2018 01 01.

Article in English | MEDLINE | ID: mdl-30239679

ABSTRACT

The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.

Subject(s)

Agriculture , Databases, Genetic , Genomics , Breeding , Gene Ontology , Metadata , Surveys and Questionnaires

9.

Improving Interpretation of Cardiac Phenotypes and Enhancing Discovery With Expanded Knowledge in the Gene Ontology.

Lovering, Ruth C; Roncaglia, Paola; Howe, Douglas G; Laulederkind, Stanley J F; Khodiyar, Varsha K; Berardini, Tanya Z; Tweedie, Susan; Foulger, Rebecca E; Osumi-Sutherland, David; Campbell, Nancy H; Huntley, Rachael P; Talmud, Philippa J; Blake, Judith A; Breckenridge, Ross; Riley, Paul R; Lambiase, Pier D; Elliott, Perry M; Clapp, Lucie; Tinker, Andrew; Hill, David P.

Circ Genom Precis Med ; 11(2): e001813, 2018 02.

Article in English | MEDLINE | ID: mdl-29440116

ABSTRACT

BACKGROUND: A systems biology approach to cardiac physiology requires a comprehensive representation of how coordinated processes operate in the heart, as well as the ability to interpret relevant transcriptomic and proteomic experiments. The Gene Ontology (GO) Consortium provides structured, controlled vocabularies of biological terms that can be used to summarize and analyze functional knowledge for gene products. METHODS AND RESULTS: In this study, we created a computational resource to facilitate genetic studies of cardiac physiology by integrating literature curation with attention to an improved and expanded ontological representation of heart processes in the Gene Ontology. As a result, the Gene Ontology now contains terms that comprehensively describe the roles of proteins in cardiac muscle cell action potential, electrical coupling, and the transmission of the electrical impulse from the sinoatrial node to the ventricles. Evaluating the effectiveness of this approach to inform data analysis demonstrated that Gene Ontology annotations, analyzed within an expanded ontological context of heart processes, can help to identify candidate genes associated with arrhythmic disease risk loci. CONCLUSIONS: We determined that a combination of curation and ontology development for heart-specific genes and processes supports the identification and downstream analysis of genes responsible for the spread of the cardiac action potential through the heart. Annotating these genes and processes in a structured format facilitates data analysis and supports effective retrieval of gene-centric information about cardiac defects.

Subject(s)

Gene Ontology , Heart Diseases , Proteomics , Computational Biology , Databases, Genetic , Heart , Heart Diseases/genetics , Humans , Molecular Sequence Annotation , Phenotype

10.

RNAcentral: a comprehensive database of non-coding RNA sequences.

Petrov, Anton I; Kay, Simon J E; Kalvari, Ioanna; Howe, Kevin L; Gray, Kristian A; Bruford, Elspeth A; Kersey, Paul J; Cochrane, Guy; Finn, Robert D; Bateman, Alex; Kozomara, Ana; Griffiths-Jones, Sam; Frankish, Adam; Zwieb, Christian W; Lau, Britney Y; Williams, Kelly P; Chan, Patricia P; Lowe, Todd M; Cannone, Jamie J; Gutell, Robin; Machnicka, Magdalena A; Bujnicki, Janusz M; Yoshihama, Maki; Kenmochi, Naoya; Chai, Benli; Cole, James R; Szymanski, Maciej; Karlowski, Wojciech M; Wood, Valerie; Huala, Eva; Berardini, Tanya Z; Zhao, Yi; Chen, Runsheng; Zhu, Weimin; Paraskevopoulou, Maria D; Vlachos, Ioannis S; Hatzigeorgiou, Artemis G; Ma, Lina; Zhang, Zhang; Puetz, Joern; Stadler, Peter F; McDonald, Daniel; Basu, Siddhartha; Fey, Petra; Engel, Stacia R; Cherry, J Michael; Volders, Pieter-Jan; Mestdagh, Pieter; Wower, Jacek; Clark, Michael B.

Nucleic Acids Res ; 45(D1): D128-D134, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27794554

ABSTRACT

RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data is provided for free and is available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/.

Subject(s)

Databases, Nucleic Acid , RNA, Untranslated/chemistry , Animals , Genomics , Humans , Nucleotides/chemistry , Sequence Analysis, RNA , Species Specificity

11.

Modeling biochemical pathways in the gene ontology.

Hill, David P; D'Eustachio, Peter; Berardini, Tanya Z; Mungall, Christopher J; Renedo, Nikolai; Blake, Judith A.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27589964

ABSTRACT

The concept of a biological pathway, an ordered sequence of molecular transformations, is used to collect and represent molecular knowledge for a broad span of organismal biology. Representations of biomedical pathways typically are rich but idiosyncratic presentations of organized knowledge about individual pathways. Meanwhile, biomedical ontologies and associated annotation files are powerful tools that organize molecular information in a logically rigorous form to support computational analysis. The Gene Ontology (GO), representing Molecular Functions, Biological Processes and Cellular Components, incorporates many aspects of biological pathways within its ontological representations. Here we present a methodology for extending and refining the classes in the GO for more comprehensive, consistent and integrated representation of pathways, leveraging knowledge embedded in current pathway representations such as those in the Reactome Knowledgebase and MetaCyc. With carbohydrate metabolic pathways as a use case, we discuss how our representation supports the integration of variant pathway classes into a unified ontological structure that can be used for data comparison and analysis.

Subject(s)

Carbohydrate Metabolism/physiology , Databases, Genetic , Gene Ontology , Models, Biological , Animals , Humans

12.

Sustainable funding for biocuration: The Arabidopsis Information Resource (TAIR) as a case study of a subscription-based funding model.

Reiser, Leonore; Berardini, Tanya Z; Li, Donghui; Muller, Robert; Strait, Emily M; Li, Qian; Mezheritsky, Yarik; Vetushko, Andrey; Huala, Eva.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-26989150

ABSTRACT

Databases and data repositories provide essential functions for the research community by integrating, curating, archiving and otherwise packaging data to facilitate discovery and reuse. Despite their importance, funding for maintenance of these resources is increasingly hard to obtain. Fueled by a desire to find long-term, sustainable solutions to database funding, staff from the Arabidopsis Information Resource (TAIR) founded the nonprofit organization Phoenix Bioinformatics, using TAIR as a test case for user-based funding. Subscription-based funding has been proposed as an alternative to grant funding, but its application has been very limited within the nonprofit sector. Our testing of this model indicates that it is a viable option, at least for some databases, and that it is possible to strike a balance that maximizes access while still incentivizing subscriptions. One year after transitioning to subscription support, TAIR is self-sustaining and Phoenix is poised to expand and support additional resources that wish to incorporate user-based funding strategies. Database URL: www.arabidopsis.org.

Subject(s)

Arabidopsis/genetics , Data Curation/economics , Models, Theoretical , Research Support as Topic/economics , Databases, Genetic , Software

13.

The Arabidopsis information resource: Making and mining the "gold standard" annotated reference plant genome.

Berardini, Tanya Z; Reiser, Leonore; Li, Donghui; Mezheritsky, Yarik; Muller, Robert; Strait, Emily; Huala, Eva.

Genesis ; 53(8): 474-85, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26201819

ABSTRACT

The Arabidopsis Information Resource (TAIR) is a continuously updated, online database of genetic and molecular biology data for the model plant Arabidopsis thaliana that provides a global research community with centralized access to data for over 30,000 Arabidopsis genes. TAIR's biocurators systematically extract, organize, and interconnect experimental data from the literature along with computational predictions, community submissions, and high-throughput datasets to present a high-quality and comprehensive picture of Arabidopsis gene function. TAIR provides tools for data visualization and analysis, and enables ordering of seed and DNA stocks, protein chips, and other experimental resources. TAIR actively engages with its users, who contribute expertise and data that augment the work of the curatorial staff. TAIR's focus in an extensive and evolving ecosystem of online resources for plant biology is on the critically important role of extracting experimentally based research findings from the literature and making that information computationally accessible. In response to the loss of government grant funding, the TAIR team founded a nonprofit entity, Phoenix Bioinformatics, with the aim of developing sustainable funding models for biological databases, using TAIR as a test case. Phoenix has successfully transitioned TAIR to subscription-based funding while still keeping its data relatively open and accessible.

Subject(s)

Arabidopsis/genetics , Data Curation/methods , Data Curation/standards , Databases, Genetic/standards , Genome, Plant , Alleles , Arabidopsis Proteins/genetics , Genetic Association Studies

14.

Arabidopsis database and stock resources.

Li, Donghui; Dreher, Kate; Knee, Emma; Brkljacic, Jelena; Grotewold, Erich; Berardini, Tanya Z; Lamesch, Philippe; Garcia-Hernandez, Margarita; Reiser, Leonore; Huala, Eva.

Methods Mol Biol ; 1062: 65-96, 2014.

Article in English | MEDLINE | ID: mdl-24057361

ABSTRACT

The volume of Arabidopsis information has increased enormously in recent years as a result of the sequencing of the reference genome and other large-scale functional genomics projects. Much of the data is stored in public databases, where data are organized, analyzed, and made freely accessible to the research community. These databases are resources that researchers can utilize for making predictions and developing testable hypotheses. The methods in this chapter describe ways to access and utilize Arabidopsis data and genomic resources found in databases and stock centers.

Subject(s)

Arabidopsis/genetics , Databases, Genetic , Amino Acid Sequence , Arabidopsis/metabolism , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/genetics , Data Mining , Gene Ontology , Genes, Plant , Metabolic Networks and Pathways , Molecular Sequence Annotation , Protein Conformation , Seeds/genetics

15.

TermGenie - a web-application for pattern-based ontology class generation.

Dietze, Heiko; Berardini, Tanya Z; Foulger, Rebecca E; Hill, David P; Lomax, Jane; Osumi-Sutherland, David; Roncaglia, Paola; Mungall, Christopher J.

J Biomed Semantics ; 5: 48, 2014.

Article in English | MEDLINE | ID: mdl-25937883

ABSTRACT

BACKGROUND: Biological ontologies are continually growing and improving from requests for new classes (terms) by biocurators. These ontology requests can frequently create bottlenecks in the biocuration process, as ontology developers struggle to keep up while manually processing these requests and creating classes. RESULTS: TermGenie allows biocurators to generate new classes based on formally specified design patterns or templates. The system is web-based and can be accessed by any authorized curator through a web browser. Automated rules and reasoning engines are used to ensure validity, uniqueness and relationship to pre-existing classes. In the last 4 years the Gene Ontology TermGenie generated 4715 new classes, about 51.4% of all new classes created. The immediate generation of permanent identifiers proved not to be an issue with only 70 (1.4%) obsoleted classes. CONCLUSION: TermGenie is a web-based class-generation system that complements traditional ontology development tools. All classes added through pre-defined templates are guaranteed to have OWL equivalence axioms that are used for automatic classification and in some cases inter-ontology linkage. At the same time, the system is simple and intuitive and can be used by most biocurators without extensive training.
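The template-based generation described in the abstract above can be sketched in a few lines. This is an illustration of the general pattern only, assuming a single "regulation of X" template with simple validity and uniqueness checks. The template dictionary, the `generate_class` function, and the example labels are hypothetical and do not represent TermGenie's actual templates, API, or reasoning step.

```python
# Hypothetical design pattern: every filler class X yields "regulation of X".
REGULATION_TEMPLATE = {
    "label": "regulation of {x}",
    "definition": "Any process that modulates the frequency, rate or extent of {x}.",
    "parent": "regulation of biological process",
}


def generate_class(template, filler, existing_labels):
    """Build a new class from a template, enforcing two simple rules:
    the filler must already exist (validity) and the new label must not (uniqueness).
    """
    if filler not in existing_labels:
        raise ValueError(f"unknown filler class: {filler!r}")
    label = template["label"].format(x=filler)
    if label in existing_labels:
        raise ValueError(f"class already exists: {label!r}")
    return {
        "label": label,
        "definition": template["definition"].format(x=filler),
        "parent": template["parent"],
        # Simplified stand-in for a logical definition: which class is regulated.
        "regulates": filler,
    }


# Toy ontology content for illustration only.
existing = {"leaf senescence", "cell cycle", "regulation of cell cycle"}

new_class = generate_class(REGULATION_TEMPLATE, "leaf senescence", existing)
print(new_class["label"])       # regulation of leaf senescence
print(new_class["definition"])
```

Because the label, definition, and relationship all derive from the same filler, a request that fits a template needs no manual drafting, which is the bottleneck the abstract describes.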

16.

The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments.

Roncaglia, Paola; Martone, Maryann E; Hill, David P; Berardini, Tanya Z; Foulger, Rebecca E; Imam, Fahim T; Drabkin, Harold; Mungall, Christopher J; Lomax, Jane.

J Biomed Semantics ; 4(1): 20, 2013 Oct 07.

Article in English | MEDLINE | ID: mdl-24093723

ABSTRACT

BACKGROUND: The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. DESCRIPTION: Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. CONCLUSIONS: In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community.

17.

Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology.

Hill, David P; Adams, Nico; Bada, Mike; Batchelor, Colin; Berardini, Tanya Z; Dietze, Heiko; Drabkin, Harold J; Ennis, Marcus; Foulger, Rebecca E; Harris, Midori A; Hastings, Janna; Kale, Namrata S; de Matos, Paula; Mungall, Christopher J; Owen, Gareth; Roncaglia, Paola; Steinbeck, Christoph; Turner, Steve; Lomax, Jane.

BMC Genomics ; 14: 513, 2013 Jul 29.

Article in English | MEDLINE | ID: mdl-23895341

ABSTRACT

BACKGROUND: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. RESULTS: We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. CONCLUSIONS: The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.

Subject(s)

Biology , Chemistry , Genes , Vocabulary, Controlled

18.

The plant ontology as a tool for comparative plant anatomy and genomic analyses.

Cooper, Laurel; Walls, Ramona L; Elser, Justin; Gandolfo, Maria A; Stevenson, Dennis W; Smith, Barry; Preece, Justin; Athreya, Balaji; Mungall, Christopher J; Rensing, Stefan; Hiss, Manuel; Lang, Daniel; Reski, Ralf; Berardini, Tanya Z; Li, Donghui; Huala, Eva; Schaeffer, Mary; Menda, Naama; Arnaud, Elizabeth; Shrestha, Rosemary; Yamazaki, Yukiko; Jaiswal, Pankaj.

Plant Cell Physiol ; 54(2): e1, 2013 Feb.

Article in English | MEDLINE | ID: mdl-23220694

ABSTRACT

The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary ('ontology') of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.

Subject(s)

Genome, Plant , Genomics/methods , Plants/anatomy & histology , Plants/genetics , Software , Alkyl and Aryl Transferases/genetics , Databases, Genetic , Flowers/genetics , Internet , Molecular Sequence Annotation , Multigene Family , Phenotype , Plant Leaves/anatomy & histology , Plant Proteins/genetics

19.

Building an efficient curation workflow for the Arabidopsis literature corpus.

Li, Donghui; Berardini, Tanya Z; Muller, Robert J; Huala, Eva.

Database (Oxford) ; 2012: bas047, 2012.

Article in English | MEDLINE | ID: mdl-23221298

ABSTRACT

TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org

Subject(s)

Arabidopsis/genetics , Data Mining/methods , Databases, Genetic , Periodicals as Topic , Workflow , Genes, Plant/genetics , Molecular Sequence Annotation , Phenotype , Polymorphism, Genetic , Search Engine , Vocabulary, Controlled

20.

Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts.

Wei, Chih-Hsuan; Harris, Bethany R; Li, Donghui; Berardini, Tanya Z; Huala, Eva; Kao, Hung-Yu; Lu, Zhiyong.

Database (Oxford) ; 2012: bas041, 2012.

Article in English | MEDLINE | ID: mdl-23160414

ABSTRACT

Today's biomedical research has become heavily dependent on access to the biological knowledge encoded in expert curated biological databases. As the volume of biological literature grows rapidly, it becomes increasingly difficult for biocurators to keep up with the literature because manual curation is an expensive and time-consuming endeavour. Past research has suggested that computer-assisted curation can improve efficiency, but few text-mining systems have been formally evaluated in this regard. Through participation in the interactive text-mining track of the BioCreative 2012 workshop, we developed PubTator, a PubMed-like system that assists with two specific human curation tasks: document triage and bioconcept annotation. On the basis of evaluation results from two external user groups, we find that the accuracy of PubTator-assisted curation is comparable with that of manual curation and that PubTator can significantly increase human curatorial speed. These encouraging findings warrant further investigation with a larger number of publications to be annotated. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/

Subject(s)

Abstracting and Indexing , Data Mining/methods , Databases, Factual , Genes , Periodicals as Topic , PubMed , User-Computer Interface , Feedback , Humans

