Search | VHL Regional Portal

Gene function annotations for the maize NAM founder lines.

Fattel, Leila; Yanarella, Colleen F; Ngara, Blessing; Johnson, Olivia T; Campbell, Darwin A; Wimalanathan, Kokulapalan; Lawrence-Dill, Carolyn J.

BMC Res Notes ; 17(1): 9, 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38167110

ABSTRACT

OBJECTIVES: We annotated the latest published sequences of the 26 Zea mays Nested Association Mapping (NAM) founder lines using GOMAP, the Gene Ontology Meta Annotator for Plants. The maize NAM panel enables researchers to understand and identify the genetic basis of complex traits. Annotations of predicted functions for genes can help researchers investigate gene-phenotype associations, prioritize candidate genes for phenotypes of interest, and formulate testable hypotheses about gene function/phenotype associations. The creation and release of high-confidence, high-coverage gene function annotation sets for the NAM founder lines is critical to accelerate the generation of knowledge in maize genetics research. GOMAP is a high-throughput computational pipeline that annotates gene functions genome-wide in plant genomes using Gene Ontology functional class terms. Here we report and share GOMAP-generated functional annotations for the NAM founder lines.
DATA DESCRIPTION: Datasets include the protein sequences used as input, GOMAP-generated annotation files, scripts used to update obsolete terms, and GAF-formatted tab-delimited text files of gene function annotations along with README files that describe formatting, content, and how files relate to each other.

Subject(s)

Genome, Plant , Zea mays , Zea mays/genetics , Genome, Plant/genetics , Phenotype

Wisconsin diversity panel phenotypes: spoken descriptions of plants and supporting data.

Yanarella, Colleen F; Fattel, Leila; Kristmundsdóttir, Ásrún Ý; Lopez, Miriam D; Edwards, Jode W; Campbell, Darwin A; Abel, Craig A; Lawrence-Dill, Carolyn J.

BMC Res Notes ; 17(1): 33, 2024 Jan 23.

Article in English | MEDLINE | ID: mdl-38263080

ABSTRACT

OBJECTIVES: Phenotyping plants in a field environment can involve a variety of methods, including automated instruments and labor-intensive manual measurement and scoring. Researchers also collect language-based phenotypic descriptions and use controlled vocabularies and structures such as ontologies to enable computation on descriptive phenotype data, including methods to determine phenotypic similarities. In this study, spoken descriptions of plants were collected, and observers were instructed to use their own vocabulary to describe plant features that were present and visible. These plants were also measured and scored manually as part of a larger study investigating whether spoken plant descriptions can be used to recover known biological phenomena.
DATA DESCRIPTION: Data comprise phenotypic observations of 686 accessions of the maize Wisconsin Diversity panel and 25 positive control accessions that carry visible, dramatic phenotypes. The data include the list of accessions planted, field layout, data collection procedures, observation transcripts from student participants (whose personal data are protected for ethical reasons) and volunteers, volunteers' audio data files, terrestrial and aerial images of the plants, Amazon Web Services method selection experimental data, and manually collected phenotypes (measurements and scores such as plant height and ear and tassel features). Data were collected during the summer of 2021 at Iowa State University's Agricultural Engineering and Agronomy Research Farms.

Subject(s)

Agriculture , Humans , Wisconsin , Data Collection , Farms , Phenotype

Standardized genome-wide function prediction enables comparative functional genomics: a new application area for Gene Ontologies in plants.

Fattel, Leila; Psaroudakis, Dennis; Yanarella, Colleen F; Chiteri, Kevin O; Dostalik, Haley A; Joshi, Parnal; Starr, Dollye C; Vu, Ha; Wimalanathan, Kokulapalan; Lawrence-Dill, Carolyn J.

Gigascience ; 11, 2022 Apr 15.

Article in English | MEDLINE | ID: mdl-35426911

ABSTRACT

BACKGROUND: Genome-wide gene function annotations are useful for hypothesis generation and for prioritizing candidate genes potentially responsible for phenotypes of interest. We functionally annotated the genes of 18 crop plant genomes across 14 species using the GOMAP pipeline.
RESULTS: Compared with existing GO annotation datasets, GOMAP-generated datasets cover more genes, contain more GO terms, and are similar in quality (based on precision and recall metrics using existing gold standards as the basis for comparison). We then sought to determine whether the datasets across multiple species could be used together to carry out comparative functional genomics analyses in plants. As a proof of concept, we created dendrograms of functional relatedness based on the terms assigned for all 18 genomes. We compared these dendrograms to well-established species-level evolutionary phylogenies to determine whether the derived trees agreed with known evolutionary relationships, which they largely did. Where discrepancies were observed, we determined branch support by jackknifing and then removed individual annotation sets, one genome at a time, to identify the annotation sets causing unexpected relationships.
CONCLUSIONS: GOMAP-derived functional annotations used together across multiple species generally retain sufficient biological signal to recover known phylogenetic relationships based on genome-wide functional similarities, indicating that comparative functional genomics across species based on GO data holds promise for generating novel hypotheses about comparative gene function and traits.

Subject(s)

Genome, Plant , Genomics , Databases, Genetic , Gene Ontology , Molecular Sequence Annotation , Phylogeny , Plants/genetics

Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement.

Braun, Ian R; Yanarella, Colleen F; Lawrence-Dill, Carolyn J.

Plant Phenomics ; 2020: 1963251, 2020.

Article in English | MEDLINE | ID: mdl-33313544

ABSTRACT

Many newly observed phenotypes are first described, then experimentally manipulated. These language-based descriptions appear both in the literature and in community datastores. To standardize phenotypic descriptions and enable simple data aggregation and analysis, controlled vocabularies and specific data architectures have been developed. Such simplified descriptions have several advantages over natural language: they can be rigorously defined for a particular context or problem, they can be assigned and interpreted programmatically, and they can be organized in a way that allows for semantic reasoning (inference of implicit facts). Because researchers generally report phenotypes in the literature using natural language, curators have been translating phenotypic descriptions into controlled vocabularies for decades to make the information computable. Unfortunately, this methodology is highly dependent on human curation, which does not scale to the full body of publications across plant biology. Meanwhile, researchers in other domains have been working to enable computation on natural language. As a result, new automated methods for computing on language are now available, and early analyses show great promise. Natural language processing (NLP) coupled with machine learning (ML) allows unstructured language to be used directly in the analysis of phenotypic descriptions. Indeed, we have found that these automated methods can be used to create data structures that perform as well as or better than those generated by human curators on tasks such as predicting gene function and biochemical pathway membership. Here, we describe current and ongoing efforts to provide tools for the plant phenomics community to explore novel predictions that can be generated using these techniques.
We also describe how these methods could be used along with mobile speech-to-text tools to collect and analyze in-field spoken phenotypic descriptions for association genetics and breeding applications.