Search | VHL Regional Portal

1.

Modeling 0.6 million genes for the rational design of functional cis-regulatory variants and de novo design of cis-regulatory sequences.

Li, Tianyi; Xu, Hui; Teng, Shouzhen; Suo, Mingrui; Bahitwa, Revocatus; Xu, Mingchi; Qian, Yiheng; Ramstein, Guillaume P; Song, Baoxing; Buckler, Edward S; Wang, Hai.

Proc Natl Acad Sci U S A ; 121(26): e2319811121, 2024 Jun 25.

Article in English | MEDLINE | ID: mdl-38889146

ABSTRACT

Rational design of plant cis-regulatory DNA sequences without expert intervention or prior domain knowledge is still a daunting task. Here, we developed PhytoExpr, a deep learning framework capable of predicting both mRNA abundance and plant species using the proximal regulatory sequence as the sole input. PhytoExpr was trained over 17 species representative of major clades of the plant kingdom to enhance its generalizability. Via input perturbation, quantitative functional annotation of the input sequence was achieved at single-nucleotide resolution, revealing an abundance of predicted high-impact nucleotides in conserved noncoding sequences and transcription factor binding sites. Evaluation of maize HapMap3 single-nucleotide polymorphisms (SNPs) by PhytoExpr demonstrates an enrichment of predicted high-impact SNPs in cis-eQTL. Additionally, we provided two algorithms that harnessed the power of PhytoExpr in designing functional cis-regulatory variants, and de novo creation of species-specific cis-regulatory sequences through in silico evolution of random DNA sequences. Our model represents a general and robust approach for functional variant discovery in population genetics and rational design of regulatory sequences for genome editing and synthetic biology.
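The input-perturbation idea in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical scoring function, `toy_score`, standing in for a trained expression model; it is not PhytoExpr or its API. Each position is mutated to every alternative base, and the largest absolute change in the score is recorded as that nucleotide's impact.

```python
from typing import Callable, List

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts 'TATA' occurrences."""
    return float(sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA"))


def perturbation_impact(seq: str, score: Callable[[str], float]) -> List[float]:
    """Return, per position, the largest absolute score change over all
    single-base substitutions at that position."""
    baseline = score(seq)
    impacts = []
    for i, ref in enumerate(seq):
        # Score every sequence that differs from the input at position i only.
        deltas = [
            abs(score(seq[:i] + alt + seq[i + 1:]) - baseline)
            for alt in BASES
            if alt != ref
        ]
        impacts.append(max(deltas))
    return impacts


if __name__ == "__main__":
    print(perturbation_impact("GGTATAGG", toy_score))
```

In this toy case only the four motif positions receive a nonzero impact; with a real model, `score` would be replaced by the model's prediction function.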

Subject(s)

Polymorphism, Single Nucleotide , Regulatory Sequences, Nucleic Acid , Zea mays , Regulatory Sequences, Nucleic Acid/genetics , Zea mays/genetics , Quantitative Trait Loci , Algorithms , Gene Expression Regulation, Plant , Deep Learning , Plants/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Models, Genetic , Genes, Plant , Binding Sites/genetics

2.

Phylogenomic discovery of deleterious mutations facilitates hybrid potato breeding.

Wu, Yaoyao; Li, Dawei; Hu, Yong; Li, Hongbo; Ramstein, Guillaume P; Zhou, Shaoqun; Zhang, Xinyan; Bao, Zhigui; Zhang, Yu; Song, Baoxing; Zhou, Yao; Zhou, Yongfeng; Gagnon, Edeline; Särkinen, Tiina; Knapp, Sandra; Zhang, Chunzhi; Städler, Thomas; Buckler, Edward S; Huang, Sanwen.

Cell ; 186(11): 2313-2328.e15, 2023 05 25.

Article in English | MEDLINE | ID: mdl-37146612

ABSTRACT

Hybrid potato breeding will transform the crop from a clonally propagated tetraploid to a seed-reproducing diploid. Historical accumulation of deleterious mutations in potato genomes has hindered the development of elite inbred lines and hybrids. Utilizing a whole-genome phylogeny of 92 Solanaceae and its sister clade species, we employ an evolutionary strategy to identify deleterious mutations. The deep phylogeny reveals the genome-wide landscape of highly constrained sites, comprising ~2.4% of the genome. Based on a diploid potato diversity panel, we infer 367,499 deleterious variants, of which 50% occur at non-coding and 15% at synonymous sites. Counterintuitively, diploid lines with relatively high homozygous deleterious burden can be better starting material for inbred-line development, despite showing less vigorous growth. Inclusion of inferred deleterious mutations increases genomic-prediction accuracy for yield by 24.7%. Our study generates insights into the genome-wide incidence and properties of deleterious mutations and their far-reaching consequences for breeding.

Subject(s)

Plant Breeding , Solanum tuberosum , Diploidy , Mutation , Phylogeny , Solanum tuberosum/genetics

3.

Elucidating the patterns of pleiotropy and its biological relevance in maize.

Khaipho-Burch, Merritt; Ferebee, Taylor; Giri, Anju; Ramstein, Guillaume; Monier, Brandon; Yi, Emily; Romay, M Cinta; Buckler, Edward S.

PLoS Genet ; 19(3): e1010664, 2023 03.

Article in English | MEDLINE | ID: mdl-36943844

ABSTRACT

Pleiotropy, when a single gene controls two or more seemingly unrelated traits, has been shown to impact genes with effects on flowering time, leaf architecture, and inflorescence morphology in maize. However, the genome-wide impact of biological pleiotropy across all maize phenotypes is largely unknown. Here, we investigate the extent to which biological pleiotropy impacts phenotypes within maize using GWAS summary statistics reanalyzed from previously published metabolite, field, and expression phenotypes across the Nested Association Mapping population and Goodman Association Panel. Through phenotypic saturation of 120,597 traits, we obtain over 480 million significant quantitative trait nucleotides. We estimate that only 1.56-32.3% of intervals show some degree of pleiotropy. We then assess the relationship between pleiotropy and various biological features such as gene expression, chromatin accessibility, sequence conservation, and enrichment for gene ontology terms. We find very little relationship between pleiotropy and these variables when compared to permuted pleiotropy. We hypothesize that biological pleiotropy of common alleles is not widespread in maize and is highly impacted by nuisance terms such as population structure and linkage disequilibrium. Natural selection on large standing natural variation in maize populations may target wide and large effect variants, leaving the prevalence of detectable pleiotropy relatively low.

Subject(s)

Genome-Wide Association Study , Zea mays , Chromosome Mapping , Zea mays/genetics , Phenotype , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genetic Pleiotropy

4.

Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize.

Ramstein, Guillaume P; Buckler, Edward S.

Genome Biol ; 23(1): 183, 2022 09 01.

Article in English | MEDLINE | ID: mdl-36050782

ABSTRACT

BACKGROUND: Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations. RESULTS: Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants. CONCLUSIONS: Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach, Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC), could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (https://doi.org/10.25739/hybz-2957).

Subject(s)

Genomics , Zea mays , Genome , Genomics/methods , Nucleotides , Phenotype , Polymorphism, Single Nucleotide , Zea mays/genetics

5.

Integrating GWAS and TWAS to elucidate the genetic architecture of maize leaf cuticular conductance.

Lin, Meng; Qiao, Pengfei; Matschi, Susanne; Vasquez, Miguel; Ramstein, Guillaume P; Bourgault, Richard; Mohammadi, Marc; Scanlon, Michael J; Molina, Isabel; Smith, Laurie G; Gore, Michael A.

Plant Physiol ; 189(4): 2144-2158, 2022 08 01.

Article in English | MEDLINE | ID: mdl-35512195

ABSTRACT

The cuticle, a hydrophobic layer of cutin and waxes synthesized by plant epidermal cells, is the major barrier to water loss when stomata are closed. Dissecting the genetic architecture of natural variation for maize (Zea mays L.) leaf cuticular conductance (gc) is important for identifying genes relevant to improving crop productivity in drought-prone environments. To this end, we performed integrated genome- and transcriptome-wide association studies (GWAS and TWAS) to identify candidate genes putatively regulating variation in leaf gc. Of the 22 plausible candidate genes identified, 4 were predicted to be involved in cuticle precursor biosynthesis and export, 2 in cell wall modification, 9 in intracellular membrane trafficking, and 7 in the regulation of cuticle development. A gene encoding an INCREASED SALT TOLERANCE1-LIKE1 (ISTL1) protein putatively involved in intracellular protein and membrane trafficking was identified in GWAS and TWAS as the strongest candidate causal gene. A set of maize nested near-isogenic lines that harbor the ISTL1 genomic region from eight donor parents were evaluated for gc, confirming the association between gc and ISTL1 in a haplotype-based association analysis. The findings of this study provide insights into the role of regulatory variation in the development of the maize leaf cuticle and will ultimately assist breeders to develop drought-tolerant maize for target environments.

Subject(s)

Genome-Wide Association Study , Zea mays , Plant Leaves/metabolism , Transcriptome , Waxes/metabolism , Zea mays/metabolism

6.

Utilizing evolutionary conservation to detect deleterious mutations and improve genomic prediction in cassava.

Long, Evan M; Romay, M Cinta; Ramstein, Guillaume; Buckler, Edward S; Robbins, Kelly R.

Front Plant Sci ; 13: 1041925, 2022.

Article in English | MEDLINE | ID: mdl-37082510

ABSTRACT

Introduction: Cassava (Manihot esculenta) is an annual root crop which provides the major source of calories for over half a billion people around the world. Since its domestication ~10,000 years ago, cassava has been largely clonally propagated through stem cuttings. Minimal sexual recombination has led to an accumulation of deleterious mutations made evident by heavy inbreeding depression. Methods: To locate and characterize these deleterious mutations, and to measure selection pressure across the cassava genome, we aligned 52 related Euphorbiaceae and other related species representing millions of years of evolution. With single base-pair resolution of genetic conservation, we used protein structure models, amino acid impact, and evolutionary conservation across the Euphorbiaceae to estimate evolutionary constraint. With known deleterious mutations, we aimed to improve genomic evaluations of plant performance through genomic prediction. We first tested this hypothesis through simulation utilizing multi-kernel GBLUP to predict simulated phenotypes across separate populations of cassava. Results: Simulations showed a sizable increase of prediction accuracy when incorporating functional variants in the model when the trait was determined by <100 quantitative trait loci (QTL). Utilizing deleterious mutations and functional weights informed through evolutionary conservation, we saw improvements in genomic prediction accuracy that were dependent on trait and prediction. Conclusion: We showed the potential for using evolutionary information to track functional variation across the genome, in order to improve whole genome trait prediction. We anticipate that continued work to improve genotype accuracy and deleterious mutation assessment will lead to improved genomic assessments of cassava clones.

7.

Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize.

Giri, Anju; Khaipho-Burch, Merritt; Buckler, Edward S; Ramstein, Guillaume P.

PLoS Genet ; 17(10): e1009568, 2021 10.

Article in English | MEDLINE | ID: mdl-34606492

ABSTRACT

Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes and it has generated little insight about the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype associated RNA expression (HARE), study the transferability of HARE estimates across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferable across different tissues and populations compared to measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels, a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel), for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy as compared to measured expression values. The accuracy achieved by measured expression was variable across tissues, whereas accuracy by HARE was more stable across tissues. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.

Subject(s)

Haplotypes/genetics , Quantitative Trait Loci/genetics , RNA/genetics , Zea mays/genetics , Alleles , Chromosome Mapping/methods , Genome-Wide Association Study/methods , Genomics/methods , Genotype , Linkage Disequilibrium/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics

8.

Predicting phenotypes from genetic, environment, management, and historical data using CNNs.

Washburn, Jacob D; Cimen, Emre; Ramstein, Guillaume; Reeves, Timothy; O'Briant, Patrick; McLean, Greg; Cooper, Mark; Hammer, Graeme; Buckler, Edward S.

Theor Appl Genet ; 134(12): 3997-4011, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34448888

ABSTRACT

KEY MESSAGE: Convolutional Neural Networks (CNNs) can perform similarly to or better than standard genomic prediction methods when sufficient genetic, environmental, and management data are provided. Predicting phenotypes from genetic (G), environmental (E), and management (M) conditions is a long-standing challenge with implications for agriculture, medicine, and conservation. Most methods reduce the factors in a dataset (feature engineering) in a subjective and potentially oversimplified manner. Deep neural networks such as Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN) can overcome this by allowing the data itself to determine which factors are most important. CNN models were developed for predicting agronomic yield from a combination of replicated trials and historical yield survey data. The results were more accurate than standard methods when tested on held-out G, E, and M data (r = 0.50 vs. r = 0.43), and performed slightly worse than standard methods when only G was held out (r = 0.74 vs. r = 0.80). Pre-training on historical data increased accuracy compared to trial data alone. Saliency map analysis indicated the CNN has "learned" to prioritize many factors of known agricultural importance.

Subject(s)

Crops, Agricultural/genetics , Genomics/methods , Neural Networks, Computer , Phenotype , Crops, Agricultural/growth & development , Data Mining , Machine Learning , Zea mays/growth & development

9.

High-resolution genome-wide association study pinpoints metal transporter and chelator genes involved in the genetic control of element levels in maize grain.

Wu, Di; Tanaka, Ryokei; Li, Xiaowei; Ramstein, Guillaume P; Cu, Suong; Hamilton, John P; Buell, C Robin; Stangoulis, James; Rocheford, Torbert; Gore, Michael A.

G3 (Bethesda) ; 11(4), 2021 04 15.

Article in English | MEDLINE | ID: mdl-33677522

ABSTRACT

Despite its importance to plant function and human health, the genetics underpinning element levels in maize grain remain largely unknown. Through a genome-wide association study in the maize Ames panel of nearly 2,000 inbred lines that was imputed with ~7.7 million SNP markers, we investigated the genetic basis of natural variation for the concentration of 11 elements in grain. Novel associations were detected for the metal transporter genes rte2 (rotten ear2) and irt1 (iron-regulated transporter1) with boron and nickel, respectively. We also further resolved loci that were previously found to be associated with one or more of five elements (copper, iron, manganese, molybdenum, and/or zinc), with two metal chelator and five metal transporter candidate causal genes identified. The nas5 (nicotianamine synthase5) gene involved in the synthesis of nicotianamine, a metal chelator, was found associated with both zinc and iron and suggests a common genetic basis controlling the accumulation of these two metals in the grain. Furthermore, moderate predictive abilities were obtained for the 11 elemental grain phenotypes with two whole-genome prediction models: Bayesian Ridge Regression (0.33-0.51) and BayesB (0.33-0.53). Of the two models, BayesB, with its greater emphasis on large-effect loci, showed ~4-10% higher predictive abilities for nickel, molybdenum, and copper. Altogether, our findings contribute to an improved genotype-phenotype map for grain element accumulation in maize.

Subject(s)

Genome-Wide Association Study , Zea mays , Bayes Theorem , Chelating Agents , Edible Grain/genetics , Humans , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Zea mays/genetics

10.

A sorghum practical haplotype graph facilitates genome-wide imputation and cost-effective genomic prediction.

Jensen, Sarah E; Charles, Jean Rigaud; Muleta, Kebede; Bradbury, Peter J; Casstevens, Terry; Deshpande, Santosh P; Gore, Michael A; Gupta, Rajeev; Ilut, Daniel C; Johnson, Lynn; Lozano, Roberto; Miller, Zachary; Ramu, Punna; Rathore, Abhishek; Romay, M Cinta; Upadhyaya, Hari D; Varshney, Rajeev K; Morris, Geoffrey P; Pressoir, Gael; Buckler, Edward S; Ramstein, Guillaume P.

Plant Genome ; 13(1): e20009, 2020 03.

Article in English | MEDLINE | ID: mdl-33016627

ABSTRACT

Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to accelerate cultivar development. To help with this, we developed a Sorghum bicolor Practical Haplotype Graph (PHG) pangenome database that stores haplotypes and variant information. We developed two PHGs in sorghum that were used to identify genome-wide variants for 24 founders of the Chibas sorghum breeding program from 0.01x sequence coverage. The PHG called single nucleotide polymorphisms (SNPs) with 5.9% error at 0.01x coverage, only 3% higher than PHG error when calling SNPs from 8x coverage sequence. Additionally, 207 progenies from the Chibas genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes were imputed from PHG parental haplotypes and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from 0.57 to 0.73 and are similar to prediction accuracies obtained with genotyping-by-sequencing or targeted amplicon sequencing (rhAmpSeq) markers. This study demonstrates the use of a sorghum PHG to impute SNPs from low-coverage sequence data and shows that the PHG can unify genotype calls across multiple sequencing platforms. By reducing input sequence requirements, the PHG can decrease the cost of genotyping, make GS more feasible, and facilitate larger breeding populations. Our results demonstrate that the PHG is a useful research and breeding tool that maintains variant information from a diverse group of taxa, stores sequence data in a condensed but readily accessible format, unifies genotypes across genotyping platforms, and provides a cost-effective option for genomic selection.

Subject(s)

Sorghum , Cost-Benefit Analysis , Genome , Genomics , Haplotypes , Sorghum/genetics

11.

Dominance Effects and Functional Enrichments Improve Prediction of Agronomic Traits in Hybrid Maize.

Ramstein, Guillaume P; Larsson, Sara J; Cook, Jason P; Edwards, Jode W; Ersoz, Elhan S; Flint-Garcia, Sherry; Gardner, Candice A; Holland, James B; Lorenz, Aaron J; McMullen, Michael D; Millard, Mark J; Rocheford, Torbert R; Tuinstra, Mitchell R; Bradbury, Peter J; Buckler, Edward S; Romay, M Cinta.

Genetics ; 215(1): 215-230, 2020 05.

Article in English | MEDLINE | ID: mdl-32152047

ABSTRACT

Single-cross hybrids have been critical to the improvement of maize (Zea mays L.), but the characterization of their genetic architectures remains challenging. Previous studies of hybrid maize have shown the contribution of within-locus complementation effects (dominance) and their differential importance across functional classes of loci. However, they have generally considered panels of limited genetic diversity, and have shown little benefit from genomic prediction based on dominance or functional enrichments. This study investigates the relevance of dominance and functional classes of variants in genomic models for agronomic traits in diverse populations of hybrid maize. We based our analyses on a diverse panel of inbred lines crossed with two testers representative of the major heterotic groups in the U.S. (1106 hybrids), as well as a collection of 24 biparental populations crossed with a single tester (1640 hybrids). We investigated three agronomic traits: days to silking (DTS), plant height (PH), and grain yield (GY). Our results point to the presence of dominance for all traits, but also among-locus complementation (epistasis) for DTS and genotype-by-environment interactions for GY. Consistently, dominance improved genomic prediction for PH only. In addition, we assessed enrichment of genetic effects in classes defined by genic regions (gene annotation), structural features (recombination rate and chromatin openness), and evolutionary features (minor allele frequency and evolutionary constraint). We found support for enrichment in genic regions and subsequent improvement of genomic prediction for all traits. Our results suggest that dominance and gene annotations improve genomic prediction across diverse populations in hybrid maize.

Subject(s)

Edible Grain/genetics , Genes, Dominant , Hybridization, Genetic , Models, Genetic , Plant Breeding/methods , Quantitative Trait, Heritable , Zea mays/genetics , Edible Grain/growth & development , Epistasis, Genetic , Evolution, Molecular , Gene-Environment Interaction , Zea mays/growth & development

12.

Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence.

Washburn, Jacob D; Mejia-Guerra, Maria Katherine; Ramstein, Guillaume; Kremling, Karl A; Valluru, Ravi; Buckler, Edward S; Wang, Hai.

Proc Natl Acad Sci U S A ; 116(12): 5542-5549, 2019 03 19.

Article in English | MEDLINE | ID: mdl-30842277

ABSTRACT

Deep learning methodologies have revolutionized prediction in many fields and show potential to do the same in molecular biology and genetics. However, applying these methods in their current forms ignores evolutionary dependencies within biological systems and can result in false positives and spurious conclusions. We developed two approaches that account for evolutionary relatedness in machine learning models: (i) gene-family-guided splitting and (ii) ortholog contrasts. The first approach accounts for evolution by constraining model training and testing sets to include different gene families. The second approach uses evolutionarily informed comparisons between orthologous genes to both control for and leverage evolutionary divergence during the training process. The two approaches were explored and validated within the context of mRNA expression level prediction and have the area under the ROC curve (auROC) values ranging from 0.75 to 0.94. Model weight inspections showed biologically interpretable patterns, resulting in the hypothesis that the 3' UTR is more important for fine-tuning mRNA abundance levels while the 5' UTR is more important for large-scale changes.
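Gene-family-guided splitting, as described in this abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `family_guided_split`, the `gene_to_family` mapping, and the default test fraction are all hypothetical. Whole families are assigned to either the training or the testing set, so no family contributes genes to both.

```python
import random
from typing import Dict, List, Tuple


def family_guided_split(
    gene_to_family: Dict[str, str],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split genes into train/test so that each gene family lands in one set only."""
    families = sorted(set(gene_to_family.values()))
    rng = random.Random(seed)
    rng.shuffle(families)
    # Hold out whole families; always keep at least one family for testing.
    n_test = max(1, round(len(families) * test_fraction))
    test_families = set(families[:n_test])
    train = sorted(g for g, f in gene_to_family.items() if f not in test_families)
    test = sorted(g for g, f in gene_to_family.items() if f in test_families)
    return train, test


if __name__ == "__main__":
    # Hypothetical gene identifiers and family labels, for illustration only.
    mapping = {
        "geneA1": "famA", "geneA2": "famA",
        "geneB1": "famB",
        "geneC1": "famC", "geneC2": "famC",
        "geneD1": "famD",
        "geneE1": "famE",
    }
    train_genes, test_genes = family_guided_split(mapping)
    print("train:", train_genes)
    print("test:", test_genes)
```

Splitting by family rather than by gene keeps closely related paralogs from appearing on both sides of the split, which is the dependency the abstract warns can inflate apparent accuracy.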

Subject(s)

Base Sequence/genetics , Deep Learning , Evolution, Molecular , Transcription, Genetic/genetics , DNA/genetics , DNA/metabolism , Gene Expression Regulation/genetics , Models, Theoretical , Sequence Analysis, DNA

13.

Extensions of BLUP Models for Genomic Prediction in Heterogeneous Populations: Application in a Diverse Switchgrass Sample.

Ramstein, Guillaume P; Casler, Michael D.

G3 (Bethesda) ; 9(3): 789-805, 2019 03 07.

Article in English | MEDLINE | ID: mdl-30651285

ABSTRACT

Genomic prediction is a useful tool to accelerate genetic gain in selection using DNA marker information. However, this technology typically relies on standard prediction procedures, such as genomic BLUP, that are not designed to accommodate population heterogeneity resulting from differences in marker effects across populations. In this study, we assayed different prediction procedures to capture marker-by-population interactions in genomic prediction models. Prediction procedures included genomic BLUP and two kernel-based extensions of genomic BLUP which explicitly accounted for population heterogeneity. To model population heterogeneity, dissemblance between populations was either depicted by a unique coefficient (as previously reported), or a more flexible function of genetic distance between populations (proposed herein). Models under investigation were applied in a diverse switchgrass sample under two validation schemes: whole-sample calibration, where all individuals except selection candidates are included in the calibration set, and cross-population calibration, where the target population is entirely excluded from the calibration set. First, we showed that using fixed effects, from principal components or putative population groups, appeared detrimental to prediction accuracy, especially in cross-population calibration. Then we showed that modeling population heterogeneity by our proposed procedure resulted in highly significant improvements in model fit. In such cases, gains in accuracy were often positive. These results suggest that population heterogeneity may be parsimoniously captured by kernel methods. However, in cases where improvement in model fit by our proposed procedure is null-to-moderate, ignoring heterogeneity should probably be preferred due to the robustness and simplicity of the standard genomic BLUP model.

Subject(s)

Genetic Association Studies , Genetics, Population/methods , Models, Genetic , Panicum/genetics , Polymorphism, Single Nucleotide , Genome, Plant , Genomics/methods , Plant Breeding

14.

Breaking the curse of dimensionality to identify causal variants in Breeding 4.

Ramstein, Guillaume P; Jensen, Sarah E; Buckler, Edward S.

Theor Appl Genet ; 132(3): 559-567, 2019 Mar.

Article in English | MEDLINE | ID: mdl-30547185

ABSTRACT

In the past, plant breeding has undergone three major transformations and is currently transitioning to a new technological phase, Breeding 4. This phase is characterized by the development of methods for biological design of plant varieties, including transformation and gene editing techniques directed toward causal loci. The application of such technologies will require reliably estimating the effect of loci in plant genomes by avoiding the situation where the number of loci assayed (p) surpasses the number of plant genotypes (n). Here, we discuss approaches to avoid this curse of dimensionality (n ≪ p), which will involve analyzing intermediate phenotypes such as molecular traits and component traits related to plant morphology or physiology. Because these approaches will rely on novel data types such as DNA sequences and high-throughput phenotyping images, Breeding 4 will call for analyses that are complementary to traditional quantitative genetic studies, being based on machine learning techniques which make efficient use of sequence and image data. In this article, we will present some of these techniques and their application for prioritizing causal loci and developing improved varieties in Breeding 4.

Subject(s)

Genetic Variation , Plant Breeding/methods , Base Sequence , Machine Learning , Quantitative Trait, Heritable , Statistics as Topic

15.

Candidate Variants for Additive and Interactive Effects on Bioenergy Traits in Switchgrass (Panicum virgatum L.) Identified by Genome-Wide Association Analyses.

Ramstein, Guillaume P; Evans, Joseph; Nandety, Aruna; Saha, Malay C; Brummer, E Charles; Kaeppler, Shawn M; Buell, C Robin; Casler, Michael D.

Plant Genome ; 11(3), 2018 11.

Article in English | MEDLINE | ID: mdl-30512032

ABSTRACT

Switchgrass (Panicum virgatum L.) is a promising herbaceous energy crop, but further gains in biomass yield and quality must be achieved to enable a viable bioenergy industry. Developing DNA markers can contribute to such progress, but depiction of genetic bases should be reliable, involving simple additive marker effects and also interactions with genetic backgrounds (e.g., ecotypes) or synergies with other markers. We analyzed plant height, C content, N content, and mineral concentration in a diverse panel consisting of 512 genotypes of upland and lowland ecotypes. We performed association analyses based on exome capture sequencing and tested 439,170 markers for marginal effects, 83,290 markers for marker × ecotype interactions, and up to 311,445 marker pairs for pairwise interactions. Analyses of pairwise interactions focused on subsets of marker pairs preselected on the basis of marginal marker effects, gene ontology annotation, and pairwise marker associations. Our tests identified 12 significant effects. Homology and gene expression information corroborated seven effects and indicated plausible causal pathways: flowering time and lignin synthesis for plant height; plant growth and senescence for C content and mineral concentration. Four pairwise interactions were detected, including three interactions preselected on the basis of pairwise marker correlations. Furthermore, a marker × ecotype interaction and a pairwise interaction were confirmed in an independent switchgrass panel. Our analyses identified reliable candidate variants for important bioenergy traits. Moreover, they exemplified the importance of interactive effects for depicting genetic bases and illustrated the usefulness of preselecting marker pairs for identifying pairwise marker interactions in association studies.

Subject(s)

Genes, Plant , Genetic Variation , Panicum/genetics , Biofuels , Genetic Markers , Genome-Wide Association Study , Panicum/metabolism , Phenotype

16.

Genome-Wide Association Study in Pseudo-F₂ Populations of Switchgrass Identifies Genetic Loci Affecting Heading and Anthesis Dates.

Taylor, Megan; Tornqvist, Carl-Erik; Zhao, Xiongwei; Grabowski, Paul; Doerge, Rebecca; Ma, Jianxin; Volenec, Jeffrey; Evans, Joseph; Ramstein, Guillaume P; Sanciangco, Millicent D; Buell, C Robin; Casler, Michael D; Jiang, Yiwei.

Front Plant Sci ; 9: 1250, 2018.

Article in English | MEDLINE | ID: mdl-30271414

ABSTRACT

Switchgrass (Panicum virgatum) is a native prairie grass and valuable bio-energy crop. The physiological change from juvenile to reproductive adult can draw important resources away from growth into producing reproductive structures, thereby limiting the growth potential of early flowering plants. Delaying the flowering of switchgrass is one approach by which to increase total biomass. The objective of this research was to identify genetic variants and candidate genes for controlling heading and anthesis in segregating switchgrass populations. Four pseudo-F2 populations (two pairs of reciprocal crosses) were developed from lowland (late flowering) and upland (early flowering) ecotypes, and heading and anthesis dates of these populations were collected in Lafayette, IN and DeKalb, IL in 2015 and 2016. Across 2 years, there was a 34- and 73-day difference in heading and a 52- and 75-day difference in anthesis at the Lafayette and DeKalb locations, respectively. A total of 37,901 single nucleotide polymorphisms obtained by exome capture sequencing of the populations were used in a genome-wide association study (GWAS) that identified five significant signals at three loci for heading and two loci for anthesis. Among them, a homolog of FLOWERING LOCUS T on chromosome 5b associated with heading date was identified at the Lafayette location across 2 years. A homolog of ARABIDOPSIS PSEUDO-RESPONSE REGULATOR 5, a light modulator in the circadian clock, associated with heading date was detected on chromosome 8a across locations and years. These results demonstrate that genetic variants related to floral development could lend themselves to a long-term goal of developing late flowering varieties of switchgrass with high biomass yield.

17.

Phylogeny, biogeography and character evolution in the tribe Desmodieae (Fabaceae: Papilionoideae), with special emphasis on the New Caledonian endemic genera.

Jabbour, Florian; Gaudeul, Myriam; Lambourdière, Josie; Ramstein, Guillaume; Hassanin, Alexandre; Labat, Jean-Noël; Sarthou, Corinne.

Mol Phylogenet Evol ; 118: 108-121, 2018 01.

Article in English | MEDLINE | ID: mdl-28966123

ABSTRACT

The nearly cosmopolitan tribe Desmodieae (Fabaceae) includes many important genera for medicine and forage. However, the phylogenetic relationships among the infratribal groups circumscribed using morphological traits are still poorly known. In this study, we used chloroplast (rbcL, psbA-trnH) and nuclear (ITS-1) DNA sequences to investigate the molecular phylogeny and historical biogeography of Desmodieae, and infer ancestral states for several vegetative and reproductive traits. Three groups, corresponding to the Desmodium, Lespedeza, and Phyllodium groups sensu Ohashi were retrieved in the phylogenetic analyses. Conflicts in the topologies inferred from the chloroplast and nuclear datasets were detected. For instance, the Lespedeza clade was sister to the groups Phyllodium+Desmodium based on chloroplast DNA, but nested within the Desmodium group based on ITS-1. Moreover, the New Caledonian endemic genera Arthroclianthus and Nephrodesmus were not monophyletic but together formed a clade, which also included Hanslia and Ohwia based on chloroplast DNA. The hypothetical common ancestor of Desmodieae was dated to the Middle Oligocene (ca. 28.3Ma) and was likely an Asian shrub or tree producing indehiscent loments. Several colonization events towards Oceania, America, and Africa occurred (all less than ca. 17.5Ma), most probably through long distance dispersal. The fruits of Desmodieae repeatedly evolved from indehiscence to dehiscence. We also showed that indehiscent loments allow for more variability in the number of seeds per fruit than indehiscent legumes. Modularity seems here to allow variability in the number of ovules produced in a single ovary.

Subject(s)

Fabaceae/classification , Phylogeny , Phylogeography , Bayes Theorem , DNA, Chloroplast/genetics , Ecosystem , Fabaceae/genetics , Fruit/anatomy & histology , New Caledonia , Phenotype , Seeds/anatomy & histology , Species Specificity , Time Factors

18.

Genome-wide associations with flowering time in switchgrass using exome-capture sequencing data.

Grabowski, Paul P; Evans, Joseph; Daum, Chris; Deshpande, Shweta; Barry, Kerrie W; Kennedy, Megan; Ramstein, Guillaume; Kaeppler, Shawn M; Buell, C Robin; Jiang, Yiwei; Casler, Michael D.

New Phytol ; 213(1): 154-169, 2017 01.

Article in English | MEDLINE | ID: mdl-27443672

ABSTRACT

Flowering time is a major determinant of biomass yield in switchgrass (Panicum virgatum), a perennial bioenergy crop, because later flowering allows for an extended period of vegetative growth and increased biomass production. A better understanding of the genetic regulation of flowering time in switchgrass will aid the development of switchgrass varieties with increased biomass yields, particularly at northern latitudes, where late-flowering but southern-adapted varieties have high winter mortality. We use genotypes derived from recently published exome-capture sequencing, which mitigates challenges related to the large, highly repetitive and polyploid switchgrass genome, to perform genome-wide association studies (GWAS) using flowering time data from a switchgrass association panel in an effort to characterize the genetic architecture and genes underlying flowering time regulation in switchgrass. We identify associations with flowering time at multiple loci, including in a homolog of FLOWERING LOCUS T and in a locus containing TIMELESS, a homolog of a key circadian regulator in animals. Our results suggest that flowering time variation in switchgrass is due to variation at many positions across the genome. The relationship of flowering time and geographic origin indicates likely roles for genes in the photoperiod and autonomous pathways in generating switchgrass flowering time variation.

Subject(s)

Exome Sequencing/methods , Exome/genetics , Flowers/genetics , Flowers/physiology , Genome-Wide Association Study , Panicum/genetics , Alleles , Genes, Plant , Genetic Association Studies , Genetic Variation , Genotype , Geography , Linkage Disequilibrium/genetics , Phenotype , Seasons , Temperature , Time Factors

19.

Accuracy of Genomic Prediction in Switchgrass (Panicum virgatum L.) Improved by Accounting for Linkage Disequilibrium.

Ramstein, Guillaume P; Evans, Joseph; Kaeppler, Shawn M; Mitchell, Robert B; Vogel, Kenneth P; Buell, C Robin; Casler, Michael D.

G3 (Bethesda) ; 6(4): 1049-62, 2016 04 07.

Article in English | MEDLINE | ID: mdl-26869619

ABSTRACT

Switchgrass is a relatively high-yielding and environmentally sustainable biomass crop, but further genetic gains in biomass yield must be achieved to make it an economically viable bioenergy feedstock. Genomic selection (GS) is an attractive technology to generate rapid genetic gains in switchgrass, and meet the goals of a substantial displacement of petroleum use with biofuels in the near future. In this study, we empirically assessed prediction procedures for genomic selection in two different populations, consisting of 137 and 110 half-sib families of switchgrass, tested in two locations in the United States for three agronomic traits: dry matter yield, plant height, and heading date. Marker data were produced for the families' parents by exome capture sequencing, generating up to 141,030 polymorphic markers with available genomic-location and annotation information. We evaluated prediction procedures that varied not only by learning schemes and prediction models, but also by the way the data were preprocessed to account for redundancy in marker information. More complex genomic prediction procedures were generally not significantly more accurate than the simplest procedure, likely due to limited population sizes. Nevertheless, a highly significant gain in prediction accuracy was achieved by transforming the marker data through a marker correlation matrix. Our results suggest that marker-data transformations and, more generally, the account of linkage disequilibrium among markers, offer valuable opportunities for improving prediction procedures in GS. Some of the achieved prediction accuracies should motivate implementation of GS in switchgrass breeding programs.

Subject(s)

Genetic Linkage , Genome, Plant , Genomics , Linkage Disequilibrium , Panicum/genetics , Algorithms , Alleles , Gene Frequency , Genetic Variation , Genomics/methods , Models, Genetic , Phenotype , Quantitative Trait, Heritable , Reproducibility of Results

20.

Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass.

Ramstein, Guillaume P; Lipka, Alexander E; Lu, Fei; Costich, Denise E; Cherney, Jerome H; Buckler, Edward S; Casler, Michael D.

G3 (Bethesda) ; 5(5): 891-909, 2015 Mar 12.

Article in English | MEDLINE | ID: mdl-25770100

ABSTRACT

Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data.

Subject(s)

Genome, Plant , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Quantitative Trait, Heritable , Algorithms , Biofuels , Models, Theoretical
