Results 1 - 20 of 27
1.
Plant Genome ; 16(1): e20263, 2023 03.
Article in English | MEDLINE | ID: mdl-36484148

ABSTRACT

Soybean [Glycine max (L.) Merr.] is a significant source of protein and oil and is also widely used as animal feed. Thus, developing lines that are superior in terms of yield, protein, and oil content is important to feed the ever-growing population. As opposed to high-cost phenotyping, genotyping is both cost and time efficient for breeders because evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use the marker and environment data effectively to predict the yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (genomic best linear unbiased predictor [GBLUP]), a kernel method (Gaussian kernel [GK]), an artificial-intelligence (AI) method (deep learning [DL]), and a hybrid method that corresponds to the emulation of a DL model using a kernel method (an arc-cosine kernel [AK]) in terms of their accuracies for predicting grain yield, oil, and protein using data from the soybean nested association mapping experiment (1,379 genotypes tested in six environments, all genotypes in all environments). The relative performance of the four methods varied with the response variable and with whether the model included genotype × environment interaction (G×E) effects. GBLUP consistently showed better performance, whereas GK and AK followed a similar pattern to GBLUP, and DL performed slightly worse than the other three methods in most cases; however, this may also be attributed to suboptimal hyperparameters. The DL method performed particularly poorly relative to the other three methods in the presence of G×E effects.
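The three kernel-based methods named in this abstract differ mainly in how similarity between genotypes is computed from the marker matrix. The sketch below is illustrative only: the toy marker matrix, the median-distance scaling of the Gaussian kernel, and the choice of an order-1 (single hidden layer) arc-cosine kernel are assumptions, not settings reported by the authors.

```python
import numpy as np

def linear_kernel(X):
    """GBLUP-style genomic relationship matrix from column-centered markers."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T / X.shape[1]

def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian kernel on squared Euclidean distances, scaled by their median (assumed scaling)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    scale = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-bandwidth * d2 / scale)

def arc_cosine_kernel(X):
    """Order-1 arc-cosine kernel, which mimics one infinitely wide ReLU hidden layer."""
    norms = np.linalg.norm(X, axis=1)
    outer = norms[:, None] * norms[None, :]
    theta = np.arccos(np.clip(X @ X.T / outer, -1.0, 1.0))  # angle between genotypes
    return outer * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(8, 50)).astype(float)  # hypothetical 0/1/2 marker codes
G, GK, AK = linear_kernel(X), gaussian_kernel(X), arc_cosine_kernel(X)
```

Any of the three matrices can then serve as the covariance structure of the genotype effect in a mixed model, which is what makes the kernels directly comparable.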


Subject(s)
Gene-Environment Interaction , Glycine max , Animals , Glycine max/genetics , Genome, Plant , Models, Genetic , Triticum/genetics , Intelligence
2.
J Exp Bot ; 74(1): 352-363, 2023 01 01.
Article in English | MEDLINE | ID: mdl-36242765

ABSTRACT

Ontogenic changes in soybean radiation use efficiency (RUE) have been attributed to variation in specific leaf nitrogen (SLN) based only on data collected during seed filling. We evaluated this hypothesis using data on leaf area, absorbed radiation (ARAD), aboveground dry matter (ADM), and plant nitrogen (N) concentration collected during the entire crop season from seven field experiments conducted in a stress-free environment. Each experiment included a full-N treatment that received ample N fertilizer and a zero-N treatment that relied on N fixation and soil N mineralization. We estimated RUE based on changes in ADM between sampling times and associated ARAD, accounting for changes in biomass composition. The RUE and SLN exhibited different seasonal patterns: a bell-shaped pattern with a peak around the beginning of seed filling, and a convex pattern followed by an abrupt decline during late seed filling, respectively. Changes in SLN explained the decline in RUE during seed filling but failed to predict changes in RUE in earlier stages and underestimated the maximum RUE observed during pod setting. Comparison between observed and simulated RUE using a process-based crop simulation model revealed similar discrepancies. The decoupling between RUE and SLN during early crop stages suggests that leaf N is above that needed to maximize crop growth but may play a role in storing N that can be used in later reproductive stages to meet the large seed N demand associated with high-yielding crops.


Subject(s)
Glycine max , Nitrogen , Biomass , Seeds , Crops, Agricultural
3.
Plants (Basel) ; 11(21), 2022 Oct 28.
Article in English | MEDLINE | ID: mdl-36365358

ABSTRACT

The likelihood of success in developing modern cultivars depends on multiple factors, including the identification of suitable parents to initiate new crosses and the characterization of genomic regions associated with target traits. The objectives of the present study were to (a) determine the best economic weights of four major wheat diseases (leaf spot, common bunt, leaf rust, and stripe rust) and grain yield for a multi-trait restrictive linear phenotypic selection index (RLPSI), (b) select the top 10% of cultivars and lines (hereafter referred to as genotypes) with better resistance to combinations of the four diseases and acceptable grain yield as potential parents, and (c) map genomic regions associated with resistance to each disease using a genome-wide association study (GWAS). A diversity panel of 196 spring wheat genotypes was evaluated for their reaction to stripe rust at eight environments, leaf rust at four environments, leaf spot at three environments, common bunt at two environments, and grain yield at five environments. The panel was genotyped with the Wheat 90K SNP array and a few KASP SNPs, of which we used 23,342 markers for statistical analyses. The RLPSI analysis performed by restricting the expected genetic gain for yield displayed significant (p < 0.05) differences among the 3125 economic weights. Using the best four economic weights, a subset of 22 of the 196 genotypes was selected as potential parents with resistance to the four diseases and acceptable grain yield. GWAS identified 37 genomic regions, which included 12 for common bunt, 13 for leaf rust, 5 for stripe rust, and 7 for leaf spot. Each genomic region explained from 6.6 to 16.9% and together accounted for 39.4% of the stripe rust, 49.1% of the leaf spot, 94.0% of the leaf rust, and 97.9% of the common bunt phenotypic variance combined across all environments. 
Results from this study provide valuable information for wheat breeders selecting parental combinations for new crosses to develop improved germplasm with enhanced resistance to the four diseases as well as the physical positions of genomic regions that confer resistance, which facilitates direct comparisons for independent mapping studies in the future.

4.
Front Genet ; 13: 958780, 2022.
Article in English | MEDLINE | ID: mdl-36313472

ABSTRACT

The development of genomic selection (GS) methods has allowed plant breeding programs to select favorable lines using genomic data before performing field trials. Improvements in genotyping technology have yielded high-dimensional genomic marker data which can be difficult to incorporate into statistical models. In this paper, we investigated the utility of applying dimensionality reduction (DR) methods as a pre-processing step for GS methods. We compared five DR methods and studied the trend in the prediction accuracies of each method as a function of the number of features retained. The effect of DR methods was studied using three models that involved the main effects of line, environment, marker, and the genotype by environment interactions. The methods were applied on a real data set containing 315 lines phenotyped in nine environments with 26,817 markers each. Regardless of the DR method and prediction model used, only a fraction of features was sufficient to achieve maximum correlation. Our results underline the usefulness of DR methods as a key pre-processing step in GS models to improve computational efficiency in the face of ever-increasing size of genomic data.

5.
Plants (Basel) ; 11(14), 2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35890521

ABSTRACT

Both the Linear Phenotypic Selection Index (LPSI) and the Restrictive Linear Phenotypic Selection Index (RLPSI) have been widely used to select parents and progenies, but the effect of economic weights on the selection parameters (the expected genetic gain, response to selection, and the correlation between the indices and genetic merits) has not been investigated in detail. Here, we (i) assessed combinations of 2304 economic weights using four traits (maturity, plant height, grain yield, and grain protein content) recorded under four organically (low nitrogen) and five conventionally (high nitrogen) managed environments, (ii) compared single-trait and multi-trait selection indices (LPSI vs. RLPSI by imposing restrictions on the expected genetic gain of either yield or grain protein content), and (iii) selected a subset of about 10% of spring wheat cultivars that performed very well under organic and/or conventional management systems. The multi-trait selection indices, with and without imposing restrictions, were superior to single-trait selection. However, the selection parameters differed considerably depending on the economic weights, which suggests the need for optimizing the weights. Twenty-two of the 196 cultivars that showed superior performance under organic and/or conventional management systems were consistently selected using all five of the selected economic weights and at least two of the selection scenarios. The selected cultivars belonged to the Canada Western Red Spring (16 cultivars), Canada Northern Hard Red (3), and Canada Prairie Spring Red (3) classes, required 83-93 days to maturity, were 72-100 cm tall, and produced from 4.0 to 6.2 t ha⁻¹ grain yield with 14.6-17.7% grain protein content (GPC). The selected cultivars would be highly useful, not only as potential trait donors for breeding under an organic management system, but also for other studies, including nitrogen use efficiency.
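To make the difference between the two indices concrete, the following sketch computes index coefficients for the textbook forms of the LPSI (Smith-Hazel, b = P⁻¹Gw) and the RLPSI (Kempthorne-Nordskog, which forces zero expected gain for the restricted traits). The covariance matrices and economic weights are invented toy values for three traits, and the selection intensity of 1.755 is the standard value for retaining the top 10%; none of these numbers come from the study.

```python
import numpy as np

def lpsi_weights(P, G, w):
    """Smith-Hazel index coefficients b = P^{-1} G w."""
    return np.linalg.solve(P, G @ w)

def rlpsi_weights(P, G, w, restricted):
    """Kempthorne-Nordskog restricted index: zero expected gain for `restricted` traits."""
    t = P.shape[0]
    C = np.eye(t)[:, restricted]        # selector matrix for the restricted traits
    PinvG = np.linalg.solve(P, G)       # P^{-1} G
    M = C.T @ G @ PinvG @ C             # C' G P^{-1} G C
    K = np.eye(t) - PinvG @ C @ np.linalg.solve(M, C.T @ G)
    return K @ PinvG @ w

def expected_gain(P, G, b, intensity=1.755):
    """Expected genetic gain per trait: k * G b / sqrt(b' P b)."""
    return intensity * (G @ b) / np.sqrt(b @ P @ b)

# Hypothetical genotypic (G) and phenotypic (P = G + residual) covariance matrices.
G = np.array([[1.0, 0.3, -0.2],
              [0.3, 0.8, 0.1],
              [-0.2, 0.1, 0.5]])
P = G + np.diag([1.0, 0.6, 0.9])
w = np.array([1.0, 0.5, 1.0])           # hypothetical economic weights

b_lpsi = lpsi_weights(P, G, w)
b_rlpsi = rlpsi_weights(P, G, w, restricted=[0])
gain_lpsi = expected_gain(P, G, b_lpsi)
gain_rlpsi = expected_gain(P, G, b_rlpsi)   # first entry is ~0 by construction
```

Repeating this calculation over a grid of candidate weight vectors is one way to see how strongly the expected gains depend on the economic weights.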

6.
Plants (Basel) ; 11(13), 2022 Jun 30.
Article in English | MEDLINE | ID: mdl-35807690

ABSTRACT

Some previous studies have assessed the predictive ability of genome-wide selection on stripe (yellow) rust resistance in wheat, but the effect of genotype by environment interaction (GEI) on prediction accuracies has not been well studied in diverse genetic backgrounds. Here, we compared the predictive ability of a model based on phenotypic data only (M1), a model with the main effects of phenotype and molecular markers (M2), and a model that incorporated GEI (M3) using three cross-validation scenarios (CV1, CV2, and CV0) of interest to breeders in six spring wheat populations. Each population was evaluated at three to eight field nurseries and genotyped with either the DArTseq technology or the wheat 90K single nucleotide polymorphism array, of which a subset of 1,058-23,795 polymorphic markers was used for the analyses. In the CV1 scenario, the mean prediction accuracies of the M1, M2, and M3 models across the six populations varied from -0.11 to -0.07, from 0.22 to 0.49, and from 0.19 to 0.48, respectively. Mean accuracies obtained using the M3 model in the CV1 scenario were significantly greater than those of the M2 model in two populations, the same in three populations, and smaller in one population. In both the CV2 and CV0 scenarios, the mean prediction accuracies of the three models varied from 0.53 to 0.84 and were not significantly different in any population except Attila/CDC Go in CV2, where the M3 model gave greater accuracy than both the M1 and M2 models. Overall, the M3 model increased prediction accuracies in some populations by up to 12.4% and decreased accuracy in others by up to 17.4%, demonstrating inconsistent results among genetic backgrounds that require considering each population separately. This is the first comprehensive genome-wide prediction study that investigated details of the effect of GEI on stripe rust resistance across diverse spring wheat populations.
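The three cross-validation scenarios differ only in which records are held out. The sketch below follows the usual definitions in the genomic prediction literature (CV1: all records of untested lines; CV2: individual line-by-environment records of lines tested elsewhere; CV0: one whole environment). The function name, the fold count, and the toy layout of 20 lines in 3 environments are assumptions for illustration.

```python
import numpy as np

def cv_test_masks(lines, envs, scheme, n_folds=5, seed=0):
    """Return a list of boolean masks, one per fold, marking the held-out records."""
    rng = np.random.default_rng(seed)
    lines, envs = np.asarray(lines), np.asarray(envs)
    if scheme == "CV0":    # leave one whole environment out
        return [envs == e for e in np.unique(envs)]
    if scheme == "CV1":    # hold out every record of a subset of lines
        uniq = np.unique(lines)
        fold_of_line = dict(zip(uniq, rng.permutation(len(uniq)) % n_folds))
        fold = np.array([fold_of_line[l] for l in lines])
    elif scheme == "CV2":  # hold out individual line-by-environment records
        fold = rng.permutation(len(lines)) % n_folds
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return [fold == k for k in range(n_folds)]

# Hypothetical balanced trial: 20 lines, each observed in 3 environments.
lines = np.repeat(np.arange(20), 3)
envs = np.tile(np.array(["E1", "E2", "E3"]), 20)
cv1_masks = cv_test_masks(lines, envs, "CV1")
cv2_masks = cv_test_masks(lines, envs, "CV2")
cv0_masks = cv_test_masks(lines, envs, "CV0")
```

Under CV1 a held-out line contributes no phenotypes to training, which is consistent with the phenotype-only M1 model having near-zero or negative accuracy in that scenario.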

7.
Genes (Basel) ; 13(4), 2022 03 23.
Article in English | MEDLINE | ID: mdl-35456370

ABSTRACT

Some studies have investigated the potential of genomic selection (GS) on stripe rust, leaf rust, Fusarium head blight (FHB), and leaf spot in wheat, but none of them have assessed the effect of the reaction norm model that incorporated GE interactions. In addition, the prediction accuracy on common bunt has not previously been studied. Here, we investigated within-population prediction accuracies using the baseline M1 model and two reaction norm models (M2 and M3) with three random cross-validation (CV1, CV2, and CV0) schemes. Three Canadian spring wheat populations were evaluated in up to eight field environments and genotyped with 3158, 5732, and 23,795 polymorphic markers. The M3 model that incorporated GE interactions reduced residual variance by an average of 10.2% as compared with the main effect M2 model and increased prediction accuracies on average by 2-6%. In some traits, the M3 model increased prediction accuracies up to 54% as compared with the M2 model. The average prediction accuracies of the M3 model with CV1, CV2, and CV0 schemes varied from 0.02 to 0.48, from 0.25 to 0.84, and from 0.14 to 0.87, respectively. In both CV2 and CV0 schemes, stripe rust in all three populations, common bunt and leaf rust in two populations, as well as FHB severity, FHB index, and leaf spot in one population had high to very high (0.54-0.87) prediction accuracies. This is the first comprehensive genomic selection study on five major diseases in spring wheat.


Subject(s)
Basidiomycota , Fusarium , Basidiomycota/genetics , Canada , Disease Resistance/genetics , Fusarium/genetics , Plant Diseases/genetics , Triticum/genetics
8.
Methods Mol Biol ; 2467: 139-156, 2022.
Article in English | MEDLINE | ID: mdl-35451775

ABSTRACT

Genomic selection (GS) is a methodology that revolutionized the process of breeding improved genetic materials in plant and animal breeding programs. It uses predicted genomic values of the potential of untested/unobserved genotypes as surrogates of phenotypes during the selection process. The predicted genomic values are obtained exclusively from the marker profiles of the untested genotypes, and breeders can potentially use them to screen the genotypes to be advanced in the breeding pipeline, to identify potential parents for the next improvement cycles, or to find optimal crosses for targeting genotypes, among other uses. Conceptually, GS initially requires a set of genotypes with both molecular marker information and phenotypic data for model calibration; the performance of untested genotypes is then predicted using their marker profiles only. Hence, it is expected that breeders would look at these values in order to conduct selections. Even though the concept of GS seems trivial, the high-dimensional nature of the data delivered by modern sequencing technologies, where the number of molecular markers (p) exceeds by far the number of data points available for model fitting (n; p ≫ n), meant that a completely renovated set of prediction models was needed to cope with this challenge. In this chapter, we provide a conceptual framework for comparing statistical models to overcome the "large p, small n problem." Given the very large diversity of GS models, only the most popular are presented here; we focus mainly on linear regression-based models and nonparametric models that predict the genomic estimated breeding values (GEBV) in a single environment considering a single trait only, mainly in the context of plant breeding.
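One common way linear models cope with p ≫ n is ridge regression, which stays well defined when ordinary least squares does not and can be solved through an n × n system instead of a p × p one. The sketch below checks that the two solutions agree on simulated data; ridge regression is used here as a representative example and is an assumption about which model best illustrates the chapter, and the dimensions and penalty value are arbitrary.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Solve the p x p system (X'X + lam I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_dual(X, y, lam):
    """Equivalent n x n form: beta = X'(XX' + lam I)^{-1} y, cheaper when p >> n."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

rng = np.random.default_rng(1)
n, p = 30, 500                                   # far more markers than phenotyped lines
X = rng.standard_normal((n, p))                  # simulated marker covariates
y = X[:, :5].sum(axis=1) + rng.standard_normal(n)

beta_primal = ridge_primal(X, y, lam=10.0)
beta_dual = ridge_dual(X, y, lam=10.0)
rank_xtx = np.linalg.matrix_rank(X.T @ X)        # at most n, so X'X alone is singular
```

The n × n matrix XX' plays the same role as a genomic relationship matrix, which is why marker-effect and kernel formulations of the same model give identical predictions.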


Subject(s)
Models, Genetic , Selection, Genetic , Animals , Genome , Genomics/methods , Genotype , Phenotype
10.
Proc Natl Acad Sci U S A ; 119(4), 2022 01 25.
Article in English | MEDLINE | ID: mdl-35042796

ABSTRACT

Quantitative understanding of factors driving yield increases of major food crops is essential for effective prioritization of research and development. Yet previous estimates had limitations in distinguishing among contributing factors such as changing climate and new agronomic and genetic technologies. Here, we distinguished the separate contribution of these factors to yield advance using an extensive database collected from the largest irrigated maize-production domain in the world located in Nebraska (United States) during the 2005-to-2018 period. We found that 48% of the yield gain was associated with a decadal climate trend, 39% with agronomic improvements, and, by difference, only 13% with improvement in genetic yield potential. The fact that these findings were so different from most previous studies, which gave much-greater weight to genetic yield potential improvement, gives urgency to the need to reevaluate contributions to yield advances for all major food crops to help guide future investments in research and development to achieve sustainable global food security. If genetic progress in yield potential is also slowing in other environments and crops, future crop-yield gains will increasingly rely on improved agronomic practices.


Subject(s)
Agriculture/methods , Zea mays/growth & development , Zea mays/genetics , Climate , Climate Change , Crops, Agricultural/growth & development , Soil/chemistry , Soil Microbiology
11.
Theor Appl Genet ; 135(2): 537-552, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34724078

ABSTRACT

KEY MESSAGE: Using phenotype data of three spring wheat populations evaluated at 6-15 environments under two management systems, we found moderate to very high prediction accuracies across seven traits. The phenotype data collected under an organic management system effectively predicted the performance of lines in the conventional management and vice versa. There is growing interest in developing wheat cultivars specifically for organic agriculture, but we are not aware of studies on the effect of organic management on the predictive ability of genomic selection (GS). Here, we evaluated within-population prediction accuracies of four GS models, four combinations of training and testing sets, three reaction norm models, and three random cross-validation (CV) schemes in three populations phenotyped under organic and conventional management systems. Our study was based on a total of 578 recombinant inbred lines and varieties from three spring wheat populations, which were evaluated for seven traits at 3-9 conventionally and 3-6 organically managed field environments and genotyped either with the wheat 90K SNP array or DArTseq. We predicted the management systems (CV0M) or environments (CV0), a subset of lines that have been evaluated in either management (CV2M) or some environments (CV2), and the performance of newly developed lines in either management (CV1M) or environments (CV1). The average prediction accuracies of the model that incorporated genotype × environment interactions with CV0 and CV2 schemes varied from 0.69 to 0.97. In the CV1 and CV1M schemes, prediction accuracies ranged from -0.12 to 0.77 depending on the reaction norm models, the traits, and the populations. In most cases, grain protein showed the highest prediction accuracies. The phenotype data collected under the organic management effectively predicted the performance of lines under conventional management and vice versa. 
This is the first comprehensive GS study that investigated the effect of the organic management system in wheat.


Subject(s)
Genomics , Triticum , Genome, Plant , Genotype , Phenotype , Triticum/genetics
12.
Front Genet ; 13: 1032691, 2022.
Article in English | MEDLINE | ID: mdl-37065625

ABSTRACT

Modern plant breeding programs collect several data types such as weather, images, and secondary or associated traits besides the main trait (e.g., grain yield). Genomic data is high-dimensional and often overwhelms smaller data types when naively combined to explain the response variable. There is a need to develop methods able to effectively combine different data types of differing sizes to improve predictions. Additionally, in the face of changing climate conditions, there is a need to develop methods able to effectively combine weather information with genotype data to better predict the performance of lines. In this work, we develop a novel three-stage classifier to predict multi-class traits by combining three data types: genomic, weather, and secondary trait. The method addressed various challenges in this problem, such as confounding, differing sizes of data types, and threshold optimization. The method was examined in different settings, including binary and multi-class responses, various penalization schemes, and class balances. Then, our method was compared to standard machine learning methods such as random forests and support vector machines using various classification accuracy metrics and using model size to evaluate the sparsity of the model. The results showed that our method performed similarly to or better than machine learning methods across various settings. More importantly, the classifiers obtained were highly sparse, allowing for a straightforward interpretation of relationships between the response and the selected predictors.

13.
Nature ; 599(7886): 622-627, 2021 11.
Article in English | MEDLINE | ID: mdl-34759320

ABSTRACT

Zero hunger and good health could be realized by 2030 through effective conservation, characterization and utilization of germplasm resources1. So far, few chickpea (Cicer arietinum) germplasm accessions have been characterized at the genome sequence level2. Here we present a detailed map of variation in 3,171 cultivated and 195 wild accessions to provide publicly available resources for chickpea genomics research and breeding. We constructed a chickpea pan-genome to describe genomic diversity across cultivated chickpea and its wild progenitor accessions. A divergence tree using genes present in around 80% of individuals in one species allowed us to estimate the divergence of Cicer over the last 21 million years. Our analysis found chromosomal segments and genes that show signatures of selection during domestication, migration and improvement. The chromosomal locations of deleterious mutations responsible for limited genetic diversity and decreased fitness were identified in elite germplasm. We identified superior haplotypes for improvement-related traits in landraces that can be introgressed into elite breeding lines through haplotype-based breeding, and found targets for purging deleterious alleles through genomics-assisted breeding and/or gene editing. Finally, we propose three crop breeding strategies based on genomic prediction to enhance crop productivity for 16 traits while avoiding the erosion of genetic diversity through optimal contribution selection (OCS)-based pre-breeding. The predicted performance for 100-seed weight, an important yield-related trait, increased by up to 23% and 12% with OCS- and haplotype-based genomic approaches, respectively.


Subject(s)
Cicer/genetics , Genetic Variation , Genome, Plant/genetics , Sequence Analysis, DNA , Crops, Agricultural/genetics , Haplotypes/genetics , Plant Breeding , Polymorphism, Single Nucleotide/genetics
14.
Plant Genome ; 14(3): e20151, 2021 11.
Article in English | MEDLINE | ID: mdl-34510790

ABSTRACT

Sparse testing in genome-enabled prediction in plant breeding can be emulated throughout different line allocations where some lines are observed in all environments (overlap) and others are observed in only one environment (nonoverlap). We studied three general cases of the composition of the sparse testing allocation design for genome-enabled prediction of wheat (Triticum aestivum L.) breeding: (a) completely nonoverlapping wheat lines in environments, (b) completely overlapping wheat lines in all environments, and (c) a proportion of nonoverlapping/overlapping wheat lines allocated in the environments. We also studied several cases in which the size of the testing population was systematically decreased. The study used three extensive wheat data sets (W1, W2, and W3). Three different genome-enabled prediction models (M1-M3) were used to study the effect of the sparse testing in terms of the genomic prediction accuracy. Model M1 included only main effects of environments and lines; M2 included main effects of environments, lines, and genomic effects; whereas the remaining model (M3) also incorporated the genomic × environment interaction (GE). The results show that the GE component of the genome-based model M3 captures a larger genetic variability than the main genomic effects term from models M1 and M2. In addition, model M3 provides higher prediction accuracy than models M1 and M2 for the same allocation designs (different combinations of nonoverlapping/overlapping lines in environments and training set sizes). Overlapped sets of 30-50 lines in all the environments provided stable genomic-enabled prediction accuracy. Reducing the size of the testing populations under all allocation designs decreases the prediction accuracy, which recovers when more lines are tested in all environments. Model M3 offers the possibility of maintaining the prediction accuracy throughout both extreme situations of all nonoverlapping lines and all overlapping lines.


Subject(s)
Plant Breeding , Triticum , Gene-Environment Interaction , Genotype , Models, Genetic , Phenotype , Triticum/genetics
15.
Front Plant Sci ; 12: 630175, 2021.
Article in English | MEDLINE | ID: mdl-33868333

ABSTRACT

Identifying genetic loci associated with yield stability has helped plant breeders and geneticists begin to understand the role and influence of genotype by environment (GxE) interactions in soybean [Glycine max (L.) Merr.] productivity, as well as in other crops. Methods for quantifying a genotype's range of performance across testing locations have been developed over decades, with dozens of methodologies available. These include directly modeling GxE interactions as part of an overall model for yield, as well as methods which generate overall yield "stability" values from multi-environment trial data. Correspondence between these methods as it pertains to the outcomes of genome-wide association studies (GWAS) has not been well defined. In this study, the GWAS results for yield and yield stability were compared in 213 soybean lines across 11 environments to determine their utility and potential intersection. Both univariate and multivariate conventional stability estimates were considered alongside a mixed model for yield that fit marker by environment interactions as a random effect. One hundred and six total QTL were discovered across all mapping results; however, genetic loci that were significant in the mixed model for grain yield that fit marker by environment interactions were completely distinct from those that were significant when mapping using traditional stability measures as a phenotype. Furthermore, 73.21% of QTL discovered in the mixed model were determined to cause a crossover interaction effect, which causes genotype rank changes between environments. Overall, the QTL discovered via explicitly mapping GxE interactions also explained more yield variance than those QTL associated with differences in traditional stability estimates, making their theoretical impact on selection greater. A lack of intersecting results between mapping approaches highlights the importance of examining stability in multiple contexts when attempting to manipulate GxE interactions in soybean.

16.
PLoS One ; 15(12): e0243408, 2020.
Article in English | MEDLINE | ID: mdl-33296417

ABSTRACT

We study a novel multi-strain SIR epidemic model with selective immunity by vaccination. A newer strain is made to emerge in the population when a preexisting strain has reached equilibrium. We assume that this newer strain does not exhibit cross-immunity with the original strain; hence, those who are vaccinated against or recovered from the original strain become susceptible to the newer strain. Recent events involving the COVID-19 virus show that it is possible for a viral strain to emerge in a population at a time when the influenza virus, a well-known virus with a vaccine readily available, is active in that population. We solved for four different equilibrium points and investigated the conditions for existence and local stability. The reproduction number was also determined for the epidemiological model and found to be consistent with the local stability condition for the disease-free equilibrium.
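The link between the reproduction number and stability of the disease-free equilibrium can be illustrated with the classic single-strain SIR model, where R0 = β/γ. This is a deliberately simplified sketch: it is not the multi-strain model with vaccination studied in the paper, and the parameter values, step size, and forward-Euler integration are all assumptions chosen for brevity.

```python
def simulate_sir(beta, gamma, s0=0.99, i0=0.01, days=200, dt=0.1):
    """Forward-Euler integration of the basic SIR model on population fractions."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    peak = i
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt      # susceptible -> infected
        new_rec = gamma * i * dt         # infected -> recovered
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        peak = max(peak, i)
    return s, i, r, peak

# R0 = beta / gamma: below 1 the infection dies out, above 1 it first grows.
sub = simulate_sir(beta=0.1, gamma=0.2)    # R0 = 0.5
sup = simulate_sir(beta=0.5, gamma=0.2)    # R0 = 2.5
```

In the subcritical run the infected fraction never exceeds its starting value, which is the numerical counterpart of a locally stable disease-free equilibrium.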


Subject(s)
COVID-19/epidemiology , Epidemics , Models, Biological , SARS-CoV-2 , COVID-19/prevention & control , Humans , Influenza Vaccines/therapeutic use , Influenza, Human/epidemiology , Influenza, Human/prevention & control
17.
G3 (Bethesda) ; 10(8): 2725-2739, 2020 08 05.
Article in English | MEDLINE | ID: mdl-32527748

ABSTRACT

"Sparse testing" refers to reduced multi-environment breeding trials in which not all genotypes of interest are grown in each environment. Using genomic-enabled prediction and a model embracing genotype × environment interaction (GE), the non-observed genotype-in-environment combinations can be predicted. Consequently, the overall costs can be reduced and the testing capacities can be increased. The accuracy of predicting the unobserved data depends on different factors including (1) how many genotypes overlap between environments, (2) in how many environments each genotype is grown, and (3) which prediction method is used. In this research, we studied the predictive ability obtained when using a fixed number of plots and different sparse testing designs. The considered designs included the extreme cases of (1) no overlap of genotypes between environments, and (2) complete overlap of the genotypes between environments. In the latter case, the prediction set fully consists of genotypes that have not been tested at all. Moreover, we gradually go from one extreme to the other considering (3) intermediates between the two previous cases with varying numbers of different or non-overlapping (NO)/overlapping (O) genotypes. The empirical study is built upon two different maize hybrid data sets consisting of different genotypes crossed to two different testers (T1 and T2) and each data set was analyzed separately. For each set, phenotypic records on yield from three different environments are available. Three different prediction models were implemented, two main effects models (M1 and M2), and a model (M3) including GE. The results showed that the genome-based model including GE (M3) captured more phenotypic variation than the models that did not include this component. Also, M3 provided higher prediction accuracy than models M1 and M2 for the different allocation scenarios. 
Reducing the size of the calibration sets decreased the prediction accuracy under all allocation designs, with M3 being the least affected model; however, with the genome-enabled models (i.e., M2 and M3), the predictive ability is recovered when more genotypes are tested across environments. Our results indicate that a substantial part of the testing resources can be saved when using genome-based models including GE for optimizing sparse testing designs.
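The allocation designs described in this abstract can be represented as a genotype-by-environment incidence matrix built under a fixed plot budget. The sketch below is a hypothetical construction, with an invented function name and plot counts, that moves from no overlap to complete overlap by changing a single parameter.

```python
import numpy as np

def sparse_allocation(plots_per_env, n_env, n_overlap):
    """Boolean genotype x environment matrix with a fixed number of plots per environment.

    `n_overlap` genotypes are grown in every environment; the remaining plots in
    each environment are filled with genotypes grown nowhere else.
    """
    if not 0 <= n_overlap <= plots_per_env:
        raise ValueError("n_overlap must lie between 0 and plots_per_env")
    n_unique = plots_per_env - n_overlap
    n_lines = n_overlap + n_unique * n_env
    design = np.zeros((n_lines, n_env), dtype=bool)
    design[:n_overlap, :] = True                       # overlapping genotypes
    for e in range(n_env):
        start = n_overlap + e * n_unique
        design[start:start + n_unique, e] = True       # non-overlapping genotypes
    return design

no_overlap = sparse_allocation(plots_per_env=100, n_env=3, n_overlap=0)
partial = sparse_allocation(plots_per_env=100, n_env=3, n_overlap=50)
full_overlap = sparse_allocation(plots_per_env=100, n_env=3, n_overlap=100)
```

With the same 300 plots, the three designs test 300, 200, and 100 genotypes respectively, so increasing overlap trades the number of genotypes tested against the number of environments each one is seen in.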


Subject(s)
Gene-Environment Interaction , Plant Breeding , Genomics , Genotype , Models, Genetic , Phenotype
18.
Plant Methods ; 15: 123, 2019.
Article in English | MEDLINE | ID: mdl-31695728

ABSTRACT

BACKGROUND: Automated phenotyping technologies are continually advancing the breeding process. However, collecting various secondary traits throughout the growing season and processing massive amounts of data still take great effort and time. Selecting a minimum number of secondary traits that have the maximum predictive power has the potential to reduce phenotyping efforts. The objective of this study was to select principal features extracted from UAV imagery and critical growth stages that contributed the most in explaining winter wheat grain yield. Five dates of multispectral images and seven dates of RGB images were collected by a UAV system during the spring growing season in 2018. Two classes of features (variables), totaling 172 variables, were extracted for each plot from the vegetation index and plant height maps, including pixel statistics and dynamic growth rates. A parametric algorithm, LASSO regression (the least absolute shrinkage and selection operator), and a non-parametric algorithm, random forest, were applied for variable selection. The regression coefficients estimated by LASSO and the permutation importance scores provided by random forest were used to determine the ten most important variables influencing grain yield from each algorithm. RESULTS: Both selection algorithms assigned the highest importance score to the variables related to plant height around the grain filling stage. Some vegetation-index-related variables were also selected by the algorithms, mainly at early to mid growth stages and during senescence. Compared with yield prediction using all 172 variables derived from measured phenotypes, prediction using the selected variables performed comparably or even better. We also noticed that the prediction accuracy on the adapted NE lines (r = 0.58-0.81) was higher than on the other lines (r = 0.21-0.59) included in this study with different genetic backgrounds. 
CONCLUSIONS: With the ultra-high resolution plot imagery obtained by the UAS-based phenotyping we are now able to derive more features, such as the variation of plant height or vegetation indices within a plot other than just an averaged number, that are potentially very useful for the breeding purpose. However, too many features or variables can be derived in this way. The promising results from this study suggests that the selected set from those variables can have comparable prediction accuracies on the grain yield prediction than the full set of them but possibly resulting in a better allocation of efforts and resources on phenotypic data collection and processing.
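
The LASSO half of the variable-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the feature names are hypothetical, the penalty value is arbitrary, and the coordinate-descent solver is a plain-Python stand-in for whatever software the study used. The random forest permutation-importance half is not shown.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the core LASSO update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardize(columns):
    """Center each feature column to mean 0 and scale it to unit variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        out.append([(v - mean) / sd for v in col])
    return out


def lasso_coordinate_descent(columns, y, penalty, sweeps=200):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 for standardized columns."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]
    coefs = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Covariance of feature j with the residual, adding back its own effect.
            rho = sum(c * r for c, r in zip(col, residual)) / n + coefs[j]
            new_coef = soft_threshold(rho, penalty)
            delta = new_coef - coefs[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                coefs[j] = new_coef
    return coefs


# Synthetic plot-level data (assumption): yield depends on two of six features.
rng = random.Random(0)
n_plots = 200
feature_names = [  # hypothetical names, for illustration only
    "height_mean_grainfill", "ndvi_mean_early", "ndvi_sd_mid",
    "height_growth_rate", "ndvi_mean_senescence", "height_sd_early",
]
raw = [[rng.gauss(0, 1) for _ in range(n_plots)] for _ in feature_names]
yield_values = [
    3.0 * raw[0][i] - 2.0 * raw[3][i] + rng.gauss(0, 0.5) for i in range(n_plots)
]

coefs = lasso_coordinate_descent(standardize(raw), yield_values, penalty=0.3)

# Rank features by absolute coefficient, as the abstract describes for LASSO.
ranked = sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)
top_two = [feature_names[j] for j in ranked[:2]]
print(top_two)
```

Ranking by absolute coefficient is only meaningful because the features are standardized first; on raw scales, coefficient size would reflect measurement units rather than importance.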

19.
G3 (Bethesda) ; 9(9): 2925-2934, 2019 09 04.
Article in English | MEDLINE | ID: mdl-31300481

ABSTRACT

Genome-enabled prediction plays an essential role in wheat breeding because it has the potential to increase the rate of genetic gain relative to traditional phenotypic and pedigree-based selection. Since the performance of wheat lines is highly influenced by environmental stimuli, it is important to accurately model the environment and its interaction with genetic factors in prediction models. Arguably, multi-environmental best linear unbiased prediction (BLUP) may deliver better prediction performance than single-environment genomic BLUP. We evaluated pedigree and genome-based prediction using 35,403 wheat lines from the Global Wheat Breeding Program of the International Maize and Wheat Improvement Center (CIMMYT). We implemented eight statistical models that included genome-wide molecular marker and pedigree information as prediction inputs in two different validation schemes. All models included main effects, but some considered interactions between the different types of pedigree and genomic covariates via Hadamard products of similarity kernels. Pedigree models always gave better prediction of new lines in observed environments than genome-based models when only main effects were fitted. However, for all traits, the highest predictive abilities were obtained when interactions between pedigree, genomes, and environments were included. When new lines were predicted in unobserved environments, in almost all trait/year combinations, the marker main-effects model was the best. These results provide strong evidence that the different sources of genetic information (molecular markers and pedigree) are not equally useful at different stages of the breeding pipelines, and can be employed differentially to improve the design and prediction of the outcome of future breeding programs.


Subject(s)
Genome, Plant , Models, Genetic , Triticum/physiology , Gene-Environment Interaction , Genetic Markers , Phenotype , Plant Breeding , Random Allocation , Reproducibility of Results , Triticum/genetics
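
The interaction terms mentioned in the abstract above, built from Hadamard products of similarity kernels, can be sketched in a few lines. The marker codes and the pedigree relationship matrix below are made-up toy values, and the scaled linear kernel is one common choice assumed here for illustration; the abstract does not specify how the kernels were built.

```python
def linear_kernel(rows):
    """Similarity between lines as the scaled inner product of marker codes."""
    n_markers = len(rows[0])
    return [
        [sum(a * b for a, b in zip(r1, r2)) / n_markers for r2 in rows]
        for r1 in rows
    ]


def hadamard(k1, k2):
    """Element-wise (Hadamard) product of two equally sized kernels."""
    return [[a * b for a, b in zip(row1, row2)] for row1, row2 in zip(k1, k2)]


# Toy marker codes for three lines (assumption, not real data).
markers = [
    [1, -1, 0, 1],
    [1, 0, -1, 1],
    [-1, 1, 1, -1],
]
# Toy pedigree relationship matrix for the same three lines (assumption).
pedigree = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.25],
    [0.0, 0.25, 1.0],
]

genomic = linear_kernel(markers)
genomic_x_pedigree = hadamard(genomic, pedigree)
for row in genomic_x_pedigree:
    print(row)
```

A pair of lines is similar under the product kernel only if it is similar under both inputs, which is why the unrelated pair (first and third lines) gets a zero entry regardless of their marker similarity.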
20.
Evol Bioinform Online ; 15: 1176934319840026, 2019.
Article in English | MEDLINE | ID: mdl-30956524

ABSTRACT

Prediction techniques are important in plant breeding as they provide a tool for selection that is more efficient and economical than traditional phenotypic and pedigree-based selection. Conventional genomic prediction models use molecular marker information to predict the phenotype. With the development of new phenomics techniques, we have the opportunity to collect image data on the plants and to extend the traditional genomic prediction models to incorporate a diverse set of information collected on the plants. In our research, we developed a hybrid matrix model that incorporates molecular marker and canopy coverage information as a weighted linear combination to predict grain yield for the soybean nested association mapping (SoyNAM) panel. To obtain the testing and training sets, we clustered the individuals based on their marker and canopy information using 2 different clustering techniques, and we compared 5 different cross-validation schemes. The results showed that the predictive ability of the models was the highest when both the canopy and marker information were included, and it was the lowest when only the canopy information was included.
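
The "weighted linear combination" of marker and canopy information can be illustrated with a minimal sketch. It assumes that both sources have already been turned into similarity matrices over the same individuals and that the two weights sum to one; neither detail is stated in the abstract, and the matrices and the function name `hybrid_matrix` are hypothetical.

```python
def hybrid_matrix(marker_sim, canopy_sim, weight):
    """Combine two similarity matrices as weight*marker + (1 - weight)*canopy."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [weight * m + (1.0 - weight) * c for m, c in zip(m_row, c_row)]
        for m_row, c_row in zip(marker_sim, canopy_sim)
    ]


# Toy similarity matrices for three individuals (assumption, not real data).
marker_sim = [
    [1.0, 0.4, 0.0],
    [0.4, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
canopy_sim = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.4],
    [0.6, 0.4, 1.0],
]

# Sweep the weight from canopy-only (0.0) to marker-only (1.0).
for w in (0.0, 0.5, 1.0):
    print(w, hybrid_matrix(marker_sim, canopy_sim, w)[0])
```

The two endpoints of the sweep correspond to the canopy-only and marker-only models compared in the abstract, with intermediate weights giving the hybrid models.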
