Results 1 - 7 of 7

1.
BMC Genomics ; 14: 860, 2013 Dec 06.
Article in English | MEDLINE | ID: mdl-24314298

ABSTRACT

BACKGROUND: In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four of which (Methods 1 to 4) use an estimate of heritability to divide predictive ability computed by cross-validation. Between them, the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect, Methods 1 to 3, and one direct, Method 7) with simulated true predictive accuracy as the benchmark and with each other. RESULTS: The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increased the time required to compute predictive accuracy by all seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation, Methods 4 and 6 were often the best. CONCLUSIONS: The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy.
Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies.


Subjects
Breeding, Models, Genetic, Plants/genetics, Algorithms, Computer Simulation, Genomics, Reproducibility of Results, Zea mays/genetics
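
The indirect estimators (Methods 1 to 4) share one final step: predictive ability, the correlation between predicted breeding values and observed phenotypes, is divided by the square root of an estimated heritability. The Python sketch below shows only that step, assuming the predictions are already out-of-sample (e.g. from cross-validation) and that the heritability `h2` is estimated elsewhere; the function names are illustrative and are not taken from the paper, and the direct Methods 5 to 7 are not reproduced.

```python
import numpy as np


def predictive_ability(pred: np.ndarray, pheno: np.ndarray) -> float:
    """Pearson correlation between predicted breeding values and observed phenotypes."""
    return float(np.corrcoef(pred, pheno)[0, 1])


def indirect_accuracy(pred: np.ndarray, pheno: np.ndarray, h2: float) -> float:
    """Predictive ability divided by sqrt(h2), the usual indirect accuracy estimate.

    h2 is an externally estimated heritability (hypothetical input in this sketch).
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    return predictive_ability(pred, pheno) / np.sqrt(h2)


pred = np.array([1.0, 2.0, 3.0])    # hypothetical out-of-sample predictions
pheno = np.array([1.0, 3.0, 2.0])   # hypothetical observed phenotypes
print(predictive_ability(pred, pheno))       # approximately 0.5
print(indirect_accuracy(pred, pheno, 0.64))  # approximately 0.625
```

Because the denominator is itself an estimate, the ratio is not guaranteed to stay within [-1, 1]: an underestimated heritability inflates it.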
2.
Theor Appl Genet ; 126(1): 69-82, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22903736

ABSTRACT

Genomic selection (GS), a method for predicting breeding values of plants or animals from many molecular markers, is commonly implemented in two stages. In plant breeding, the first stage usually involves computing adjusted means for genotypes, which are then used to predict genomic breeding values in the second stage. We compared two classical stage-wise approaches, which either ignore the correlations among the means or approximate their variance-covariance matrix by a diagonal matrix, and a new stage-wise method with a single-stage analysis for GS using ridge regression best linear unbiased prediction (RR-BLUP). The new stage-wise method rotates (orthogonalizes) the adjusted means from the first stage before submitting them to the second stage. This makes the errors approximately independently and identically normally distributed, which is a prerequisite for many procedures that are potentially useful for GS, such as machine learning methods (e.g. boosting) and regularized regression methods (e.g. lasso). This is illustrated in this paper using componentwise boosting. The componentwise boosting method minimizes squared error loss using least squares and iteratively and automatically selects the markers that are most predictive of genomic breeding values. Results are compared with those of RR-BLUP using fivefold cross-validation. For two unbalanced datasets, the new stage-wise approach with rotated means was slightly more similar to the single-stage analysis than the classical two-stage approaches based on non-rotated means. This suggests that rotation is a worthwhile pre-processing step in GS for two-stage approaches applied to unbalanced datasets. Moreover, the predictive accuracy of stage-wise RR-BLUP was higher (5.0-6.1%) than that of componentwise boosting.


Subjects
Genomics/methods, Zea mays/genetics, Chromosome Mapping/methods, Crosses, Genetic, Genes, Plant, Genetic Markers, Genotype, Haploidy, Least-Squares Analysis, Models, Genetic, Models, Statistical, Regression Analysis, Reproducibility of Results, Selection, Genetic
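
The componentwise boosting loop described above can be sketched in a few lines: at each iteration every marker is fitted alone to the current residuals by least squares, the marker giving the largest drop in squared error is selected, and only a fraction `nu` of its fit is added. The Python sketch below assumes centred marker columns, a fixed step length `nu` and a fixed number of iterations (which in practice would be tuned, e.g. by cross-validation); it omits the rotation step and is not the authors' implementation.

```python
import numpy as np


def componentwise_l2_boost(X: np.ndarray, y: np.ndarray, n_steps: int = 200, nu: float = 0.1):
    """Componentwise L2 boosting with one least-squares base learner per marker.

    Returns (intercept, beta, x_mean); predict with intercept + (X_new - x_mean) @ beta.
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                    # centre markers so each base learner needs no intercept
    intercept = float(y.mean())
    resid = y - intercept
    col_sq = (Xc ** 2).sum(axis=0)     # ||X_j||^2, assumed > 0 (no monomorphic markers)
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        coef = Xc.T @ resid / col_sq   # least-squares slope of each marker on the residuals
        gain = coef ** 2 * col_sq      # reduction in residual sum of squares per marker
        j = int(np.argmax(gain))       # select the single most predictive marker
        beta[j] += nu * coef[j]        # damped update of that marker's effect only
        resid -= nu * coef[j] * Xc[:, j]
    return intercept, beta, x_mean


rng = np.random.default_rng(1)
X = rng.standard_normal((40, 8))       # hypothetical marker matrix
y = 5.0 + 2.0 * X[:, 0]                # only the first marker has an effect
b0, beta, xm = componentwise_l2_boost(X, y)
print(np.round(beta, 3))
```

Because only one coefficient moves per iteration, markers that are never selected keep an effect of exactly zero, which is the automatic selection the abstract refers to.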
3.
Biom J ; 54(6): 844-60, 2012 Nov.
Article in English | MEDLINE | ID: mdl-23007738

ABSTRACT

Plant breeders and variety testing agencies routinely test candidate genotypes (crop varieties, lines, test hybrids) in multiple environments. Such multi-environment trials can be efficiently analysed by mixed models. A single-stage analysis models the entire observed data at the level of individual plots. This kind of analysis is usually considered the gold standard. In practice, however, it is more convenient to use a two-stage approach, in which experiments are first analysed per environment, yielding adjusted means per genotype, which are then summarised across environments in the second stage. Stage-wise approaches suggested so far are approximate in that they cannot fully reproduce a single-stage analysis, except in very simple cases, because the variance-covariance matrix of adjusted means from individual environments needs to be approximated by a diagonal matrix. This paper proposes a fully efficient stage-wise method, which carries forward the full variance-covariance matrix of adjusted means from the individual environments to the analysis across the series of trials. Provided the variance components are known, this method can fully reproduce the results of a single-stage analysis. Computations are made efficient by a diagonalisation of the residual variance-covariance matrix, which necessitates a corresponding linear transformation of both the first-stage estimates (e.g. adjusted means and regression slopes for plot covariates) and the corresponding design matrices for fixed and random effects. We also exemplify the extension of the general approach to a three-stage analysis. The method is illustrated using two datasets, one real and the other simulated. The proposed approach has close connections with meta-analysis, where environments correspond to centres and genotypes to medical treatments. We therefore compare our theoretical results with recently published results from a meta-analysis.


Subjects
Biometry/methods, Breeding/methods, Environment, Plants/genetics, Analysis of Variance, Chromosome Mapping, Genome, Plant/genetics, Genotype, Humans, Models, Statistical, Quantitative Trait Loci/genetics
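
The central computational step, linearly transforming both the first-stage estimates and the design matrix so that the residual variance-covariance matrix becomes diagonal, can be illustrated with an eigendecomposition. The Python sketch below assumes the variance-covariance matrix `omega` of the adjusted means is known and positive definite, and additionally rescales the rotated data to unit variance, so that ordinary least squares on the transformed data reproduces generalised least squares with the full matrix. The function names and the simulated inputs are hypothetical, and the paper's exact transformation may differ.

```python
import numpy as np


def whitening_transform(omega: np.ndarray) -> np.ndarray:
    """Return T such that T @ omega @ T.T is the identity matrix.

    Uses omega = U diag(lam) U^T, so T = diag(lam)^(-1/2) U^T (omega assumed positive definite).
    """
    lam, U = np.linalg.eigh(omega)
    return (U / np.sqrt(lam)).T


def gls(X: np.ndarray, y: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """Generalised least squares estimate using the full variance-covariance matrix."""
    W = np.linalg.inv(omega)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)


def ols_on_transformed(X: np.ndarray, y: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """Transform both the estimates y and the design matrix X, then fit by OLS."""
    T = whitening_transform(omega)
    beta, *_ = np.linalg.lstsq(T @ X, T @ y, rcond=None)
    return beta


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
omega = A @ A.T + 6 * np.eye(6)   # hypothetical correlated errors of six adjusted means
X = rng.standard_normal((6, 2))   # hypothetical second-stage design matrix
y = rng.standard_normal(6)        # hypothetical adjusted means
print(gls(X, y, omega), ols_on_transformed(X, y, omega))  # equal up to rounding
print(gls(X, y, np.diag(np.diag(omega))))                 # diagonal approximation
```

The equivalence holds because T.T @ T equals the inverse of `omega`; replacing `omega` by its diagonal discards the covariances and, in general, changes the estimate, which is the approximation the fully efficient method avoids.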
4.
BMC Proc ; 6 Suppl 2: S10, 2012 May 21.
Article in English | MEDLINE | ID: mdl-22640436

ABSTRACT

BACKGROUND: Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods (ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net) for predicting GEBVs using dense SNP markers. METHODS: We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all six methods, which use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error and the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV).
RESULTS: The elastic net, lasso, adaptive lasso and adaptive elastic net all had similar accuracies but outperformed ridge regression and ridge regression BLUP in terms of both the Pearson correlation between predicted GEBVs and the true genomic value and the root mean squared error. The performance of ridge regression BLUP (RR-BLUP) was also somewhat better than that of ridge regression. This pattern was replicated by the Pearson correlation between predicted GEBVs and the true breeding values (TBV) and by the root mean squared error calculated with respect to TBV, except that accuracy was lower for all models, especially for the adaptive elastic net. The correlation between the predicted GEBVs and simulated phenotypic values based on fivefold CV also revealed a similar pattern, except that the adaptive elastic net had lower accuracy than both ridge regression methods. CONCLUSIONS: All six models had relatively high prediction accuracies for the simulated dataset. Accuracy was higher for the lasso-type methods than for ridge regression and ridge regression BLUP.
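
The marker-selection property of the lasso-type methods comes from the L1 penalty, which can set marker effects exactly to zero. A minimal Python sketch of cyclic coordinate descent for the elastic net is given below; setting `l2 = 0` gives the lasso and `l1 = 0` gives ridge regression. The objective scaling, parameter names and fixed penalties are assumptions for illustration: the adaptive variants (which reweight the L1 penalty per marker) and the tuning of penalties are not shown.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Lasso shrinkage operator: sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def elastic_net_cd(X: np.ndarray, y: np.ndarray, l1: float, l2: float,
                   n_sweeps: int = 1000, tol: float = 1e-10) -> np.ndarray:
    """Minimise (1/2n)||y - X b||^2 + l1 ||b||_1 + (l2/2) ||b||^2 by cyclic coordinate descent.

    No intercept is fitted; centre X and y beforehand if one is needed.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()   # y - X @ beta, with beta = 0 at the start
    col_sq = (X ** 2).sum(axis=0) / n           # (1/n)||X_j||^2, assumed > 0
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # correlation of marker j with the partial residual that excludes marker j
            z = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(z, l1) / (col_sq[j] + l2)
            if new != old:
                resid -= X[:, j] * (new - old)  # keep the residual in sync with beta
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


rng = np.random.default_rng(0)
n = 100
Q, _ = np.linalg.qr(rng.standard_normal((n, 5)))
X = Q * np.sqrt(n)                                # orthogonal columns, (1/n)||X_j||^2 = 1
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0])      # only the first two columns matter
print(elastic_net_cd(X, y, l1=0.5, l2=0.0))       # approximately [2.5, -1.5, 0, 0, 0]
```

With orthogonal columns the lasso reduces to soft-thresholding the least-squares estimates, so the irrelevant columns end at exactly zero while the others are shrunk by `l1`; with correlated SNP markers the same loop applies but more sweeps are needed.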

5.
BMC Proc ; 5 Suppl 3: S11, 2011 May 27.
Article in English | MEDLINE | ID: mdl-21624167

ABSTRACT

BACKGROUND: Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based approaches for predicting breeding values makes it essential to evaluate and compare their relative predictive performances to identify approaches able to accurately predict breeding values. We evaluated the predictive accuracy of random forests (RF), stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers, and explored the utility of RF for ranking the predictive importance of markers, either for pre-screening markers or for discovering the chromosomal locations of QTLs. METHODS: We predicted GEBVs for one quantitative trait in a dataset simulated for the QTL-MAS 2010 workshop. Predictive accuracy was measured as the Pearson correlation between GEBVs and observed values using 5-fold cross-validation, and as the correlation between predicted and true breeding values. The importance of each marker was ranked using RF and plotted against the position of the marker and associated QTLs on one of five simulated chromosomes. RESULTS: The correlations between the predicted and true breeding values were 0.547 for boosting, 0.497 for SVMs, and 0.483 for RF, indicating better performance for boosting than for SVMs and RF. CONCLUSIONS: Accuracy was highest for boosting, intermediate for SVMs and lowest for RF, but differed little among the three methods and relative to ridge regression BLUP (RR-BLUP).
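
The cross-validated measure used here, the Pearson correlation between predictions and observed values under 5-fold cross-validation, can be computed for any learner. The Python sketch below pools the out-of-fold predictions before correlating them with the observations; averaging per-fold correlations is a common alternative, and the abstract does not say which was used. The ridge-regression `ridge_fit_predict` stand-in and its penalty are hypothetical; random forests, boosting or SVMs would be plugged in through the same `fit_predict` callable.

```python
from typing import Callable

import numpy as np

FitPredict = Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]


def kfold_predictive_ability(X: np.ndarray, y: np.ndarray, fit_predict: FitPredict,
                             k: int = 5, seed: int = 0):
    """Return (pooled Pearson correlation, out-of-fold predictions) from k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # random, roughly equal folds
    oof = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # fit on the other k-1 folds, then predict the held-out fold
        oof[test_idx] = fit_predict(X[train_idx], y[train_idx], X[test_idx])
    return float(np.corrcoef(oof, y)[0, 1]), oof


def ridge_fit_predict(X_tr: np.ndarray, y_tr: np.ndarray, X_te: np.ndarray,
                      lam: float = 1.0) -> np.ndarray:
    """Hypothetical stand-in learner: ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]), Xc.T @ (y_tr - y_mean))
    return y_mean + (X_te - x_mean) @ beta


rng = np.random.default_rng(42)
X = rng.standard_normal((100, 20))                                  # hypothetical markers
y = X[:, :3] @ np.array([1.0, -0.5, 0.8]) + rng.standard_normal(100)
r, oof = kfold_predictive_ability(X, y, ridge_fit_predict)
print(round(r, 3))
```

Fixing `seed` makes the fold assignment reproducible, so different learners can be compared on identical splits.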

6.
BMC Proc ; 5 Suppl 3: S12, 2011 May 27.
Article in English | MEDLINE | ID: mdl-21624168

ABSTRACT

BACKGROUND: Accurate prediction of genomic breeding values (GEBVs) requires numerous markers. However, predictive accuracy can be enhanced by excluding markers with no effects or with inconsistent effects among crosses that can adversely affect the prediction of GEBVs. METHODS: We present three different approaches for pre-selecting markers prior to predicting GEBVs using four different BLUP methods, including ridge regression and three spatial models. Performances of the models were evaluated using 5-fold cross-validation. RESULTS AND CONCLUSIONS: Ridge regression and the spatial models gave essentially similar fits. Pre-selecting markers was evidently beneficial since excluding markers with inconsistent effects among crosses increased the correlation between GEBVs and true breeding values of the non-phenotyped individuals from 0.607 (using all markers) to 0.625 (using pre-selected markers). Moreover, extension of the ridge regression model to allow for heterogeneous variances between the most significant subset and the complementary subset of pre-selected markers increased predictive accuracy (from 0.625 to 0.648) for the simulated dataset for the QTL-MAS 2010 workshop.
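
Allowing heterogeneous variances for the two marker subsets amounts to giving each subset its own shrinkage. Under the standard ridge-regression BLUP correspondence (penalty equal to the residual variance divided by the marker-effect variance), a minimal Python sketch with known, hypothetical variance components is shown below. In practice these variances would be estimated (e.g. by REML) rather than fixed, the function and argument names are illustrative, and the marker pre-selection step itself is not shown.

```python
import numpy as np


def two_group_ridge_blup(X: np.ndarray, y: np.ndarray, in_group1: np.ndarray,
                         var_e: float, var_g1: float, var_g2: float):
    """Marker-effect BLUP with separate effect variances for two marker subsets.

    in_group1: boolean mask for the (hypothetically) most significant markers.
    Marker j gets penalty var_e / var_g of its group; the overall mean is unpenalised.
    Returns (y_mean, beta, x_mean); predict with y_mean + (X_new - x_mean) @ beta.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean    # centring absorbs the fixed overall mean
    penalty = np.where(in_group1, var_e / var_g1, var_e / var_g2)
    beta = np.linalg.solve(Xc.T @ Xc + np.diag(penalty), Xc.T @ yc)
    return y_mean, beta, x_mean


rng = np.random.default_rng(7)
X = rng.standard_normal((50, 10))          # hypothetical marker matrix
y = 1.5 * X[:, 0] + rng.standard_normal(50)
mask = np.arange(10) < 3                   # hypothetical "most significant" subset
mu, b_het, xm = two_group_ridge_blup(X, y, mask, var_e=1.0, var_g1=1.0, var_g2=0.01)
print(np.round(b_het, 3))
```

With equal group variances the function reduces to ordinary RR-BLUP; a smaller variance for the complementary subset shrinks those effects harder while leaving the selected markers comparatively free.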

7.
BMC Proc ; 4 Suppl 1: S8, 2010 Mar 31.
Article in English | MEDLINE | ID: mdl-20380762

ABSTRACT

BACKGROUND: The success of genome-wide selection (GS) approaches will depend crucially on the availability of efficient and easy-to-use computational tools. Therefore, approaches that can be implemented using mixed models hold particular promise and deserve detailed study. A particular class of mixed models suitable for GS is given by geostatistical mixed models, in which genetic distance is treated analogously to spatial distance in geostatistics. METHODS: We consider various spatial mixed models for use in GS. The analyses presented for the QTL-MAS 2009 dataset pay particular attention to the modelling of residual errors as well as of polygenic effects. RESULTS: It is shown that geostatistical models are viable alternatives to ridge regression, one of the common approaches to GS. Correlations between genome-wide estimated breeding values and true breeding values were between 0.879 and 0.889. In the example considered, we did not find a large effect of the residual error variance modelling, largely because error variances were very small. A variance components model reflecting the pedigree of the crosses did not provide an improved fit. CONCLUSIONS: We conclude that geostatistical models deserve further study as a tool for GS that is easily implemented in a mixed model package.
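
Treating genetic distance like spatial distance means modelling the covariance of genetic effects between two individuals as a decreasing function of the distance between their marker profiles. The Python sketch below assumes Euclidean distance on 0/1/2 marker codes, an exponential covariance function, known variance components, and the sample mean as a simple stand-in for the generalised least-squares estimate of the overall mean. The distance measure and covariance models used in the paper may differ, and all names and parameter values are hypothetical.

```python
import numpy as np


def exponential_covariance(M: np.ndarray, sigma2_g: float, theta: float) -> np.ndarray:
    """K_ij = sigma2_g * exp(-d_ij / theta), d_ij = Euclidean distance between marker rows."""
    diff = M[:, None, :] - M[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma2_g * np.exp(-d / theta)


def kernel_blup(K: np.ndarray, y: np.ndarray, sigma2_e: float) -> np.ndarray:
    """BLUP of genetic effects: g_hat = K (K + sigma2_e I)^(-1) (y - mean(y))."""
    n = len(y)
    alpha = np.linalg.solve(K + sigma2_e * np.eye(n), y - y.mean())
    return K @ alpha


rng = np.random.default_rng(5)
M = rng.integers(0, 3, size=(12, 40)).astype(float)  # hypothetical 0/1/2 marker codes
y = rng.standard_normal(12)                          # hypothetical phenotypes
K = exponential_covariance(M, sigma2_g=1.0, theta=5.0)
g_hat = kernel_blup(K, y, sigma2_e=0.5)
print(np.round(g_hat, 3))
```

The range parameter `theta` plays the role it has in geostatistics: larger values let individuals with more dissimilar marker profiles still share covariance, which is what makes such models implementable in a standard mixed model package.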
