1.
Theor Appl Genet ; 137(1): 21, 2024 Jan 14.
Article in English | MEDLINE | ID: mdl-38221602

ABSTRACT

KEY MESSAGE: Genomic prediction models for quantitative traits assume continuous and normally distributed phenotypes. In this research, we proposed a novel Bayesian discrete lognormal regression model. Genomic selection is a powerful tool in modern breeding programs that uses genomic information to predict the performance of individuals and select those with desirable traits. It has revolutionized animal and plant breeding, as it allows breeders to identify the best candidates without labor-intensive and time-consuming phenotypic evaluations. While several statistical models have been developed, most of them have been for quantitative continuous traits and only a few for count responses. In this paper, we propose a discrete lognormal regression model in the Bayesian context, with a Gibbs sampler to explore the corresponding posterior distribution and make the predictions. The model is evaluated on two wheat disease-resistance datasets against the traditional Gaussian model and a lognormal model. The results indicate that the proposed model is a competitive and natural model for predicting count genomic traits.


Subject(s)
Models, Genetic , Plant Breeding , Humans , Animals , Bayes Theorem , Genome , Genomics/methods , Phenotype
2.
Plant Genome ; 15(3): e20214, 2022 09.
Article in English | MEDLINE | ID: mdl-35535459

ABSTRACT

Genomic selection (GS) is a predictive methodology that is changing plant breeding. Genomic selection trains a statistical machine-learning model on available phenotypic and genotypic data and then uses it to make predictions for individuals that were only genotyped. For this reason, some statistical machine-learning methods are being implemented in GS, but in order to improve the selection of new genotypes early in the prediction process, the exploration of new statistical machine-learning algorithms must continue. In this paper, we performed a benchmarking study between the Bayesian threshold genomic best linear unbiased predictor model (TGBLUP; popular in GS) and the gradient boosting machine (GBM). The comparison used four real wheat (Triticum aestivum L.) data sets with categorical traits, and performance was measured in the testing set with two metrics: the proportion of cases correctly classified (PCCC) and the Kappa coefficient. Under 10 random partitions with four different testing proportions (20, 40, 60, and 80%), we compared the two algorithms and found that in three of the four data sets, the GBM outperformed the TGBLUP model in terms of both metrics (PCCC and Kappa coefficient). In the larger data sets (Data Sets 3 and 4), the gain in prediction accuracy of the GBM was considerable. For this reason, we encourage more research using the GBM in GS to evaluate its prediction performance.


Genomic-enabled prediction was used for categorical traits to capture data patterns in different environments. Two different genome-based models were used for predicting categorical traits. Genome-based prediction with genotype × environment interaction was used.
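
The two metrics named in the abstract above, PCCC and the Kappa coefficient, can be sketched as follows. This is only a minimal illustration: it assumes the Kappa coefficient is Cohen's unweighted kappa, and the function names and toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def pccc(y_true, y_pred):
    """Proportion of cases correctly classified (plain accuracy)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's unweighted kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = pccc(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Both label vectors contain one identical class; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy categorical trait with three classes (made-up labels).
observed = [1, 1, 2, 2, 3, 3]
predicted = [1, 1, 2, 3, 3, 3]
accuracy = pccc(observed, predicted)
kappa = cohen_kappa(observed, predicted)
```

Kappa is lower than PCCC whenever some agreement is expected by chance, which is why reporting both is informative for unbalanced categorical traits.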


Subject(s)
Plant Breeding , Triticum , Bayes Theorem , Genome , Phenotype , Plant Breeding/methods , Triticum/genetics
3.
Methods Mol Biol ; 2467: 285-327, 2022.
Article in English | MEDLINE | ID: mdl-35451780

ABSTRACT

Genomic-enabled prediction is playing a key role in the success of genomic selection (GS). However, according to the No Free Lunch Theorem, there is no universal model that performs well for all data sets. Due to this, many statistical and machine learning models are available for genomic prediction. When multitrait data are available, models that are able to account for correlations between phenotypic traits are preferred, since these models help increase the prediction accuracy when the degree of correlation is moderate to large. For this reason, in this chapter we review multitrait models for genome-enabled prediction and we illustrate the power of these models with real examples. In addition, we provide details of the software (R code) available for their application to help users implement these models with their own data. The multitrait models were implemented under conventional Bayesian Ridge regression and best linear unbiased predictor, but also under a deep learning framework. The multitrait deep learning framework helps implement prediction models with mixed outcomes (continuous, binary, ordinal, and count, measured on different scales), which is not easy in conventional statistical models. The illustrative examples are very detailed in order to make the implementation of multitrait models in plant and animal breeding friendlier for breeders and scientists.


Subject(s)
Genome , Genomics , Animals , Bayes Theorem , Genotype , Machine Learning , Models, Genetic , Phenotype
4.
G3 (Bethesda) ; 12(2)2022 02 04.
Article in English | MEDLINE | ID: mdl-34849802

ABSTRACT

When multitrait data are available, the preferred models are those that are able to account for correlations between phenotypic traits because when the degree of correlation is moderate or large, this increases the genomic prediction accuracy. For this reason, in this article, we explore Bayesian multitrait kernel methods for genomic prediction and we illustrate the power of these models with three real datasets. The kernels under study were the linear, Gaussian, polynomial, and sigmoid kernels; they were compared with the conventional Ridge regression and GBLUP multitrait models. The results show that, in general, the Gaussian kernel method outperformed conventional Bayesian Ridge and GBLUP multitrait linear models by 2.2-17.45% (datasets 1-3) in terms of prediction performance based on the mean square error of prediction. This improvement in the prediction performance of the Bayesian multitrait kernel method can be attributed to the fact that the proposed model is able to capture nonlinear patterns more efficiently than linear multitrait models. However, not all kernels perform well in the datasets used for evaluation, which is why more than one kernel should be evaluated in order to choose the best one.


Subject(s)
Genome , Models, Genetic , Bayes Theorem , Genotype , Phenotype
5.
Life (Basel) ; 11(12)2021 Dec 03.
Article in English | MEDLINE | ID: mdl-34947868

ABSTRACT

The rapid spread of the new SARS-CoV-2 virus triggered a global health crisis, disproportionately impacting people with pre-existing health conditions and particular demographic and socioeconomic characteristics. One of the main concerns of governments has been to avoid health systems becoming overwhelmed. For this reason, they have implemented a series of non-pharmaceutical measures to control the spread of the virus, with mass testing being one of the most effective controls. To date, public health officials continue to promote some of these measures, mainly due to delays in mass vaccination and the emergence of new virus strains. In this research, we studied the association between COVID-19 positivity rate and hospitalization rates at the county level in California using a mixed linear model. The analysis was performed for the three waves of confirmed COVID-19 cases registered in the state up to September 2021. Our findings suggest that test positivity rate is consistently associated with hospitalization rates at the county level for all study waves. Demographic factors that seem to be related to higher hospitalization rates changed over time, as the pandemic affected different fractions of the population in counties across California.

6.
G3 (Bethesda) ; 11(2)2021 02 09.
Article in English | MEDLINE | ID: mdl-33693599

ABSTRACT

In genomic selection, choosing the statistical machine learning model is of paramount importance. In this paper, we present an application of a zero-altered random forest model with two versions (ZAP_RF and ZAPC_RF) to deal with excess zeros in count response variables. The proposed model was compared with the conventional random forest (RF) model and with the conventional Generalized Poisson Ridge regression (GPR) using two real datasets, and we found that, in terms of prediction performance, the proposed zero-altered random forest model outperformed the conventional RF and GPR models.


Subject(s)
Genome , Models, Statistical , Genomics
7.
Heredity (Edinb) ; 126(4): 577-596, 2021 04.
Article in English | MEDLINE | ID: mdl-33649571

ABSTRACT

The primary objective of this paper is to provide a guide on implementing Bayesian generalized kernel regression methods for genomic prediction in the statistical software R. Such methods are quite efficient for capturing complex non-linear patterns that conventional linear regression models cannot. Furthermore, these methods are also powerful for leveraging environmental covariates, such as in genotype × environment (G×E) prediction, among others. In this study we describe the construction of seven kernel methods: linear, polynomial, sigmoid, Gaussian, exponential, arc-cosine 1 and arc-cosine L. Additionally, we highlight illustrative examples for implementing exact kernel methods for genomic prediction under a single-environment, a multi-environment and a multi-trait framework, as well as for the implementation of sparse kernel methods under a multi-environment framework. These examples are followed by a discussion of the strengths and limitations of kernel methods and, subsequently, by conclusions about the main contributions of this paper.
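
Four of the kernels listed in the abstract above (linear, polynomial, sigmoid, Gaussian) can be sketched in a few lines. This is a minimal illustration using common textbook parametrizations, which are an assumption here and may differ from the scaling used in the paper; the paper's own code is in R, and the function names, parameter names (`gamma`, `coef0`, `degree`) and the toy marker matrix are hypothetical.

```python
import math


def dot(x, z):
    """Inner product of two equal-length marker vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=2):
    return (gamma * dot(x, z) + coef0) ** degree


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


def gaussian_kernel(x, z, gamma=1.0):
    # Depends only on the squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(X, kernel, **params):
    """n x n matrix of kernel values between all pairs of rows (lines) of X."""
    return [[kernel(xi, xj, **params) for xj in X] for xi in X]


# Toy marker matrix: 3 lines x 3 markers (made-up values).
markers = [[0.0, 1.0, 2.0], [1.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
K = kernel_matrix(markers, gaussian_kernel, gamma=0.5)
```

In a kernel regression, a matrix such as `K` takes the place of the linear genomic relationship matrix, which is how these methods pick up non-linear patterns.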


Subject(s)
Gene-Environment Interaction , Models, Genetic , Bayes Theorem , Genomics , Triticum
8.
G3 (Bethesda) ; 10(11): 4177-4190, 2020 11 05.
Article in English | MEDLINE | ID: mdl-32934019

ABSTRACT

The paradigm called genomic selection (GS) is a revolutionary way of developing new plants and animals. It is a predictive methodology, since it uses learning methods to perform its task. Unfortunately, there is no universal model that can be used for all types of predictions; for this reason, specific methodologies are required for each type of output (response variable). Since there is a lack of efficient methodologies for multivariate count data outcomes, in this paper, a multivariate Poisson deep neural network (MPDN) model is proposed for the genomic prediction of various count outcomes simultaneously. The MPDN model uses the negative log-likelihood of a Poisson distribution as its loss function, the rectified linear unit (RELU) activation function in the hidden layers to capture nonlinear patterns, and the exponential activation function in the output layer to produce outputs on the same scale as the counts. The proposed MPDN model was compared to conventional generalized Poisson regression models and univariate Poisson deep learning models in two experimental data sets of count data. We found that the proposed MPDN outperformed univariate Poisson deep neural network models, but did not outperform, in terms of prediction, the univariate generalized Poisson regression models. All deep learning models were implemented with Tensorflow as back-end and Keras as front-end, which allows these models to be implemented on moderate and large data sets, a significant advantage over previous GS models for multivariate count data.
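
The three ingredients described in the abstract above (ReLU hidden layer, exponential output activation, Poisson negative log-likelihood loss) can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions, not the authors' Keras implementation: it uses a single hidden layer, and the weights, inputs and function names are hypothetical.

```python
import math


def dense(x, weights, bias):
    """Affine layer; each row of `weights` holds the weights of one output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One ReLU hidden layer, then an exponential output so every mean is positive."""
    hidden = relu(dense(x, w_hidden, b_hidden))
    linear_predictor = dense(hidden, w_out, b_out)
    return [math.exp(eta) for eta in linear_predictor]


def poisson_nll(y, mu):
    """Mean Poisson negative log-likelihood over the count outputs.

    Per output: mu - y * log(mu) + log(y!). The log(y!) term does not depend
    on the network parameters, so it does not affect the gradients.
    """
    terms = [m - yi * math.log(m) + math.lgamma(yi + 1) for yi, m in zip(y, mu)]
    return sum(terms) / len(terms)


# Made-up example: 2 input markers, 2 hidden units, 2 count traits.
x = [1.0, -2.0]
w_hidden, b_hidden = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w_out, b_out = [[0.0, 0.0], [1.0, 0.0]], [0.0, 0.0]
mu = forward(x, w_hidden, b_hidden, w_out, b_out)
loss = poisson_nll([0, 1], mu)
```

Because the output layer exponentiates the linear predictor, the predicted means are strictly positive, which is what keeps `log(mu)` in the loss well defined.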


Subject(s)
Deep Learning , Genome , Genomics , Neural Networks, Computer , Poisson Distribution
9.
G3 (Bethesda) ; 7(6): 1833-1853, 2017 06 07.
Article in English | MEDLINE | ID: mdl-28391241

ABSTRACT

There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost prohibitive when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative priors and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments.


Subject(s)
Bayes Theorem , Gene-Environment Interaction , Genomics/methods , Genotype , Models, Genetic , Algorithms , Biological Evolution , Models, Statistical , Reproducibility of Results , Selection, Genetic , Triticum/genetics , Zea mays/genetics