Search | VHL Regional Portal

1.

Tabular deep learning: a comparative study applied to multi-task genome-wide prediction.

Fan, Yuhua; Waldmann, Patrik.

BMC Bioinformatics ; 25(1): 322, 2024 Oct 04.

Article in English | MEDLINE | ID: mdl-39367318

ABSTRACT

PURPOSE: More accurate prediction of phenotype traits can increase the success of genomic selection in both plant and animal breeding studies and provide more reliable disease risk prediction in humans. Traditional approaches typically use regression models based on linear assumptions between the genetic markers and the traits of interest. Non-linear models have been considered as an alternative tool for modeling genomic interactions (i.e. non-additive effects) and other subtle non-linear patterns between markers and phenotype. Deep learning has become a state-of-the-art non-linear prediction method for sound, image and language data. However, genomic data is better represented in a tabular format. The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports successful results on various datasets. Tabular deep learning applications in genome-wide prediction (GWP) are still rare. In this work, we perform an overview of the main families of recent deep learning architectures for tabular data and apply them to multi-trait regression and multi-class classification for GWP on real gene datasets. METHODS: The study involves an extensive overview of recent deep learning architectures for tabular data learning: NODE, TabNet, TabR, TabTransformer, FT-Transformer, AutoInt, GANDALF, SAINT and LassoNet. These architectures are applied to multi-trait GWP. Comprehensive benchmarks of various tabular deep learning methods are conducted to identify best practices and determine their effectiveness compared to traditional methods. RESULTS: Extensive experimental results on several genomic datasets (three for multi-trait regression and two for multi-class classification) highlight LassoNet as a standout performer, surpassing both other tabular deep learning models and the highly efficient tree-based LightGBM method in terms of both prediction accuracy and computing efficiency.
CONCLUSION: Through a series of evaluations on real-world genomic datasets, the study identifies LassoNet as a standout performer, surpassing decision tree methods like LightGBM and other tabular deep learning architectures in terms of both predictive accuracy and computing efficiency. Moreover, the inherent variable selection property of LassoNet provides a systematic way to find important genetic markers that contribute to phenotype expression.

Subject(s)

Deep Learning , Genomics , Genomics/methods , Humans , Phenotype

2.

Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes.

Kihlman, Ragini; Launonen, Ilkka; Sillanpää, Mikko J; Waldmann, Patrik.

G3 (Bethesda) ; 2024 Sep 09.

Article in English | MEDLINE | ID: mdl-39250757

ABSTRACT

In genomics, use of deep learning (DL) is rapidly growing and DL has successfully demonstrated its ability to uncover complex relationships in large biological and biomedical data sets. With the development of high-throughput sequencing techniques, genomic markers can now be allocated to large sections of a genome. By analysing allele sharing between individuals, one may calculate realized genomic relationships from single nucleotide polymorphism (SNP) data rather than relying on known pedigree relationships under a polygenic model. The traditional approaches in genome-wide prediction (GWP) of quantitative phenotypes utilise genomic relationships in fixed global covariance modelling, possibly with some non-linear kernel mapping (for example Gaussian processes). On the other hand, the DL approaches proposed so far for GWP fail to take into account the non-Euclidean graph structure of relationships between individuals over several generations. In this paper, we propose one global convolutional neural network (GCN) and one local sub-sampling architecture (GCN-RS) that are specifically designed to perform regression analysis based on genomic relationship information. A GCN is tailored to non-Euclidean spaces and consists of several layers of graph convolutions. The GCN-RS architecture is designed to further improve the GCN's performance by sub-sampling the graph to reduce the dimensionality of the input data. Through these graph convolutional layers, the GCN maps input genomic markers to their quantitative phenotype values. The graphs are constructed using an iterative nearest neighbour approach. Comparisons show that the GCN-RS outperforms the popular Genomic Best Linear Unbiased Predictor (GBLUP) method on one simulated and three real data sets from wheat, mice and pigs with a predictive improvement of 4.4% to 49.4% in terms of test mean squared error (MSE). This indicates that GCN-RS is a promising tool for genomic predictions in plants and animals.
Furthermore, GCN-RS is computationally efficient, making it a viable option for large-scale applications.

3.

A proximal LAVA method for genome-wide association and prediction of traits with mixed inheritance patterns.

Waldmann, Patrik.

BMC Bioinformatics ; 22(1): 523, 2021 Oct 26.

Article in English | MEDLINE | ID: mdl-34702175

ABSTRACT

BACKGROUND: The genetic basis of phenotypic traits is highly variable and usually divided into mono-, oligo- and polygenic inheritance classes. Relatively few traits are known to be monogenic or oligogenic. The majority of traits are considered to have a polygenic background. To what extent there are mixtures between these classes is unknown. The rapid advancement of genomic techniques makes it possible to directly map large amounts of genomic markers (GWAS) and predict unknown phenotypes (GWP). Most of the multi-marker methods for GWAS and GWP fall into one of two regularization frameworks. The first framework is based on ℓ1-norm regularization (e.g. the LASSO) and is suitable for mono- and oligogenic traits, whereas the second framework regularizes with the ℓ2-norm (e.g. ridge regression; RR) and thereby is favourable for polygenic traits. A general framework for mixed inheritance is lacking. RESULTS: We have developed a proximal operator algorithm based on the recent LAVA regularization method that jointly performs ℓ1- and ℓ2-norm regularization. The algorithm is built on the alternating direction method of multipliers and proximal translation mapping (LAVA ADMM). When evaluated on the simulated QTLMAS2010 data, it is shown that the LAVA ADMM together with Bayesian optimization of the regularization parameters provides an efficient approach with lower test prediction mean-squared-error (65.89) than the LASSO (66.11), Ridge regression (83.41) and Elastic net (66.11). For the real pig data the test MSE of the LAVA ADMM is 0.850 compared to the LASSO, RR and EN with 0.875, 0.853 and 0.853, respectively. CONCLUSIONS: This study presents the LAVA ADMM that is capable of joint modelling of monogenic major genetic effects and polygenic minor genetic effects which can be used for both genome-wide association and prediction purposes.
The statistical evaluations based on both simulated and real pig data sets show that the LAVA ADMM has better prediction properties than the LASSO, RR and EN. Julia code for the LAVA ADMM is available at: https://github.com/patwa67/LAVAADMM .
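
For orientation, the joint penalty described in this abstract can be written in the general form of the lava estimator. The notation below is assumed for illustration and is not taken from the abstract: β collects dense (polygenic) effects, δ collects sparse (major) effects, and λ1, λ2 are the two regularization parameters.

$$
\min_{\beta,\,\delta}\; \frac{1}{n}\,\lVert y - X(\beta + \delta) \rVert_2^2 \;+\; \lambda_2 \lVert \beta \rVert_2^2 \;+\; \lambda_1 \lVert \delta \rVert_1
$$

Under this reading, the fitted marker effect is the sum β + δ, so a marker can carry a small ridge-type effect and a large LASSO-type effect at the same time.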

Subject(s)

Genome-Wide Association Study , Genome , Animals , Bayes Theorem , Multifactorial Inheritance , Phenotype , Swine

4.

Modeling cow somatic cell count using sensor data as input to generalized additive models.

Anglart, Dorota; Hallén-Sandgren, Charlotte; Waldmann, Patrik; Wiedemann, Martin; Emanuelson, Ulf.

J Dairy Res ; 87(3): 282-289, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32883374

ABSTRACT

This research paper presents a study investigating if sensor data from an automatic milking rotary could be used to model cow somatic cell count (composite milk SCC: CMSCC). CMSCC is valuable for udder health monitoring and individual cow udder health surveillance could be improved by predicting CMSCC between routine samplings. Data regularly recorded in the automatic milking rotary, in one German dairy herd, were collected for analysis. The cows (Holstein-Friesian, n = 372) were milked twice daily and sampled once weekly in afternoon milkings for 8 weeks for CMSCC. From the potential independent variables, including quarter conductivity, milk flow, blood in milk, kick-offs, not milked quarters and incomplete milkings, new variables that combined quarter data were created. Past period records, i.e. lags, of up to seven days before the actual CMSCC sampling event were added in the dataset to investigate if they were of use in modeling the cell count. Univariable generalized additive models (GAM) were used to screen the data to select potential independent variables. Furthermore, several multivariable GAM were fitted in order to compare the importance of the potential independent variables and to explore how the model performance would be affected by using data from various number of days before the CMSCC sampling event. The result of the model selection showed that the best explanation of CMSCC was provided by the model incorporating all significant variables from the variable screening for the seven preceding days, including the day of the CMSCC sampling event. However, using data from only three days before the CMSCC sampling event is suggested to be sufficient to model CMSCC. Variables combining conductivity quarter data, together with quarter conductivity, are suggested to be important in describing CMSCC. We conclude that CMSCC can be modeled with a high degree of explanation using the information routinely recorded by the milking robot.

Subject(s)

Cattle/physiology , Milk/cytology , Animals , Automation , Dairying/instrumentation , Female , Models, Biological

5.

Sparse Convolutional Neural Networks for Genome-Wide Prediction.

Waldmann, Patrik; Pfeiffer, Christina; Mészáros, Gábor.

Front Genet ; 11: 25, 2020.

Article in English | MEDLINE | ID: mdl-32117441

ABSTRACT

Genome-wide prediction (GWP) has become the state-of-the-art method in artificial selection. Data sets often comprise numbers of genomic markers and individuals in ranges from a few thousand to millions. Hence, computational efficiency is important and various machine learning methods have successfully been used in GWP. Neural networks (NN) and deep learning (DL) are very flexible methods that usually show outstanding prediction properties on complex structured data, but their use in GWP is nevertheless rare and debated. This study describes a powerful NN method for genomic marker data that can easily be extended. It is shown that a one-dimensional convolutional neural network (CNN) can be used to incorporate the ordinal information between markers and, together with pooling and ℓ1-norm regularization, provides a sparse and computationally efficient approach for GWP. The method, denoted CNNGWP, is implemented in the deep learning software Keras, and hyper-parameters of the NN are tuned with Bayesian optimization. Model averaged ensemble predictions further reduce prediction error. Evaluations show that CNNGWP improves prediction error by more than 25% on simulated data and around 3% on real pig data compared with results obtained with GBLUP and the LASSO. In conclusion, the CNNGWP provides a promising approach for GWP, but the magnitude of improvement depends on the genetic architecture and the heritability.

6.

On the Use of the Pearson Correlation Coefficient for Model Evaluation in Genome-Wide Prediction.

Waldmann, Patrik.

Front Genet ; 10: 899, 2019.

Article in English | MEDLINE | ID: mdl-31632436

ABSTRACT

The large number of markers in genome-wide prediction demands the use of methods with regularization and model comparison based on some hold-out test prediction error measure. In quantitative genetics, it is common practice to calculate the Pearson correlation coefficient (r2) as a standardized measure of the predictive accuracy of a model. Based on arguments from the bias-variance trade-off theory in statistical learning, we show that shrinkage of the regression coefficients (i.e., QTL effects) reduces the prediction mean squared error (MSE) by introducing model bias compared with the ordinary least squares method. We also show that the LASSO and the adaptive LASSO (ALASSO) can reduce the model bias and prediction MSE by adding model variance. In an application of ridge regression, the LASSO and ALASSO to a simulated example based on results for 9,723 SNPs and 3,226 individuals, the best model selected was with the LASSO when r2 was used as a measure. However, when model selection was based on test MSE and coefficient of determination R2 the ALASSO proved to be the best method. Hence, use of r2 may lead to selection of the wrong model and therefore also nonoptimal ranking of phenotype predictions and genomic breeding values. Instead, we propose use of the test MSE for model selection and R2 as a standardized measure of the accuracy.
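
The distinction drawn in this abstract can be illustrated with a small sketch. The numbers are invented toy values, not data from the paper, and the helper names (`pearson_r`, `mse`, `r_squared`) are hypothetical. It shows that squared Pearson correlation ignores the scale of the predictions, whereas test MSE and the coefficient of determination do not.

```python
from math import sqrt

def pearson_r(y, yhat):
    """Pearson correlation between observed and predicted values."""
    n = len(y)
    my, mp = sum(y) / n, sum(yhat) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    vy = sum((a - my) ** 2 for a in y)
    vp = sum((b - mp) ** 2 for b in yhat)
    return cov / sqrt(vy * vp)

def mse(y, yhat):
    """Mean squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def r_squared(y, yhat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Toy test-set phenotypes and two sets of predictions (illustrative only).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
good = [1.1, 1.9, 3.2, 3.8, 5.1]      # close to y, small errors
shrunk = [0.1 * v for v in y]         # perfectly correlated, badly scaled

for name, pred in [("good", good), ("shrunk", shrunk)]:
    print(name, pearson_r(y, pred) ** 2, mse(y, pred), r_squared(y, pred))
```

In this toy case the over-shrunk predictions win on squared correlation but lose clearly on MSE and R2, which is the kind of disagreement between criteria the abstract warns about.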

7.

AUTALASSO: an automatic adaptive LASSO for genome-wide prediction.

Waldmann, Patrik; Ferencakovic, Maja; Mészáros, Gábor; Khayatzadeh, Negar; Curik, Ino; Sölkner, Johann.

BMC Bioinformatics ; 20(1): 167, 2019 Apr 02.

Article in English | MEDLINE | ID: mdl-30940067

ABSTRACT

BACKGROUND: Genome-wide prediction has become the method of choice in animal and plant breeding. Prediction of breeding values and phenotypes is routinely performed using large genomic data sets with the number of markers on the order of several thousands to millions. The number of evaluated individuals is usually smaller, which results in problems where model sparsity is of major concern. The LASSO technique has proven to be very well-suited for sparse problems, often providing excellent prediction accuracy. Several computationally efficient LASSO algorithms have been developed, but optimization of hyper-parameters can be demanding. RESULTS: We have developed a novel automatic adaptive LASSO (AUTALASSO) based on the alternating direction method of multipliers (ADMM) optimization algorithm. The two major hyper-parameters of ADMM are the learning rate and the regularization factor. The learning rate is automatically tuned with line search and the regularization factor optimized using Golden section search. Results show that AUTALASSO provides superior prediction accuracy when evaluated on simulated and real bull data compared to the adaptive LASSO, LASSO and ridge regression implemented in the popular glmnet software. CONCLUSIONS: The AUTALASSO provides a very flexible and computationally efficient approach to GWP, especially when it is important to obtain high prediction accuracy and genetic gain. The AUTALASSO also has the capability to perform GWAS of both additive and dominance effects with smaller prediction error than the ordinary LASSO.
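
As a rough illustration of the golden section search mentioned in this abstract, the sketch below minimizes a one-dimensional unimodal function over an interval. It is a generic textbook version, not the AUTALASSO implementation; the function `toy_validation_error` is a hypothetical stand-in for a validation error curve over the (log) regularization factor.

```python
from math import sqrt

def golden_section_min(f, lo, hi, tol=1e-6):
    """Return an approximate minimizer of a unimodal function f on [lo, hi]."""
    invphi = (sqrt(5) - 1) / 2          # 1 / golden ratio, about 0.618
    c = hi - invphi * (hi - lo)         # lower interior point
    d = lo + invphi * (hi - lo)         # upper interior point
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - invphi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + invphi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

def toy_validation_error(log_lam):
    """Hypothetical unimodal error curve with its minimum at log_lam = 2."""
    return (log_lam - 2.0) ** 2 + 1.0

best_log_lam = golden_section_min(toy_validation_error, -5.0, 5.0)
print(best_log_lam)
```

Each iteration reuses one of the two previous function evaluations, so only one new evaluation of the (potentially expensive) error curve is needed per step.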

Subject(s)

Algorithms , Genomics/methods , Animals , Breeding , Cattle , Genome , Software

8.

Approximate Bayesian neural networks in genomic prediction.

Waldmann, Patrik.

Genet Sel Evol ; 50(1): 70, 2018 Dec 22.

Article in English | MEDLINE | ID: mdl-30577737

ABSTRACT

BACKGROUND: Genome-wide marker data are used both in phenotypic genome-wide association studies (GWAS) and genome-wide prediction (GWP). Typically, such studies include high-dimensional data with thousands to millions of single nucleotide polymorphisms (SNPs) recorded in hundreds to a few thousand individuals. Different machine-learning approaches have been used in GWAS and GWP effectively, but the use of neural networks (NN) and deep learning is still scarce. This study presents a NN model for genomic SNP data. RESULTS: We show, using both simulated and real pig data, that regularization is obtained using weight decay and dropout, and results in an approximate Bayesian (ABNN) model that can be used to obtain model averaged posterior predictions. The ABNN model is implemented in mxnet and shown to yield better prediction accuracy than genomic best linear unbiased prediction and Bayesian LASSO. The mean squared error was reduced by at least 6.5% in the simulated data and by at least 1% in the real data. Moreover, by comparing NN of different complexities, our results confirm that a shallow model with one layer, one neuron, one-hot encoding and a linear activation function performs better than more complex models. CONCLUSIONS: The ABNN model provides a computationally efficient approach with good prediction performance and in which the weight components can also provide information on the importance of the SNPs. Hence, ABNN is suitable for both GWP and GWAS.
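
The one-hot encoding mentioned in this abstract can be sketched as follows. This assumes, for illustration only, that each SNP genotype is coded as 0, 1 or 2 copies of an allele; the function name `one_hot_genotypes` is hypothetical and the paper's actual preprocessing may differ.

```python
def one_hot_genotypes(genotypes):
    """Expand each genotype code (0, 1 or 2) into three indicator features."""
    table = {0: [1, 0, 0], 1: [0, 1, 0], 2: [0, 0, 1]}
    encoded = []
    for g in genotypes:
        encoded.extend(table[g])
    return encoded

# One individual scored at three SNPs (toy values).
individual = [0, 2, 1]
features = one_hot_genotypes(individual)
print(features)
```

With this coding, a linear model can assign a separate weight to each genotype class of a SNP, rather than forcing a purely additive allele-count effect.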

Subject(s)

Genome-Wide Association Study/methods , Sequence Analysis, DNA/methods , Algorithms , Animals , Bayes Theorem , Computer Simulation , Genome , Genomics/methods , Machine Learning , Models, Genetic , Neural Networks, Computer , Polymorphism, Single Nucleotide/genetics , Swine/genetics

9.

Misidentification of runs of homozygosity islands in cattle caused by interference with copy number variation or large intermarker distances.

Nandolo, Wilson; Utsunomiya, Yuri T; Mészáros, Gábor; Wurzinger, Maria; Khayadzadeh, Negar; Torrecilha, Rafaela B P; Mulindwa, Henry A; Gondwe, Timothy N; Waldmann, Patrik; Ferencakovic, Maja; Garcia, José F; Rosen, Benjamin D; Bickhart, Derek; van Tassell, Curt P; Curik, Ino; Sölkner, Johann.

Genet Sel Evol ; 50(1): 43, 2018 Aug 22.

Article in English | MEDLINE | ID: mdl-30134820

ABSTRACT

BACKGROUND: Runs of homozygosity (ROH) islands are stretches of homozygous sequence in the genome of a large proportion of individuals in a population. Algorithms for the detection of ROH depend on the similarity of haplotypes. Coverage gaps and copy number variants (CNV) may result in incorrect identification of such similarity, leading to the detection of ROH islands where none exists. Misidentified hemizygous regions will also appear as homozygous based on sequence variation alone. Our aim was to identify ROH islands influenced by marker coverage gaps or CNV, using Illumina BovineHD BeadChip (777 K) single nucleotide polymorphism (SNP) data for Austrian Brown Swiss, Tyrol Grey and Pinzgauer cattle. METHODS: ROH were detected using clustering, and ROH islands were determined from population inbreeding levels for each marker. CNV were detected using a multivariate copy number analysis method and a hidden Markov model. SNP coverage gaps were defined as genomic regions with intermarker distances on average longer than 9.24 kb. ROH islands that overlapped CNV regions (CNVR) or SNP coverage gaps were considered as potential artefacts. Permutation tests were used to determine if overlaps between CNVR with copy losses and ROH islands were due to chance. Diversity of the haplotypes in the ROH islands was assessed by haplotype analyses. RESULTS: In Brown Swiss, Tyrol Grey and Pinzgauer, we identified 13, 22, and 24 ROH islands covering 26.6, 389.0 and 35.8 Mb, respectively, and we detected 30, 50 and 71 CNVR derived from CNV by using both algorithms, respectively. Overlaps between ROH islands, CNVR or coverage gaps occurred for 7, 14 and 16 ROH islands, respectively. About 37, 44 and 52% of the ROH islands coverage in Brown Swiss, Tyrol Grey and Pinzgauer, respectively, were affected by copy loss. 
Intersections between ROH islands and CNVR were small, but significantly larger compared to ROH islands at random locations across the genome, implying an association between ROH islands and CNVR. Haplotype diversity for reliable ROH islands was lower than for ROH islands that intersected with copy loss CNVR. CONCLUSIONS: Our findings show that a significant proportion of the ROH islands in the bovine genome are artefacts due to CNV or SNP coverage gaps.

Subject(s)

Cattle/genetics , DNA Copy Number Variations , Genotyping Techniques/standards , Homozygote , Animals , Haplotypes , Polymorphism, Single Nucleotide

10.

Genome-wide prediction using Bayesian additive regression trees.

Waldmann, Patrik.

Genet Sel Evol ; 48(1): 42, 2016 06 10.

Article in English | MEDLINE | ID: mdl-27286957

ABSTRACT

BACKGROUND: The goal of genome-wide prediction (GWP) is to predict phenotypes based on marker genotypes, often obtained through single nucleotide polymorphism (SNP) chips. The major problem with GWP is high-dimensional data from many thousands of SNPs scored on several thousands of individuals. A large number of methods have been developed for GWP, which are mostly parametric methods that assume statistical linearity and only additive genetic effects. The Bayesian additive regression trees (BART) method was recently proposed and is based on the sum of nonparametric regression trees with the priors being used to regularize the parameters. Each regression tree is based on a recursive binary partitioning of the predictor space that approximates an unknown function, which will automatically model nonlinearities within SNPs (dominance) and interactions between SNPs (epistasis). In this study, we introduced BART and compared its predictive performance with that of the LASSO, Bayesian LASSO (BLASSO), genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space (RKHS) regression and random forest (RF) methods. RESULTS: Tests on the QTLMAS2010 simulated data, which are mainly based on additive genetic effects, show that cross-validated optimization of BART provides a smaller prediction error than the RF, BLASSO, GBLUP and RKHS methods, and is almost as accurate as the LASSO method. If dominance and epistasis effects are added to the QTLMAS2010 data, the accuracy of BART relative to the other methods was increased. We also showed that BART can produce importance measures on the SNPs through variable inclusion proportions. In evaluations using real data on pigs, the prediction error was smaller with BART than with the other methods. CONCLUSIONS: BART was shown to be an accurate method for GWP, in which the regression trees guarantee a very sparse representation of additive and complex non-additive genetic effects. 
Moreover, the Markov chain Monte Carlo algorithm with Bayesian back-fitting provides a computationally efficient procedure that is suitable for high-dimensional genomic data.

Subject(s)

Algorithms , Genome-Wide Association Study , Genomics/methods , Polymorphism, Single Nucleotide/genetics , Animals , Bayes Theorem , Breeding , Genome , Models, Statistical , Phenotype , Quantitative Trait, Heritable , Selection, Genetic , Swine

11.

Radiation and SN38 treatments modulate the expression of microRNAs, cytokines and chemokines in colon cancer cells in a p53-directed manner.

Pathak, Surajit; Meng, Wen-Jian; Nandy, Suman Kumar; Ping, Jie; Bisgin, Atil; Helmfors, Linda; Waldmann, Patrik; Sun, Xiao-Feng.

Oncotarget ; 6(42): 44758-80, 2015 Dec 29.

Article in English | MEDLINE | ID: mdl-26556872

ABSTRACT

Aberrant expression of miRNAs, cytokines and chemokines is involved in the pathogenesis of colon cancer. However, the expression of p53-mediated miRNAs, cyto- and chemokines after radiation and SN38 treatment in colon cancer remains elusive. Here, human colon cancer cells, HCT116 with wild-type, heterozygous and a functionally null p53, were treated by radiation and SN38. The expression of 384 miRNAs was determined by using the TaqMan® miRNA array, and the expression of cyto- and chemokines was analyzed by Meso-Scale-Discovery instrument. Up- or down-regulations of miRNAs after radiation and SN38 treatments were largely dependent on p53 status of the cells. Cytokines, IL-6, TNF-α, IL-1β, IL-4, IL-10, VEGF, and chemokines, IL-8, MIP-1α were increased, and IFN-γ expression was decreased after radiation, whereas IL-6, IFN-γ, TNF-α, IL-1β, IL-4, IL-10, IL-8 were decreased, and VEGF and MIP-1α were increased after SN38 treatment. Bioinformatic analysis pointed out that the highly up-regulated miRNAs, let-7f-5p, miR-455-3p, miR-98, miR-155-5p and the down-regulated miRNAs, miR-1, miR-127-5p, miR-142-5p, miR-202-5p were associated with colon cancer pathways and correlated with cyto- or chemokine expression. These miRNAs have the potential for use in colon cancer therapy as they are related to p53, pro- or anti-inflammatory cyto- or chemokines after the radiation and SN38 treatment.

Subject(s)

Antineoplastic Agents, Phytogenic/pharmacology , Camptothecin/analogs & derivatives , Chemokines/metabolism , Chemoradiotherapy , Colonic Neoplasms/therapy , Cytokines/metabolism , MicroRNAs/metabolism , Tumor Suppressor Protein p53/metabolism , Camptothecin/pharmacology , Chemokines/genetics , Colonic Neoplasms/genetics , Colonic Neoplasms/metabolism , Colonic Neoplasms/pathology , Computational Biology , Cytokines/genetics , Databases, Genetic , Dose-Response Relationship, Drug , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , HCT116 Cells , Heterozygote , Humans , Inhibitory Concentration 50 , Irinotecan , MicroRNAs/genetics , Mutation , Radiation Dosage , Signal Transduction , Time Factors , Transfection , Tumor Suppressor Protein p53/genetics

12.

Evaluation of the lasso and the elastic net in genome-wide association studies.

Waldmann, Patrik; Mészáros, Gábor; Gredler, Birgit; Fuerst, Christian; Sölkner, Johann.

Front Genet ; 4: 270, 2013.

Article in English | MEDLINE | ID: mdl-24363662

ABSTRACT

The number of publications performing genome-wide association studies (GWAS) has increased dramatically. Penalized regression approaches have been developed to overcome the challenges caused by the high dimensional data, but these methods are relatively new in the GWAS field. In this study we have compared the statistical performance of two methods (the least absolute shrinkage and selection operator-lasso and the elastic net) on two simulated data sets and one real data set from a 50 K genome-wide single nucleotide polymorphism (SNP) panel of 5570 Fleckvieh bulls. The first simulated data set displays moderate to high linkage disequilibrium between SNPs, whereas the second simulated data set from the QTLMAS 2010 workshop is biologically more complex. We used cross-validation to find the optimal value of regularization parameter λ with both minimum MSE and minimum MSE + 1SE of minimum MSE. The optimal λ values were used for variable selection. Based on the first simulated data, we found that the minMSE in general picked up too many SNPs. At minMSE + 1SE, the lasso didn't acquire any false positives, but selected too few correct SNPs. The elastic net provided the best compromise between few false positives and many correct selections when the penalty weight α was around 0.1. However, in our simulation setting, this α value didn't result in the lowest minMSE + 1SE. The number of selected SNPs from the QTLMAS 2010 data was after correction for population structure 82 and 161 for the lasso and the elastic net, respectively. In the Fleckvieh data set after population structure correction lasso and the elastic net identified from 1291 to 1966 important SNPs for milk fat content, with major peaks on chromosomes 5, 14, 15, and 20. Hence, we can conclude that it is important to analyze GWAS data with both the lasso and the elastic net and an alternative tuning criterion to minimum MSE is needed for variable selection.
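
The two tuning criteria compared in this abstract (minimum MSE and minimum MSE + 1SE) can be sketched as below. The cross-validation numbers are invented toy values and the function name `select_lambda` is hypothetical; the sketch only shows how the two choices of λ are read off a cross-validation curve.

```python
def select_lambda(lambdas, cv_mse, cv_se):
    """Return (lambda at minimum CV MSE, largest lambda within one SE of it)."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mse[i])
    threshold = cv_mse[i_min] + cv_se[i_min]
    # All penalties whose mean CV error is within one SE of the minimum.
    candidates = [lam for lam, m in zip(lambdas, cv_mse) if m <= threshold]
    return lambdas[i_min], max(candidates)

# Toy cross-validation results over a grid of penalties (illustrative only).
lambdas = [0.01, 0.05, 0.1, 0.5, 1.0]
cv_mse = [1.20, 1.05, 1.00, 1.08, 1.40]
cv_se = [0.10, 0.09, 0.10, 0.11, 0.15]

lam_min, lam_1se = select_lambda(lambdas, cv_mse, cv_se)
print(lam_min, lam_1se)
```

Because a larger λ shrinks more coefficients to zero, the 1SE choice selects fewer SNPs than the minimum-MSE choice, which matches the trade-off between false positives and missed true signals discussed in the abstract.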

13.

Genetic changes in flowering and morphology in response to adaptation to a high-latitude environment in Arabidopsis lyrata.

Quilot-Turion, Bénédicte; Leppälä, Johanna; Leinonen, Päivi H; Waldmann, Patrik; Savolainen, Outi; Kuittinen, Helmi.

Ann Bot ; 111(5): 957-68, 2013 May.

Article in English | MEDLINE | ID: mdl-23519836

ABSTRACT

BACKGROUND AND AIMS: The adaptive plastic reactions of plant populations to changing climatic factors, such as winter temperatures and photoperiod, have changed during range shifts after the last glaciation. Timing of flowering is an adaptive trait regulated by environmental cues. Its genetics has been intensively studied in annual plants, but in perennials it is currently not well characterized. This study examined the genetic basis of differentiation in flowering time, morphology, and their plastic responses to vernalization in two locally adapted populations of the perennial Arabidopsis lyrata: (1) to determine whether the two populations differ in their vernalization responses for flowering phenology and morphology; and (2) to determine the genomic areas governing differentiation and vernalization responses. METHODS: Two A. lyrata populations, from central Europe and Scandinavia, were grown in growth-chamber conditions with and without cold treatment. A QTL analysis was performed to find genomic regions that interact with vernalization. KEY RESULTS: The population from central Europe flowered more rapidly and invested more in inflorescence growth than the population from alpine Scandinavia, especially after vernalization. The alpine population had consistently a low number of inflorescences and few flowers, suggesting strong constraints due to a short growing season, but instead had longer leaves and higher leaf rosettes. QTL mapping in the F2 population revealed genomic regions governing differentiation in flowering time and morphology and, in some cases, the allelic effects from the two populations on a trait were influenced by vernalization (QTL × vernalization interactions). CONCLUSIONS: The results indicate that many potentially adaptive genetic changes have occurred during colonization; the two populations have diverged in their plastic responses to vernalization in traits closely connected to fitness through changes in many genomic areas.

Subject(s)

Adaptation, Physiological/genetics , Altitude , Arabidopsis/anatomy & histology , Arabidopsis/genetics , Environment , Flowers/anatomy & histology , Flowers/genetics , Cold Temperature , Crosses, Genetic , Flowers/physiology , Genetic Linkage , Germany , Homozygote , Norway , Quantitative Trait Loci/genetics

14.

Hierarchical Spatial Process Models for Multiple Traits in Large Genetic Trials.

Banerjee, Sudipto; Finley, Andrew O; Waldmann, Patrik; Ericsson, Tore.

J Am Stat Assoc ; 105(490): 506-521, 2010 Jun 01.

Article in English | MEDLINE | ID: mdl-20676229

ABSTRACT

This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets of multiple traits of interest. Direct application of such multivariate models to large spatial datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts where such computations are performed for several thousand iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects we discuss a multivariate predictive process that reduces the computational burden by projecting the original process onto a subspace generated by realizations of the original process at a specified set of locations (or knots). We illustrate the proposed methods using a synthetic dataset with multivariate additive and dominant genetic effects and anisotropic spatial residuals, and a large dataset from a Scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.

15.

Bayesian inference of genetic parameters based on conditional decompositions of multivariate normal distributions.

Hallander, Jon; Waldmann, Patrik; Wang, Chunkao; Sillanpää, Mikko J.

Genetics ; 185(2): 645-54, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20351218

ABSTRACT

It is widely recognized that the mixed linear model is an important tool for parameter estimation in the analysis of complex pedigrees, which includes both pedigree and genomic information, and where mutually dependent genetic factors are often assumed to follow multivariate normal distributions of high dimension. We have developed a Bayesian statistical method based on the decomposition of the multivariate normal prior distribution into products of conditional univariate distributions. This procedure permits computationally demanding genetic evaluations of complex pedigrees, within the user-friendly computer package WinBUGS. To demonstrate and evaluate the flexibility of the method, we analyzed two example pedigrees: a large noninbred pedigree of Scots pine (Pinus sylvestris L.) that includes additive and dominance polygenic relationships and a simulated pedigree where genomic relationships have been calculated on the basis of a dense marker map. The analysis showed that our method was fast and provided accurate estimates and that it should therefore be a helpful tool for estimating genetic parameters of complex pedigrees quickly and reliably.

Subject(s)

Bayes Theorem , Biometry/methods , Pinus sylvestris/genetics , Genes , Linear Models , Normal Distribution , Pedigree
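
The decomposition described in the abstract of entry 15, a multivariate normal written as a product of conditional univariate normals, can be sketched as follows. This is an illustration only: it assumes a zero-mean distribution and uses a Cholesky factor to obtain each conditional mean and variance, and the function names are hypothetical. It is plain Python, not the authors' WinBUGS code.

```python
import math

def cholesky(S):
    """Lower-triangular L with S = L L^T, for a symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def sequential_conditionals(S, x):
    """(mean, variance) of x[i] given x[0..i-1] for a zero-mean MVN with covariance S."""
    L = cholesky(S)
    z = []      # standardised residuals of the earlier components
    out = []
    for i in range(len(x)):
        mean_i = sum(L[i][j] * z[j] for j in range(i))
        var_i = L[i][i] ** 2
        out.append((mean_i, var_i))
        z.append((x[i] - mean_i) / L[i][i])
    return out

def normal_logpdf(x, mean, var):
    """Log density of a univariate normal."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def joint_logpdf_from_conditionals(S, x):
    """Joint log density as the sum of the univariate conditional log densities."""
    conds = sequential_conditionals(S, x)
    return sum(normal_logpdf(xi, m, v) for xi, (m, v) in zip(x, conds))

# Hypothetical bivariate example.
S = [[2.0, 0.6], [0.6, 1.0]]
x = [1.0, -0.5]
conds = sequential_conditionals(S, x)
log_density = joint_logpdf_from_conditionals(S, x)
print(conds, log_density)
```

For the bivariate case the second conditional can be checked against the textbook formulas: mean (s12 / s11) * x1 and variance s22 - s12^2 / s11.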

16.

Optimization of selection contribution and mate allocations in monoecious tree breeding populations.

Hallander, Jon; Waldmann, Patrik.

BMC Genet ; 10: 70, 2009 Nov 06.

Article in English | MEDLINE | ID: mdl-19895684

ABSTRACT

BACKGROUND: The combination of optimized contribution dynamic selection and various mating schemes was investigated over seven generations for a typical tree breeding scenario. The allocation of mates was optimized using a simulated annealing algorithm for various objective functions including random mating (RM), positive assortative mating (PAM) and minimization of pair-wise coancestry between mates (MCM) all combined with minimization of variance in family size and coancestry. The present study considered two levels of heritability (0.05 and 0.25), two restrictions on relatedness (group coancestry; 1 and 2%) and two maximum permissible numbers of crosses in each generation (100 and 400). The infinitesimal genetic model was used to simulate the genetic architecture of the trait that was the subject of selection. A framework of the long term genetic contribution of ancestors was used to examine the impacts of the mating schemes on population parameters. RESULTS: MCM schemes produced, on average, an increased rate of genetic gain in the breeding population, although the difference between schemes was small but significant after seven generations (up to 7.1% more than obtained with RM). In addition, MCM reduced the level of inbreeding by as much as 37% compared with RM, although the rate of inbreeding was similar after three generations of selection. PAM schemes yielded levels of genetic gain similar to those produced by RM, but the increase in the level of inbreeding was substantial (up to 43%). CONCLUSION: The main reason why MCM schemes yielded higher genetic gains was the improvement in managing the long term genetic contribution of founders in the population; this was achieved by connecting unrelated families. In addition, the accumulation of inbreeding was reduced by MCM schemes since the variance in long term genetic contributions of founders was smaller than in the other schemes. Consequently, by combining an MCM scheme with an algorithm that optimizes contributions of the selected individuals, a higher long term response is obtained while reducing the risk within the breeding program.

Subject(s)

Genetics, Population , Models, Genetic , Trees/genetics , Algorithms , Computer Simulation , Inbreeding , Selection, Genetic

17.

Optimum contribution selection in large general tree breeding populations with an application to Scots pine.

Hallander, Jon; Waldmann, Patrik.

Theor Appl Genet ; 118(6): 1133-42, 2009 Apr.

Article in English | MEDLINE | ID: mdl-19183858

ABSTRACT

Development of selection methods that optimise selection differential subject to a constraint on the increase of inbreeding (or coancestry) in a population is an important part of breeding programmes. One such method that has received much attention in animal breeding is the optimum contribution (OC) dynamic selection method. We implemented the OC algorithm and applied it to a diallel progeny trial of Pinus sylvestris L. (Scots pine) focussing on two traits (total tree height and stem diameter). The OC method resulted in a higher increase in genetic gain (8-30%) compared to the genetic gain achieved using a standard restricted selection method at the same level of coancestry constraint. Genetic merit obtained at two different levels of restriction on coancestry showed that the benefit of OC was highest when restriction was strict. At the same level of genetic merit, OC decreased coancestry by 56 and 39% for diameter and height, respectively, compared to the level of coancestry obtained using unrestricted truncation selection. Inclusion of a dominance term in the statistical model resulted in changes in the contribution rank of trees by 7 and 13% for diameter and height, respectively, compared to results achieved by using a pure additive model. However, the genetic gain was higher for the pure additive model than for the model including dominance for both traits.

Subject(s)

Breeding , Genetics, Population , Pinus sylvestris/genetics , Selection, Genetic , Algorithms , Animals , Computer Simulation , Crosses, Genetic , Inbreeding , Models, Genetic

18.

Easy and flexible Bayesian inference of quantitative genetic parameters.

Waldmann, Patrik.

Evolution ; 63(6): 1640-3, 2009 Jun.

Article in English | MEDLINE | ID: mdl-19187246

ABSTRACT

There has been a tremendous advancement of Bayesian methodology in quantitative genetics and evolutionary biology. Still, there are relatively few publications that apply this methodology, probably because the availability of multipurpose and user-friendly software is somewhat limited. It is here described how only a few rows of code of the well-developed and very flexible Bayesian software WinBUGS (Lunn et al. 2000) can be used for inference of the additive polygenic variance and heritability in pedigrees of general design. The presented code is illustrated by application to an earlier published dataset of Scots pine.

Subject(s)

Bayes Theorem , Genetic Variation , Software , Computational Biology/methods , Evolution, Molecular , Models, Genetic , Pedigree , Sequence Analysis, DNA

19.

Hierarchical spatial modeling of additive and dominance genetic variance for large spatial trial datasets.

Finley, Andrew O; Banerjee, Sudipto; Waldmann, Patrik; Ericsson, Tore.

Biometrics ; 65(2): 441-51, 2009 Jun.

Article in English | MEDLINE | ID: mdl-18759829

ABSTRACT

SUMMARY: This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets. Direct application of such models to large spatial datasets is, however, computationally infeasible because of cubic-order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts where such computations are performed for several iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects, we outline two approaches for circumventing the prohibitively expensive matrix decompositions: the first leverages analytical results from Ornstein-Uhlenbeck processes that yield computationally efficient tridiagonal structures, whereas the second derives a modified predictive process model from the original model by projecting its realizations to a lower-dimensional subspace, thereby reducing the computational burden. We illustrate the proposed methods using a synthetic dataset with additive and dominance genetic effects and anisotropic spatial residuals, and a large dataset from a Scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial, which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.

Subject(s)

Biometry/methods , Data Interpretation, Statistical , Databases, Genetic , Epidemiologic Research Design , Genetic Variation/genetics , Genetics, Population , Models, Genetic , Computer Simulation , Humans

20.

Efficient Markov chain Monte Carlo implementation of Bayesian analysis of additive and dominance genetic variances in noninbred pedigrees.

Waldmann, Patrik; Hallander, Jon; Hoti, Fabian; Sillanpää, Mikko J.

Genetics ; 179(2): 1101-12, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18558655

ABSTRACT

Accurate and fast computation of quantitative genetic variance parameters is of great importance in both natural and breeding populations. For experimental designs with complex relationship structures it can be important to include both additive and dominance variance components in the statistical model. In this study, we introduce a Bayesian Gibbs sampling approach for estimation of additive and dominance genetic variances in the traditional infinitesimal model. The method can handle general pedigrees without inbreeding. To optimize between computational time and good mixing of the Markov chain Monte Carlo (MCMC) chains, we used a hybrid Gibbs sampler that combines a single site and a blocked Gibbs sampler. The speed of the hybrid sampler and the mixing of the single-site sampler were further improved by the use of pretransformed variables. Two traits (height and trunk diameter) from a previously published diallel progeny test of Scots pine (Pinus sylvestris L.) and two large simulated data sets with different levels of dominance variance were analyzed. We also performed Bayesian model comparison on the basis of the posterior predictive loss approach. Results showed that models with both additive and dominance components had the best fit for both height and diameter and for the simulated data with high dominance. For the simulated data with low dominance, we needed an informative prior to avoid the dominance variance component becoming overestimated. The narrow-sense heritability estimates in the Scots pine data were lower compared to the earlier results, which is not surprising because the level of dominance variance was rather high, especially for diameter. In general, the hybrid sampler was considerably faster than the blocked sampler and displayed better mixing properties than the single-site sampler.

Subject(s)

Bayes Theorem , Genetic Variation , Markov Chains , Models, Genetic , Monte Carlo Method , Animals , Breeding , Databases, Genetic , Genes, Dominant , Models, Statistical , Pedigree , Pinus sylvestris/anatomy & histology , Pinus sylvestris/genetics
