ABSTRACT
In this study, we investigated the effect of five feature selection approaches on the performance of a mixed-model (G-BLUP) and a Bayesian (Bayes C) prediction method. Using genome-wide SNP data, we predicted height, high-density lipoprotein cholesterol (HDL) and body mass index (BMI) within a sample of 2,186 Croatian individuals and into a sample of 810 UK individuals. When all SNPs were used, Bayes C and G-BLUP had similar predictive performance for all traits within the Croatian data, and for the highly polygenic traits height and BMI when predicting into the UK data. In the UK data, Bayes C outperformed G-BLUP in the prediction of HDL, a trait influenced by loci of moderate effect size. Supervised feature selection of a SNP subset within the G-BLUP framework provided a flexible, generalisable and computationally efficient alternative to Bayes C, but predictive performance must be evaluated carefully whenever supervised feature selection has been used.
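The abstract does not describe the five feature selection approaches in detail. As a rough illustration of the general idea only, the sketch below implements G-BLUP in its kernel (ridge) form with a VanRaden-style genomic relationship matrix, combined with one hypothetical supervised filter: keep the top-k SNPs ranked by absolute marginal correlation with the phenotype, computed on the training individuals only. The simulated genotypes, the names `select_snps` and `gblup_predict`, the fixed heritability h2 = 0.5 and k = 100 are all assumptions for illustration, not the study's data or settings (in practice variance components would be re-estimated, e.g. by REML).

```python
import numpy as np


def standardize(X, freq):
    """Centre and scale 0/1/2 genotypes by allele frequency (VanRaden-style)."""
    sd = np.sqrt(2.0 * freq * (1.0 - freq))
    sd = np.where(sd > 0, sd, 1.0)  # guard against monomorphic SNPs
    return (X - 2.0 * freq) / sd


def select_snps(X_train, y_train, k):
    """Hypothetical supervised filter: top-k SNPs by |marginal correlation| with y."""
    Xc = X_train - X_train.mean(axis=0)
    yc = y_train - y_train.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.divide(Xc.T @ yc, denom, out=np.zeros(X_train.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(corr))[:k]


def gblup_predict(X_train, y_train, X_test, h2):
    """Predict test phenotypes from a genomic relationship matrix (kernel form of G-BLUP)."""
    freq = X_train.mean(axis=0) / 2.0          # allele frequencies from the training set
    Z_tr = standardize(X_train, freq)
    Z_te = standardize(X_test, freq)
    m = X_train.shape[1]
    G_tr = Z_tr @ Z_tr.T / m                   # train x train relationships
    G_te = Z_te @ Z_tr.T / m                   # test x train relationships
    lam = (1.0 - h2) / h2                      # residual-to-genetic variance ratio
    mu = y_train.mean()                        # intercept simplified to the training mean
    alpha = np.linalg.solve(G_tr + lam * np.eye(len(y_train)), y_train - mu)
    return mu + G_te @ alpha


# Simulated data (assumption): 20 causal SNPs, heritability 0.5.
rng = np.random.default_rng(1)
n_train, n_test, p, n_causal = 400, 200, 2000, 20
freq = rng.uniform(0.05, 0.5, size=p)
X = rng.binomial(2, freq, size=(n_train + n_test, p)).astype(float)
beta = np.zeros(p)
beta[rng.choice(p, n_causal, replace=False)] = rng.normal(size=n_causal)
g = X @ beta
g = (g - g.mean()) / g.std() * np.sqrt(0.5)    # genetic variance 0.5
y = g + rng.normal(scale=np.sqrt(0.5), size=len(g))

tr, te = slice(0, n_train), slice(n_train, None)
keep = select_snps(X[tr], y[tr], k=100)        # selection sees training phenotypes only
pred_all = gblup_predict(X[tr], y[tr], X[te], h2=0.5)
pred_sel = gblup_predict(X[tr][:, keep], y[tr], X[te][:, keep], h2=0.5)
r_all = np.corrcoef(pred_all, y[te])[0, 1]
r_sel = np.corrcoef(pred_sel, y[te])[0, 1]
print(f"accuracy, all SNPs: {r_all:.2f}; top-100 SNPs: {r_sel:.2f}")
```

Because the filter uses only training phenotypes, the test-set correlation remains an honest estimate of accuracy; ranking SNPs with phenotypes that include the test individuals would inflate it, which is one concrete reason supervised selection calls for careful evaluation.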
Subject(s)
Body Height/genetics; Body Mass Index; Cholesterol, HDL/genetics; Quantitative Trait Loci/genetics; Quantitative Trait, Heritable; Bayes Theorem; Cholesterol, HDL/blood; Genomics/methods; Humans; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Recently, Hinton introduced the products-of-experts architecture for density estimation, in which individual expert probabilities are multiplied and renormalized. We consider products of Gaussian "pancakes", which are equally elongated in all directions except one, and prove that the maximum likelihood solution for the model gives rise to a minor component analysis solution. We also discuss the covariance structure of sums and products of Gaussian pancakes or one-factor probabilistic principal component analysis models.
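A product of zero-mean Gaussian densities, once renormalized, is again a zero-mean Gaussian whose precision (inverse covariance) matrix is the sum of the experts' precision matrices. The sketch below checks this numerically for a few pancakes and shows where the low-variance directions of the product lie. It is illustrative only: parametrizing each pancake by a unit normal `u`, a "wide" variance and a "thin" variance, and the function names, are our own assumptions rather than the paper's notation, and the code does not reproduce the maximum likelihood proof.

```python
import numpy as np


def pancake_precision(u, var_wide, var_thin):
    """Precision of a zero-mean Gaussian pancake: variance var_thin along the
    unit vector u, variance var_wide in every direction orthogonal to u."""
    u = u / np.linalg.norm(u)
    P_u = np.outer(u, u)                        # projector onto u
    return (np.eye(len(u)) - P_u) / var_wide + P_u / var_thin


def log_gauss(x, precision):
    """Log-density of N(0, precision^{-1}) evaluated at x."""
    d = len(x)
    _, logdet = np.linalg.slogdet(precision)
    return 0.5 * (logdet - d * np.log(2 * np.pi) - x @ precision @ x)


rng = np.random.default_rng(0)
d, n_experts, var_wide, var_thin = 5, 3, 4.0, 0.1
U = rng.normal(size=(n_experts, d))             # one (unnormalized) normal per expert
experts = [pancake_precision(u, var_wide, var_thin) for u in U]
P_prod = sum(experts)                           # precision of the renormalized product
cov_prod = np.linalg.inv(P_prod)

# log prod_i p_i(x) - log N(x; 0, P_prod^{-1}) is the same constant for every x.
diffs = [sum(log_gauss(x, P) for P in experts) - log_gauss(x, P_prod)
         for x in rng.normal(size=(4, d))]
print(np.ptp(diffs))                            # rounding-error level

# Orthogonal to every normal, the product has variance var_wide / n_experts;
# the low-variance (minor) directions lie in the span of the normals.
_, _, Vt = np.linalg.svd(U)
v = Vt[-1]                                      # a unit vector orthogonal to all rows of U
print(v @ cov_prod @ v, var_wide / n_experts)
```

Writing the summed precision as (m/var_wide) I + (1/var_thin - 1/var_wide) Σ u_i u_iᵀ for m unit normals u_i makes the structure explicit: only directions in the span of the u_i have their variance reduced below var_wide / m.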