Search | VHL Regional Portal

Predicting expected progeny difference for marbling score in Angus cattle using artificial neural networks and Bayesian regression models.

Okut, Hayrettin; Wu, Xiao-Lin; Rosa, Guilherme J M; Bauck, Stewart; Woodward, Brent W; Schnabel, Robert D; Taylor, Jeremy F; Gianola, Daniel.

Genet Sel Evol ; 45: 34, 2013 Sep 11.

Article in English | MEDLINE | ID: mdl-24024641

ABSTRACT

BACKGROUND: Artificial neural networks (ANN) mimic the function of the human brain and are capable of performing massively parallel computations for data processing and knowledge representation. ANN can capture nonlinear relationships between predictors and responses and can adaptively learn complex functional forms, particularly in situations where conventional regression models are ineffective. In a previous study, ANN with Bayesian regularization outperformed a benchmark linear model when predicting milk yield in dairy cattle or grain yield of wheat. Although breeding values rely on the assumption of additive inheritance, the predictive capabilities of ANN are of interest because of their potential to increase the accuracy of prediction of molecular breeding values used for genomic selection. This motivated the present study, which aimed to investigate the accuracy of ANN when predicting the expected progeny difference (EPD) of marbling score in Angus cattle. Various ANN architectures were explored, involving two training algorithms, two types of activation functions, and from 1 to 4 neurons in hidden layers. For comparison, BayesCπ models were used to select a subset of optimal markers (referred to as feature selection) under the assumption of additive inheritance, and the marker effects were then estimated using BayesCπ with π set equal to zero. This procedure is referred to as BayesCpC and was implemented on a high-throughput computing cluster.

RESULTS: The ANN with Bayesian regularization (BRANN) predicted EPD as well as BayesCpC did, based on prediction accuracy and sum of squared errors. With the 3K-SNP panel, for example, prediction accuracy was 0.776 using BayesCpC and ranged from 0.776 to 0.807 using BRANN. With the selected 700-SNP panel, prediction accuracy was 0.863 for BayesCpC and ranged from 0.842 to 0.858 for BRANN. However, prediction accuracy for the ANN with scaled conjugate gradient back-propagation was lower, ranging from 0.653 to 0.689 with the 3K-SNP panel and from 0.743 to 0.793 with the selected 700-SNP panel.

CONCLUSIONS: ANN with Bayesian regularization performed as well as linear Bayesian regression models in predicting additive genetic values, supporting the idea that ANN are useful as universal approximators of functions of interest in breeding contexts.
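To make the comparison concrete, here is a minimal sketch of the quantities involved, under stated assumptions: a single-hidden-layer network mapping SNP genotypes (assumed coded 0/1/2) to a predicted EPD, the regularized objective F = βE_D + αE_W that Bayesian-regularized training minimizes in the usual MacKay formulation, and the two evaluation criteria named in the abstract. Prediction accuracy is assumed here to be the Pearson correlation between predicted and observed EPD; the abstract does not define it. The function names and the tanh/linear activation pair are illustrative choices, not the authors' code, and in a real BRANN fit α and β are re-estimated during training rather than passed in.

```python
import math
from typing import List, Sequence


def predict(x: Sequence[float], W: List[List[float]], b: Sequence[float],
            v: Sequence[float], c: float, activation: str = "tanh") -> float:
    """One hidden layer: y_hat = c + sum_k v[k] * g(b[k] + sum_j W[k][j] * x[j]).

    `activation` selects g: "tanh" or "linear" (identity); both are assumed options.
    """
    g = math.tanh if activation == "tanh" else (lambda z: z)
    hidden = [g(bk + sum(wkj * xj for wkj, xj in zip(wk, x))) for wk, bk in zip(W, b)]
    return c + sum(vk * hk for vk, hk in zip(v, hidden))


def sse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Sum of squared prediction errors (E_D)."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))


def regularized_objective(y: Sequence[float], y_hat: Sequence[float],
                          weights: Sequence[float], alpha: float, beta: float) -> float:
    """F = beta * E_D + alpha * E_W, with E_W the sum of squared parameters passed in."""
    e_w = sum(w * w for w in weights)
    return beta * sse(y, y_hat) + alpha * e_w


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation, used here as the assumed 'prediction accuracy'."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    # Toy data: 3 animals x 4 SNPs, 2 hidden neurons (all numbers made up).
    X = [[0, 1, 2, 1], [2, 2, 0, 1], [1, 0, 1, 2]]
    y = [0.30, 0.55, 0.10]
    W = [[0.1, -0.2, 0.05, 0.0], [0.0, 0.1, -0.1, 0.2]]
    b = [0.0, 0.1]
    v = [0.8, -0.5]
    c = 0.3
    y_hat = [predict(x, W, b, v, c) for x in X]
    # Only connection weights are penalized here; including biases is a modelling choice.
    params = [w for row in W for w in row] + list(v)
    print("SSE:", sse(y, y_hat))
    print("F:", regularized_objective(y, y_hat, params, alpha=0.01, beta=1.0))
    print("accuracy (r):", pearson(y, y_hat))
```

The penalty term αE_W is what distinguishes Bayesian regularization from plain least-squares back-propagation: it shrinks weights toward zero, which limits overfitting when there are many SNP inputs relative to animals.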

Subject(s)

Bayes Theorem , Cattle/genetics , Linear Models , Neural Networks, Computer , Algorithms , Animals , Breeding , Genome , Genotype , Humans , Models, Genetic , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable

A novel analytical method, Birth Date Selection Mapping, detects response of the Angus (Bos taurus) genome to selection on complex traits.

Decker, Jared E; Vasco, Daniel A; McKay, Stephanie D; McClure, Matthew C; Rolf, Megan M; Kim, JaeWoo; Northcutt, Sally L; Bauck, Stewart; Woodward, Brent W; Schnabel, Robert D; Taylor, Jeremy F.

BMC Genomics ; 13: 606, 2012 Nov 09.

Article in English | MEDLINE | ID: mdl-23140540

ABSTRACT

BACKGROUND: Several methods have recently been developed to identify regions of the genome that have been exposed to strong selection. However, recent theoretical and empirical work suggests that polygenic models are required to identify genomic regions that are responding more moderately to ongoing selection on complex traits. We examine the effects of multi-trait selection on the genome of a population of US registered Angus beef cattle born over a 50-year period, representing approximately 10 generations of selection. We present results from applying a quantitative genetic model, called Birth Date Selection Mapping, to identify signatures of recent, ongoing selection.

RESULTS: We show that US Angus cattle have been systematically selected to alter their mean additive genetic merit for most of the 16 production traits routinely recorded by breeders. Using Birth Date Selection Mapping, we estimate the time-dependency of allele frequency for 44,817 SNP loci using genomic best linear unbiased prediction, generalized least squares, and BayesCπ analyses. Finally, we reconstruct the primary phenotypes that have historically been exposed to selection from a genome-wide analysis of the 16 production traits and a gene ontology enrichment analysis.

CONCLUSIONS: We demonstrate that Birth Date Selection Mapping with mixed models corrects for time-dependent pedigree sampling effects that would otherwise produce spurious SNP associations, and that it reveals genomic signatures of ongoing selection on complex traits. Because multiple traits have historically been selected in concert and most quantitative trait loci have small effects, selection has incrementally altered allele frequencies throughout the genome. Two quantitative trait loci of large effect were not among the most strongly selected loci because of their antagonistic pleiotropic effects on strongly selected phenotypes. Birth Date Selection Mapping may readily be extended to temporally stratified human or model organism populations.
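The core quantity in Birth Date Selection Mapping is how a SNP's allele frequency changes with birth date. A minimal sketch of that idea, assuming genotypes coded as allele counts (0, 1, 2), is an ordinary least-squares regression of dosage / 2 on birth year, whose slope estimates the change in allele frequency per year. This is deliberately the naive version: it ignores relationships among animals, which is exactly the time-dependent pedigree sampling problem that the abstract says the mixed-model analyses (GBLUP, generalized least squares, BayesCπ) correct for. The function name `allele_freq_trend` is hypothetical.

```python
from typing import Sequence, Tuple


def allele_freq_trend(birth_years: Sequence[float],
                      dosages: Sequence[int]) -> Tuple[float, float]:
    """OLS fit of p_i = dosage_i / 2 on birth year.

    Returns (intercept, slope); slope is the estimated change in allele
    frequency per year. No correction for relatedness: a naive baseline only.
    """
    n = len(birth_years)
    if n != len(dosages) or n < 2:
        raise ValueError("need at least two animals with matching lengths")
    p = [d / 2 for d in dosages]  # per-animal allele frequency (0, 0.5 or 1)
    mx = sum(birth_years) / n
    mp = sum(p) / n
    sxx = sum((x - mx) ** 2 for x in birth_years)
    if sxx == 0:
        raise ValueError("birth years must vary")
    sxp = sum((x - mx) * (pi - mp) for x, pi in zip(birth_years, p))
    slope = sxp / sxx
    return mp - slope * mx, slope


if __name__ == "__main__":
    years = [1960, 1970, 1980, 1990, 2000, 2010]
    dose = [0, 0, 1, 1, 2, 2]  # toy data: an allele rising in frequency
    _, slope = allele_freq_trend(years, dose)
    print(f"change in allele frequency per decade: {10 * slope:.3f}")
```

Replacing this OLS fit with a generalized least-squares or mixed-model fit that includes a relationship matrix is what turns the naive trend into the corrected analysis the abstract describes.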

Subject(s)

Genome , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Selection, Genetic , Alleles , Animals , Bayes Theorem , Breeding , Cattle , Female , Gene Frequency , Genome-Wide Association Study , Genotype , Least-Squares Analysis , Male , Pedigree , Phenotype , Time Factors

An ensemble-based approach to imputation of moderate-density genotypes for genomic selection with application to Angus cattle.

Sun, Chuanyu; Wu, Xiao-Lin; Weigel, Kent A; Rosa, Guilherme J M; Bauck, Stewart; Woodward, Brent W; Schnabel, Robert D; Taylor, Jeremy F; Gianola, Daniel.

Genet Res (Camb) ; 94(3): 133-50, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22809677

ABSTRACT

Imputation of moderate-density genotypes from low-density panels is of increasing interest in genomic selection because it can dramatically reduce genotyping costs. Several imputation software packages have been developed, but they vary in imputation accuracy, and imputed genotypes may be inconsistent among methods. An AdaBoost-like approach is proposed to combine imputation results from several independent software packages, i.e. Beagle (v3.3), IMPUTE (v2.0), fastPHASE (v1.4), AlphaImpute, findhap (v2) and Fimpute (v2), with each package serving as a base classifier in an ensemble-based system. The ensemble-based method computes weights sequentially for all classifiers and combines the results of the component methods via weighted majority 'voting' to determine unknown genotypes. The data included 3078 registered Angus cattle, each genotyped with the Illumina BovineSNP50 BeadChip. SNP genotypes on three chromosomes (BTA1, BTA16 and BTA28) were used to compare imputation accuracy among methods, and the application involved imputing 50K genotypes covering 29 chromosomes from a set of 5K genotypes. Imputation accuracy of the six packages ranged from 0.8677 to 0.9858, with Beagle and Fimpute the most accurate. The proposed ensemble method was more accurate than any of these packages, but the order of the independent classifiers in the voting scheme affected imputation accuracy. The ensemble systems yielding the best imputation accuracies were those that had Beagle as the first classifier, followed by one or two methods that used pedigree information. A salient feature of the proposed ensemble method is that it resolves imputation inconsistencies among different imputation methods, leading to a more reliable system for imputing genotypes than any of the independent methods.
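The sequential weighting and weighted-majority vote described above can be sketched as follows. This minimal illustration assumes classic discrete AdaBoost updates (method weight α = ½·ln((1 − ε)/ε), with sample weights raised on genotypes a method got wrong), evaluated on SNPs whose true genotypes are known; the authors describe their method only as "AdaBoost-like", so the exact update rule is an assumption, and `adaboost_weights` and `weighted_vote` are hypothetical names. The rule assumes each method's weighted error stays below 0.5, which holds comfortably for imputation accuracies in the reported range.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def adaboost_weights(truth: Sequence[int],
                     predictions_by_method: Sequence[Sequence[int]],
                     eps_floor: float = 1e-6) -> List[float]:
    """Sequentially compute one weight (alpha) per imputation method.

    truth: known genotypes (0/1/2) at masked SNPs used for evaluation.
    predictions_by_method: imputed genotypes from each method, in voting order.
    """
    n = len(truth)
    w = [1.0 / n] * n  # equal initial weight on every evaluated genotype
    alphas = []
    for preds in predictions_by_method:
        # Weighted error of this method under the current sample weights.
        err = sum(wi for wi, p, t in zip(w, preds, truth) if p != t)
        err = min(max(err, eps_floor), 1.0 - eps_floor)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1.0 - err) / err)
        alphas.append(alpha)
        # Up-weight genotypes this method got wrong, so later methods are judged on them.
        w = [wi * math.exp(alpha if p != t else -alpha) for wi, p, t in zip(w, preds, truth)]
        total = sum(w)
        w = [wi / total for wi in w]
    return alphas


def weighted_vote(calls: Sequence[int], alphas: Sequence[float]) -> int:
    """Return the genotype with the largest summed method weight."""
    score: Dict[int, float] = defaultdict(float)
    for genotype, alpha in zip(calls, alphas):
        score[genotype] += alpha
    return max(score, key=score.get)


if __name__ == "__main__":
    truth = [0, 1, 2, 1, 0, 2, 1, 1]
    method_a = [0, 1, 2, 1, 0, 2, 1, 0]  # toy calls, one error
    method_b = [0, 1, 1, 1, 0, 2, 1, 1]  # one error at a different SNP
    method_c = [0, 2, 1, 1, 0, 2, 1, 1]  # two errors
    alphas = adaboost_weights(truth, [method_a, method_b, method_c])
    print("weights:", [round(a, 3) for a in alphas])
    print("ensemble call at a new SNP:", weighted_vote([1, 1, 2], alphas))
```

Because each method's error is measured under sample weights shaped by the methods before it, reordering `predictions_by_method` changes the resulting weights, which is consistent with the abstract's observation that classifier order affected accuracy.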

Subject(s)

Algorithms , Cattle/genetics , Genomics , Polymorphism, Single Nucleotide , Animals , Genome-Wide Association Study , Genotype , Models, Genetic , Oligonucleotide Array Sequence Analysis

Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation.

Saatchi, Mahdi; McClure, Mathew C; McKay, Stephanie D; Rolf, Megan M; Kim, JaeWoo; Decker, Jared E; Taxis, Tasia M; Chapple, Richard H; Ramey, Holly R; Northcutt, Sally L; Bauck, Stewart; Woodward, Brent; Dekkers, Jack C M; Fernando, Rohan L; Schnabel, Robert D; Garrick, Dorian J; Taylor, Jeremy F.

Genet Sel Evol ; 43: 40, 2011 Nov 28.

Article in English | MEDLINE | ID: mdl-22122853

ABSTRACT

BACKGROUND: Genomic selection is a recently developed technology that is beginning to revolutionize animal breeding. The objective of this study was to estimate marker effects to derive prediction equations for direct genomic values for 16 routinely recorded traits of American Angus beef cattle and to quantify the corresponding prediction accuracies.

METHODS: Deregressed estimated breeding values were used as observations in a weighted analysis to derive direct genomic values for 3570 sires genotyped using the Illumina BovineSNP50 BeadChip. These bulls were clustered into five groups by K-means clustering on pedigree estimates of additive genetic relationships between animals, with the aim of increasing within-group and decreasing between-group relationships. All five combinations of four groups were used for model training, with cross-validation performed in the group not used in training. Bivariate animal models were used for each trait to estimate the genetic correlation between deregressed estimated breeding values and direct genomic values.

RESULTS: Accuracies of direct genomic values ranged from 0.22 to 0.69 for the studied traits, with an average of 0.44. Predictions were more accurate when animals within the validation group were more closely related to animals in the training set. When training and validation sets were formed by random allocation, the accuracies of direct genomic values ranged from 0.38 to 0.85, with an average of 0.65, reflecting the closer relationships between animals in the training and validation sets. For most traits, the accuracies obtained by training on older animals and validating in younger animals were intermediate between those obtained with K-means clustering and with random allocation. The genetic correlation between deregressed estimated breeding values and direct genomic values ranged from 0.15 to 0.80 for the traits studied.

CONCLUSIONS: These results suggest that genomic estimates of genetic merit can be produced in beef cattle at a young age, but that genotyped sires will need to be recurrently included in retraining analyses to routinely provide the industry with direct genomic values of the highest accuracy.
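The grouping strategy above (cluster sires on pedigree relationships, then train on four groups and validate on the fifth) can be sketched as K-means followed by leave-one-cluster-out folds. This is a minimal sketch under assumptions the abstract does not specify: each animal's row of the pedigree relationship matrix is used as its feature vector, initialization uses a deterministic farthest-point rule, and `kmeans` and `leave_one_cluster_out` are hypothetical names.

```python
from typing import Iterator, List, Sequence, Tuple


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-means with deterministic farthest-point initialization.

    Each row is one animal's vector of additive relationships to all animals
    (an assumed feature representation). Returns a cluster label per animal.
    """
    if not 1 <= k <= len(rows) or max_iter < 1:
        raise ValueError("need 1 <= k <= number of animals and max_iter >= 1")
    # Start from row 0, then repeatedly add the row farthest from its nearest centroid.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(_sqdist(r, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: _sqdist(r, centroids[j])) for r in rows]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def leave_one_cluster_out(labels: Sequence[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held-out cluster, training indices, validation indices) per cluster."""
    for g in sorted(set(labels)):
        train = [i for i, lab in enumerate(labels) if lab != g]
        valid = [i for i, lab in enumerate(labels) if lab == g]
        yield g, train, valid


if __name__ == "__main__":
    # Toy relationship matrix: two unrelated half-sib families of three animals.
    A = [[1, .5, .5, 0, 0, 0], [.5, 1, .5, 0, 0, 0], [.5, .5, 1, 0, 0, 0],
         [0, 0, 0, 1, .5, .5], [0, 0, 0, .5, 1, .5], [0, 0, 0, .5, .5, 1]]
    for g, train, valid in leave_one_cluster_out(kmeans(A, 2)):
        print(f"hold out cluster {g}: train on {train}, validate on {valid}")
```

With k = 5 this yields the five train-on-four, validate-on-one combinations described in the abstract; keeping close relatives in the same cluster is what makes the resulting accuracies lower than those from random allocation.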

Subject(s)

Breeding , Cattle/genetics , Genomics/methods , Genomics/standards , Animals , Cattle/growth & development , Cluster Analysis , Female , Male , Models, Genetic , Pedigree , Quantitative Trait, Heritable

A primer on high-throughput computing for genomic selection.

Wu, Xiao-Lin; Beissinger, Timothy M; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J M; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel.

Front Genet ; 2: 4, 2011.

Article in English | MEDLINE | ID: mdl-22303303

ABSTRACT

High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of achieving high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are computationally demanding and, when several traits must be evaluated sequentially, computing time is long and throughput is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented with techniques ranging from simple batch processing to pipelining in distributed computer clusters. Scripting languages such as shell scripting, Perl, and R are also very useful for building pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. Compared with the traditional data processing pipeline running on central processors, general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin-Madison, and can be leveraged for genomic selection in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures, as well as general-purpose computing environments, will further expand our capability to meet the increasing computing demands posed by the unprecedented volume of genomic data available today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from the design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans.
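The difference between evaluating traits sequentially and running them as independent pipelines can be illustrated on a single machine. The sketch below is an assumption-laden stand-in, not the cluster-based HTC setup the paper describes: each trait runs through its own hypothetical `quality_control` then `fit_trait` stages, and these per-trait pipelines are submitted to a worker pool from Python's standard `concurrent.futures` module so that traits no longer wait for one another. Threads are appropriate when each stage wraps an external program (for example, one launched with `subprocess.run`); for CPU-bound pure-Python stages, `ProcessPoolExecutor` is the drop-in alternative because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Sequence


def quality_control(values: Sequence[Optional[float]]) -> List[float]:
    """Hypothetical stage 1: drop missing records."""
    return [v for v in values if v is not None]


def fit_trait(values: Sequence[float]) -> float:
    """Hypothetical stage 2: stand-in for a long-running model fit (here just a mean)."""
    return sum(values) / len(values)


def trait_pipeline(values: Sequence[Optional[float]]) -> float:
    """Run every stage for one trait, so each trait proceeds independently."""
    return fit_trait(quality_control(values))


def run_traits(data: Dict[str, Sequence[Optional[float]]], workers: int = 4) -> Dict[str, float]:
    """Submit one pipeline per trait to a worker pool instead of looping sequentially."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {trait: pool.submit(trait_pipeline, values) for trait, values in data.items()}
        return {trait: fut.result() for trait, fut in futures.items()}


if __name__ == "__main__":
    toy = {"marbling": [1.0, None, 3.0], "weaning_weight": [2.0, 4.0]}  # made-up records
    print(run_traits(toy))
```

On a real cluster the same structure maps onto batch-scheduler jobs, one per trait, rather than local threads.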
