Results 1 - 20 of 143
1.
Genet Sel Evol ; 56(1): 31, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38684971

ABSTRACT

BACKGROUND: Metabolic disturbances adversely affect the productive and reproductive performance of dairy cattle through changes in endocrine status and immune function, which increase the risk of disease. Such disturbances may occur in the post-partum phase, but also throughout lactation, with sub-clinical symptoms. Recently, increased attention has been directed towards improving health and resilience in dairy cattle, and genomic selection (GS) could be a helpful tool for selecting animals that are more resilient to metabolic disturbances throughout lactation. Hence, we evaluated the genomic prediction of serum biomarker levels for metabolic distress in 1353 Holsteins genotyped with the 100K single nucleotide polymorphism (SNP) chip assay. GS was evaluated using the parametric models genomic best linear unbiased prediction (GBLUP), Bayesian B (BayesB) and elastic net (ENET), and the nonparametric models gradient boosting machine (GBM) and stacking ensemble (Stack), which combines the ENET and GBM approaches. RESULTS: The Stack approach outperformed the other methods, with a relative difference (RD), calculated as the increment in prediction accuracy, of approximately 18.0% compared to GBLUP, 12.6% compared to BayesB, 8.7% compared to ENET, and 4.4% compared to GBM. Relative to GBLUP, the largest RDs in prediction accuracy were observed for haptoglobin (hapto), ranging from 17.7% (BayesB) to 41.2% (Stack); for Zn, from 9.8% (BayesB) to 29.3% (Stack); for ceruloplasmin (CuCp), from 9.3% (BayesB) to 27.9% (Stack); for ferric reducing antioxidant power (FRAP), from 8.0% (BayesB) to 40.0% (Stack); and for total protein (PROTt), from 5.7% (BayesB) to 22.9% (Stack). Using a subset of top SNPs (1.5k) selected with the GBM approach increased the accuracy of GBLUP by 1.8 to 76.5%. However, for the other models this subset reduced prediction accuracy, by 4.8% for ENET (average of 10 traits), 5.9% for GBM (average of 21 traits), and 6.6% for Stack (average of 16 traits).
CONCLUSIONS: Our results indicate that the Stack approach was more accurate in predicting metabolic disturbances than GBLUP, BayesB, ENET, and GBM, and that it seemed to be competitive for predicting complex phenotypes with different modes of inheritance, i.e. additive and non-additive effects. Selecting markers based on GBM improved the accuracy of GBLUP.
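The abstract defines the relative difference (RD) only as an increment in prediction accuracy. A minimal Python sketch, assuming RD is the percentage change of a model's accuracy relative to a reference model such as GBLUP; the function name and the accuracies in the usage lines are hypothetical, not values from the study:

```python
def relative_difference(acc_model: float, acc_reference: float) -> float:
    """Percentage increment in prediction accuracy of a model over a reference.

    Assumes RD = 100 * (acc_model - acc_reference) / acc_reference
    (an assumption; the abstract does not state the formula).
    """
    if acc_reference == 0:
        raise ValueError("reference accuracy must be non-zero")
    return 100.0 * (acc_model - acc_reference) / acc_reference


if __name__ == "__main__":
    # Hypothetical validation accuracies, e.g. correlations for one biomarker
    stack_acc, gblup_acc = 0.59, 0.50
    print(f"RD of Stack vs GBLUP: {relative_difference(stack_acc, gblup_acc):.1f}%")
```

A negative RD under this definition means the model was less accurate than the reference, as reported above for ENET, GBM and Stack with the 1.5k SNP subset.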


Subject(s)
Biomarkers , Models, Genetic , Polymorphism, Single Nucleotide , Animals , Cattle/genetics , Biomarkers/blood , Cattle Diseases/genetics , Cattle Diseases/blood , Bayes Theorem , Female , Metabolic Diseases/genetics , Metabolic Diseases/veterinary , Metabolic Diseases/blood , Genomics/methods
2.
G3 (Bethesda) ; 13(8)2023 08 09.
Article in English | MEDLINE | ID: mdl-37216670

ABSTRACT

This study investigates nonlinear kernels for multitrait (MT) genomic prediction using support vector regression (SVR) models. We assessed the predictive ability delivered by single-trait (ST) and MT models for 2 carcass traits (CT1 and CT2) measured in purebred broiler chickens. The MT models also included information on indicator traits measured in vivo [growth and feed efficiency traits (FE)]. We proposed an approach termed (quasi) multitask SVR (QMTSVR), with hyperparameters optimized via a genetic algorithm. ST and MT Bayesian shrinkage and variable selection models [genomic best linear unbiased predictor (GBLUP), BayesC (BC), and reproducing kernel Hilbert space (RKHS) regression] were employed as benchmarks. MT models were trained using 2 validation designs (CV1 and CV2), which differ in whether information on secondary traits is available in the testing set. Predictive ability was assessed with prediction accuracy (ACC; i.e. the correlation between predicted and observed values, divided by the square root of phenotype accuracy), standardized root-mean-squared error (RMSE*), and inflation factor (b). To account for potential bias in CV2-style predictions, we also computed a parametric estimate of accuracy (ACCpar). Predictive ability metrics varied according to trait, model, and validation design (CV1 or CV2), ranging from 0.71 to 0.84 for ACC, from 0.78 to 0.92 for RMSE*, and from 0.82 to 1.34 for b. The highest ACC and smallest RMSE* were achieved with QMTSVR-CV2 for both traits. For CT1, the choice of model and validation design was sensitive to the accuracy metric used (ACC or ACCpar). Nonetheless, the higher predictive accuracy of QMTSVR over MTGBLUP and MTBC was replicated across accuracy metrics, while the proposed method and the MTRKHS model performed similarly.
Results showed that the proposed approach is competitive with conventional MT Bayesian regression models using either Gaussian or spike-slab multivariate priors.
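The ACC and b metrics described in this abstract can be sketched in plain Python. This assumes b is the slope of the regression of observed on predicted values, a common convention in genomic prediction, and that the phenotype accuracy is supplied as a number; RMSE* and ACCpar are not reproduced because the abstract does not define them. Function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prediction_accuracy(observed: Sequence[float], predicted: Sequence[float],
                        phenotype_accuracy: float) -> float:
    """ACC: correlation(predicted, observed) / sqrt(phenotype accuracy)."""
    return pearson(predicted, observed) / math.sqrt(phenotype_accuracy)


def inflation_factor(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """b: slope of the regression of observed on predicted values (1 = unbiased)."""
    mp = sum(predicted) / len(predicted)
    mo = sum(observed) / len(observed)
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var = sum((p - mp) ** 2 for p in predicted)
    return cov / var


if __name__ == "__main__":
    obs = [1.2, 0.4, 2.1, 1.7, 0.9]   # hypothetical observed values
    pred = [1.0, 0.6, 1.8, 1.9, 0.7]  # hypothetical predictions
    print(prediction_accuracy(obs, pred, 0.9), inflation_factor(obs, pred))
```

Under this convention, b below 1 indicates that predictions are overdispersed (inflated) and b above 1 that they are deflated.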


Subject(s)
Chickens , Multifactorial Inheritance , Animals , Chickens/genetics , Bayes Theorem , Heuristics , Phenotype , Models, Genetic , Genotype
3.
Genet Sel Evol ; 54(1): 78, 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36460973

ABSTRACT

BACKGROUND: Selection schemes distort inference when estimating differences between treatments or genetic associations between traits, and may degrade prediction of outcomes, e.g., the expected performance of the progeny of an individual with a certain genotype. If input and output measurements are not collected on random samples, inferences and predictions are bound to be biased to some degree. Our paper revisits inference in quantitative genetics when using samples stemming from some selection process. The approach integrates the classical notion of fitness with that of missing data. The treatment is fully Bayesian, with inference and prediction handled in a unified manner. While the focus is on animal and plant breeding, the concepts apply to natural selection as well. Examples based on real data and stylized models illustrate how selection can be accounted for in four different situations, and show that this is sometimes unsuccessful. RESULTS: Our flexible "soft selection" setting helps to diagnose the extent to which selection can be ignored. The clear connection between the probability of missingness and the concept of fitness in stylized selection scenarios is highlighted. It is not realistic to assume that a fixed selection threshold t holds in conceptual replication, as the chance of selection depends on observed and unobserved data, and on unequal amounts of information across individuals, aspects that a "soft" selection representation addresses explicitly. There does not seem to be a general prescription for accommodating potential distortions due to selection. In structures that combine cross-sectional, longitudinal and multi-trait data, such as in animal breeding, balance is the exception rather than the rule. The Bayesian approach provides an integrated answer to inference, prediction and model choice under selection that goes beyond the likelihood-based approach, where breeding values are inferred indirectly.
CONCLUSIONS: The approach used here for inference and prediction under selection may or may not yield the best possible answers. One may believe that selection has been accounted for diligently, but the central problem of whether statistical inferences are good or bad does not have an unambiguous solution. On the other hand, the quality of predictions can be gauged empirically via appropriate training-testing of competing methods.


Subject(s)
Genomics , Animals , Bayes Theorem , Cross-Sectional Studies , Likelihood Functions , Phenotype
4.
Methods Mol Biol ; 2467: 189-218, 2022.
Article in English | MEDLINE | ID: mdl-35451777

ABSTRACT

Growth of artificial intelligence and machine learning (ML) methodology has been explosive in recent years. In this class of procedures, computers learn from sets of experiences and provide forecasts or classifications. Many ML studies have been carried out in genome-wide prediction (GWP). This chapter describes the main semiparametric and nonparametric algorithms used in GWP in animals and plants. Thirty-four comparative ML studies conducted in the last decade were combined in a meta-analysis using a Thurstonian model to identify the algorithms with the best predictive qualities. Some kernel, Bayesian, and ensemble methods displayed greater robustness and predictive ability. However, the type of study and the data distribution must be considered when choosing the most appropriate model for a given problem.


Subject(s)
Artificial Intelligence , Machine Learning , Algorithms , Animals , Bayes Theorem , Genome
6.
J Anim Breed Genet ; 139(3): 247-258, 2022 May.
Article in English | MEDLINE | ID: mdl-34931377

ABSTRACT

Single-step GBLUP (ssGBLUP) for genomic prediction was proposed in 2009. Many studies have investigated ssGBLUP for genomic selection in animals and plants using a standard linear kernel (similarity matrix) called the genomic relationship matrix (G). Whereas GBLUP is based on additive gene action, more general kernels should also allow non-additive effects to be captured. In this study, we generalized ssGBLUP to accommodate two non-linear kernels, the averaged Gaussian kernel (AK) and the recently developed arc-cosine deep kernel (DK). We evaluated the methodology using body weight (BW) and hen-housing production (HHP) traits, recorded on a sample of phenotyped and genotyped commercial broiler chickens. There were thus three ssGBLUP models, corresponding to G, AK and DK. We used random replication of training (TRN) and testing (TST) layouts at different genotyping rates (20%, 40%, 60% and 80% of all birds) in three selective genotyping scenarios: genotyping the youngest individuals in the pedigree (YS), random genotyping (RS) and genotyping based on parent average (PA). Predictive abilities were measured using rank correlations between the observed and predicted phenotypic values in TST for each random partition. Prediction accuracy was influenced by the type of kernel when a large proportion of birds was genotyped. The advantage of non-linear kernels (AK and DK) was more apparent when 60% and 80% of birds had been genotyped. For BW, the lowest rank correlations were obtained with G (0.093 ± 0.015 using RS with 20% of individuals genotyped) and the highest with DK (0.320 ± 0.016 in the PA setting with 80% of individuals genotyped). For HHP, both the lowest and highest rank correlations were obtained with AK, with 20% and 80% of individuals genotyped: 0.071 ± 0.016 (RS) and 0.23 ± 0.016 (PA), respectively. Our results indicate that AK and DK are more effective than G when a large proportion of the target population is genotyped.
We expect ssGBLUP with AK or DK to perform even better than ssGBLUP with G when non-additive genetic effects influence the underlying variability of complex traits.
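To make the kernels named above concrete, here is a minimal stdlib Python sketch of two building blocks: the standard VanRaden genomic relationship matrix G, assuming SNPs coded 0/1/2 with allele frequencies estimated from the same sample, and a single layer of the degree-1 arc-cosine kernel in the form given by Cho and Saul. The deep kernel (DK) applies this arc-cosine transformation recursively over several layers, and the averaged Gaussian kernel (AK) is not shown; neither is claimed to reproduce the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def vanraden_g(genotypes: List[List[int]]) -> Matrix:
    """Genomic relationship matrix G = ZZ' / (2 * sum_j p_j (1 - p_j)).

    genotypes: one row per individual, SNPs coded 0/1/2 (copies of one allele).
    Allele frequencies p_j are estimated from the same genotypes.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Centre each genotype by twice its allele frequency
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


def arc_cosine_kernel(x: List[float], y: List[float]) -> float:
    """One-layer, degree-1 arc-cosine kernel.

    k(x, y) = |x| |y| J(theta) / pi, with J(theta) = sin(theta) + (pi - theta) cos(theta).
    """
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    theta = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp rounding error
    return nx * ny * (math.sin(theta) + (math.pi - theta) * math.cos(theta)) / math.pi
```

For identical vectors theta is 0 and the arc-cosine kernel reduces to the squared norm, while G is linear in the centred genotypes; this is the sense in which the arc-cosine family is a non-linear generalization.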


Subject(s)
Chickens , Models, Genetic , Animals , Chickens/genetics , Female , Genome , Genotype , Pedigree , Phenotype
7.
MRS Adv ; 6(25): 636-643, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34532078

ABSTRACT

Acoustic forces are an attractive pathway to achieve directed assembly for multi-phase materials via additive processes. Programmatic integration of microstructure and structural features during deposition offers opportunities for optimizing printed component performance. We detail recent efforts to integrate acoustic focusing with a direct-ink-write mode of printing to modulate material transport properties (e.g. conductivity). Acoustic field-assisted printing, operating under a multi-node focusing condition, supports deposition of materials with multiple focused lines in a single-pass printed line. Here, we report the demonstration of acoustic focusing in concert with diffusive self-assembly to rapidly assemble and print multiscale, mm-length colloidal solids on a timescale of seconds to minutes. These efforts support the promising capabilities of acoustic field-assisted deposition-based printing to achieve spatial control of printed microstructures with deterministic, long-range ordering across multiple length scales.

9.
Theor Appl Genet ; 134(9): 3069-3081, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34117908

ABSTRACT

KEY MESSAGE: Model training on data from all selection cycles yielded the highest prediction accuracy by attenuating specific effects of individual cycles. Expected reliability was a robust predictor of accuracies obtained with different calibration sets. The transition from phenotypic to genome-based selection requires a profound understanding of factors that determine genomic prediction accuracy. We analysed experimental data from a commercial maize breeding programme to investigate whether genomic measures can assist in identifying optimal calibration sets for model training. The data set consisted of six contiguous selection cycles comprising testcrosses of 5968 doubled haploid lines genotyped with a minimum of 12,000 SNP markers. We evaluated genomic prediction accuracies in two independent prediction sets in combination with calibration sets differing in sample size and genomic measures (effective sample size, average maximum kinship, expected reliability, number of common polymorphic SNPs and linkage phase similarity). Across selection cycles, prediction accuracies were as high as 0.57 for grain dry matter yield and 0.76 for grain dry matter content. Including data from all selection cycles in model training yielded the best results because interactions between calibration and prediction sets, as well as the effects of different testers and specific years, were attenuated. Among genomic measures, the expected reliability of genomic breeding values was the best predictor of empirical accuracies obtained with different calibration sets. For grain yield, a large difference between expected and empirical reliability was observed in one prediction set. We propose using this difference as guidance for determining the weight that phenotypic data from a given selection cycle should receive in model retraining and in selection when both genomic breeding values and phenotypes are available.


Subject(s)
Chromosomes, Plant/genetics , Genome, Plant , Phenotype , Plant Breeding/methods , Polymorphism, Single Nucleotide , Zea mays/growth & development , Zea mays/genetics , Chromosome Mapping/methods , Quantitative Trait Loci
10.
G3 (Bethesda) ; 11(7)2021 07 14.
Article in English | MEDLINE | ID: mdl-33826720

ABSTRACT

The use of DNA methylation signatures to predict chronological age and aging rate is of interest in many fields, including disease prevention and treatment, forensics, and anti-aging medicine. Although a large number of methylation markers are significantly associated with age, most age-prediction methods use a few markers selected based on either previously published studies or datasets containing methylation information. Here, we implemented reproducing kernel Hilbert spaces (RKHS) regression and a ridge regression model in a Bayesian framework that utilized phenotypic and methylation profiles simultaneously to predict chronological age. We used over 450,000 CpG sites from the whole blood of a large cohort of 4409 human individuals aged 10-101 years. Models were fitted using methylation measurements both adjusted and unadjusted for cell heterogeneity. Unadjusted methylation scores delivered a significantly higher prediction accuracy than adjusted methylation data, with a correlation between age and predicted age of 0.98 and a root mean square error (RMSE) of 3.54 years in unadjusted data, compared with a correlation of 0.90 and an RMSE of 7.16 years in adjusted data. Reducing the number of predictors (CpG sites) through subset selection improved predictive power, with a correlation of 0.98 and an RMSE of 2.98 years in the RKHS model. We found distinct global methylation patterns, with a significant increase in the proportion of methylated cytosines in CpG islands and a decreased proportion in other CpG types, including CpG shore, shelf, and open sea (P < 5e-06). Epigenetic drift seemed to be a widespread phenomenon, as more than 97% of the age-associated methylation sites showed heteroscedasticity. Apparent methylomic aging rate (AMAR) had a sex-specific pattern, with AMAR increasing with age in females relative to males.
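The accuracy summaries and the aging rate reported above can be computed from predicted and chronological ages. A minimal sketch, assuming AMAR follows the definition commonly used in epigenetic-clock studies (predicted methylome age divided by chronological age); the abstract itself does not give the formula, and the ages in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (here years)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def amar(predicted_age: float, chronological_age: float) -> float:
    """Apparent methylomic aging rate: predicted (methylome) age / chronological age.

    Values above 1 suggest faster-than-chronological methylomic aging.
    """
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return predicted_age / chronological_age


if __name__ == "__main__":
    ages = [25.0, 48.0, 71.0]       # hypothetical chronological ages
    predicted = [27.5, 45.0, 74.0]  # hypothetical methylation-based predictions
    print(rmse(ages, predicted), [round(amar(p, a), 3) for p, a in zip(predicted, ages)])
```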


Subject(s)
Aging , DNA Methylation , Male , Female , Humans , Child, Preschool , Bayes Theorem , DNA Methylation/genetics , CpG Islands , Aging/genetics , Epigenesis, Genetic
11.
Front Genet ; 12: 611506, 2021.
Article in English | MEDLINE | ID: mdl-33692825

ABSTRACT

Feature selection (FS, i.e., selection of a subset of predictor variables) is essential in high-dimensional datasets to prevent overfitting of prediction/classification models and to reduce computation time and resources. In genomics, FS allows relevant markers to be identified and low-density SNP chips to be designed for evaluating selection candidates. In this research, several univariate and multivariate FS algorithms combined with various parametric and non-parametric learners were applied to the prediction of feed efficiency in growing pigs from high-dimensional genomic data. The objective was to find the best combination of feature selector, SNP subset size, and learner leading to accurate and stable (i.e., less sensitive to changes in the training data) prediction models. Genomic best linear unbiased prediction (GBLUP) without SNP pre-selection was the benchmark. Three types of FS methods were implemented: (i) filter methods: univariate (univ.dtree, spearcor) or multivariate (cforest, mrmr), with random selection as benchmark; (ii) embedded methods: elastic net and least absolute shrinkage and selection operator (LASSO) regression; (iii) combinations of filter and embedded methods. Ridge regression, support vector machine (SVM), and gradient boosting (GB) were applied after pre-selection performed with the filter methods. The data comprised 5,708 individual records of residual feed intake to be predicted from the animal's own genotype. Accuracy (stability of results) was measured as the median (interquartile range) of the Spearman correlation between observed and predicted data in a 10-fold cross-validation. The best prediction in terms of accuracy and stability was obtained with SVM and GB using 500 or more SNPs [0.28 (0.02) and 0.27 (0.04) for SVM and GB with 1,000 SNPs, respectively]. With larger subset sizes (1,000-1,500 SNPs), the filter method had no influence on prediction quality, which was similar to that attained with a random selection.
With 50-250 SNPs, the FS method had a huge impact on prediction quality: it was very poor for tree-based methods combined with any learner, but good and similar to what was obtained with larger SNP subsets when spearcor or mrmr were implemented with or without embedded methods. Those filters also led to very stable results, suggesting their potential use for designing low-density SNP chips for genome-based evaluation of feed efficiency.
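The spearcor filter named above ranks SNPs by their univariate association with the phenotype before a learner is fitted. A minimal stdlib Python sketch, assuming the ranking uses the absolute Spearman correlation between each SNP's genotype codes (0/1/2) and the phenotype, with tied values given average ranks; the function names are hypothetical and the downstream learners (SVM, GB) are not shown.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of ranks (0.0 if a variable is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_top_snps(genotypes: List[List[int]], phenotypes: Sequence[float], k: int) -> List[int]:
    """Column indices of the k SNPs with the largest |Spearman correlation| with the trait."""
    n_snps = len(genotypes[0])
    scores = [(abs(spearman([row[j] for row in genotypes], phenotypes)), j) for j in range(n_snps)]
    scores.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep map order
    return [j for _, j in scores[:k]]
```

In a cross-validation setting the ranking would be recomputed within each training fold only, so that the selected subset does not leak information from the testing fold.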

12.
Ultramicroscopy ; 220: 113160, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33197699

ABSTRACT

A direct detector based on a monolithic active pixel sensor and optimized for the primary beam energies of scanning electron microscopes is implemented for electron back-scattered diffraction (EBSD) applications. The high detection efficiency of the detector and its large pixel array allow sensitive and accurate detection of Kikuchi bands arising from primary electron beam excitation energies of 4 keV to 28 keV, with optimal contrast in the range of 8-16 keV. The diffraction pattern acquisition speed is substantially improved via a sparse sampling mode, in which only a reduced number of pixels on the detector is acquired. Standard inpainting algorithms are used to estimate the information in the skipped regions of the acquired diffraction pattern. For EBSD mapping, an acquisition speed as high as 5988 scan points per second is demonstrated, with a tolerable fraction of indexed points and accuracy. These capabilities, spanning high angular resolution EBSD patterns to high-speed pattern acquisition, are achieved on the same detector, facilitating simultaneous detection modalities that enable a multitude of advanced EBSD applications, including lattice strain mapping, structural refinement, low-dose characterization, 3D-EBSD and dynamic in situ EBSD.

13.
Science ; 370(6512): 95-101, 2020 10 02.
Article in English | MEDLINE | ID: mdl-33004516

ABSTRACT

Refractory multiprincipal element alloys (MPEAs) are promising materials to meet the demands of aggressive structural applications, yet require fundamentally different avenues for accommodating plastic deformation in the body-centered cubic (bcc) variants of these alloys. We show a desirable combination of homogeneous plastic deformability and strength in the bcc MPEA MoNbTi, enabled by the rugged atomic environment through which dislocations must navigate. Our observations of dislocation motion and atomistic calculations unveil the unexpected dominance of nonscrew character dislocations and numerous slip planes for dislocation glide. This behavior lends credence to theories that explain the exceptional high-temperature strength of similar alloys. Our results advance a defect-aware perspective to alloy design strategies for materials capable of performance across the temperature spectrum.

14.
Phys Rev E ; 102(3-1): 032605, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33075911

ABSTRACT

Here we report on compression experiments of colloidal pillars in which the evolution of a shear band can be followed at the particle level during deformation. Quasistatic deformation results in dilation and anisotropic changes in coordination in a localized band of material. Additionally, a transition from solid- to liquidlike mechanical response accompanies the structural change in the band, as evidenced by saturation of the packing fraction at the glass transition point, a diminishing ability to host anelastic strains, and a rapid decay in the long-range strain correlations. Overall, our results suggest that shear banding quantitatively resembles a localized, driven glass transition.

15.
PLoS One ; 15(8): e0236629, 2020.
Article in English | MEDLINE | ID: mdl-32797113

ABSTRACT

An important economic reason for the loss of local breeds is that they tend to be less productive, and hence have less market value, than commercial breeds. Nevertheless, local breeds often have irreplaceable genetic and sociological value. In breeding programs with local breeds, it is crucial to balance selection for genetic gain against the maintenance of genetic diversity. These two objectives often conflict, and finding the optimal trade-off has been a challenge for breeders. Genomic selection (GS) provides a revolutionary tool for the genetic improvement of farm animals. At the same time, it can increase inbreeding and lead to a more rapid depletion of genetic variability of the selected traits in future generations. Optimum-contribution selection (OCS) is an approach that maximizes genetic gain while constraining inbreeding within a targeted range. In the present study, 515 Ningxiang pigs were genotyped with the Illumina Porcine SNP60 array or the GeneSeek Genomic Profiler Porcine 50K array. The Ningxiang pigs were found to be highly inbred at the genomic level. Average locus-wise inbreeding coefficients were 0.41 and 0.37 for the two SNP arrays, whereas genomic inbreeding coefficients based on runs of homozygosity were 0.24 and 0.25, respectively. Simulated phenotypic data were used to assess the utility of genomic OCS (GOCS) in comparison with GS without inbreeding control. GOCS was conducted under two scenarios, selecting sires only (GOCS_S) or selecting both sires and dams (GOCS_SD), with kinships constrained among the selected parents. The genetic gain for average daily body weight gain (ADG) per generation was between 18.99 and 20.55 g with GOCS_S and between 23.20 and 28.92 g with GOCS_SD, and it varied from 25.38 to 48.38 g under GS without inbreeding control.
While the rate of genetic gain per generation obtained with GS was substantially larger than that obtained with the two genomic OCS scenarios in the early generations of selection, the difference in the genetic gain of ADG between GS and GOCS shrank quickly in later generations. By generation ten, the difference in the realized rates of genetic gain between GS and GOCS_SD had vanished, with GOCS_SD even achieving a slightly higher genetic gain, owing to the rapid loss of genetic variance and the fixation of causative genes under GS. The rate of inbreeding was mostly kept below 5% per generation with genomic OCS, whereas it increased to between 10.5% and 15.3% per generation with GS. Therefore, genomic OCS appears to be a sustainable strategy for the genetic improvement of local breeds such as Ningxiang pigs, bearing in mind that a variety of GOCS methods exist and the optimal forms remain to be explored further.
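The ROH-based inbreeding coefficients and the per-generation rate of inbreeding reported above can be illustrated with a small sketch. It assumes ROH segments have already been called (ROH detection, locus-wise coefficients and the OCS optimization are not shown) and uses common textbook definitions: F_ROH as total ROH length divided by the genome length considered, and ΔF = (F_t - F_{t-1}) / (1 - F_{t-1}). The abstract does not state which formulas the authors used, and the function names are hypothetical.

```python
from typing import List, Tuple


def f_roh(roh_segments: List[Tuple[str, int, int]], genome_length_bp: int) -> float:
    """Genomic inbreeding F_ROH = total length in ROH / genome length considered.

    roh_segments: (chromosome, start_bp, end_bp) tuples; overlapping segments on
    the same chromosome are merged so no stretch is counted twice.
    """
    total = 0
    current = None  # [chromosome, start, end] of the segment being merged
    for chrom, start, end in sorted(roh_segments):
        if current is None or chrom != current[0] or start > current[2]:
            if current is not None:
                total += current[2] - current[1]
            current = [chrom, start, end]
        else:
            current[2] = max(current[2], end)
    if current is not None:
        total += current[2] - current[1]
    return total / genome_length_bp


def rate_of_inbreeding(f_current: float, f_previous: float) -> float:
    """Per-generation rate of inbreeding: (F_t - F_{t-1}) / (1 - F_{t-1})."""
    return (f_current - f_previous) / (1.0 - f_previous)
```

Expressing ΔF relative to the remaining non-inbred fraction makes rates comparable across generations, which is how figures such as "below 5% per generation" are usually read.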


Subject(s)
Inbreeding , Selection, Genetic , Swine/genetics , Animals , Female , Genomics , Homozygote , Male , Phenotype
16.
Sci Rep ; 10(1): 7751, 2020 05 08.
Article in English | MEDLINE | ID: mdl-32385377

ABSTRACT

Mastitis is one of the most prevalent and costly diseases in dairy cattle. It results in changes in milk composition and quality, which are indicators of udder inflammation in the absence of clinical signs. We applied structural equation model GWAS (SEM-GWAS) to explore interrelated dependency relationships among phenotypes related to udder health, including milk yield (MY), somatic cell score (SCS), lactose (%, LACT), pH and non-casein N (NCN, % of total milk N), in a cohort of 1,158 Brown Swiss cows. The phenotypic network inferred via the Hill-Climbing algorithm was used to estimate SEM parameters. Integrating multi-trait model GWAS and SEM-GWAS identified six significant SNPs for SCS and quantified the contribution of MY and LACT, acting as mediator traits, to total SNP effects. Functional analyses revealed that overrepresented pathways were often shared among traits and were consistent with biological knowledge (e.g., membrane transport activity for pH and MY, or Wnt signaling for SCS and NCN). In summary, SEM-GWAS offered new insights into the relationships among udder health phenotypes and the path of SNP effects, providing useful information for genetic improvement and management strategies in dairy cattle.
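The split of a SNP's total effect into a direct part and a part mediated by upstream traits (such as MY and LACT above) follows the usual path-analysis rule for recursive structural equation models. A minimal sketch of that rule; the network structure and every coefficient in the usage lines are hypothetical, not estimates from the study.

```python
from typing import Dict, List, Tuple


def total_snp_effects(
    trait_order: List[str],
    direct: Dict[str, float],
    paths: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Total effect of one SNP on each trait in a recursive (acyclic) SEM.

    trait_order: traits listed so that every parent precedes its children.
    direct: direct SNP effect on each trait (missing traits default to 0).
    paths: structural coefficient keyed by (child, parent), i.e. parent -> child.
    total[child] = direct[child] + sum over parents of coef * total[parent].
    """
    total: Dict[str, float] = {}
    for trait in trait_order:
        effect = direct.get(trait, 0.0)
        for (child, parent), coef in paths.items():
            if child == trait:
                if parent not in total:
                    raise ValueError(f"{parent} must precede {child} in trait_order")
                effect += coef * total[parent]
        total[trait] = effect
    return total


if __name__ == "__main__":
    # Hypothetical network MY -> LACT -> SCS and MY -> SCS, with made-up coefficients
    effects = total_snp_effects(
        ["MY", "LACT", "SCS"],
        {"MY": 0.2, "SCS": 0.1},
        {("LACT", "MY"): 0.4, ("SCS", "MY"): -0.5, ("SCS", "LACT"): 0.3},
    )
    indirect_scs = effects["SCS"] - 0.1  # part of the SCS effect mediated by MY and LACT
    print(effects, indirect_scs)
```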


Subject(s)
Health , Mammary Glands, Animal/metabolism , Models, Genetic , Animals , Cattle , Female , Hydrogen-Ion Concentration , Lactose/metabolism , Milk/metabolism , Polymorphism, Single Nucleotide
17.
ACS Nano ; 14(7): 8383-8391, 2020 Jul 28.
Article in English | MEDLINE | ID: mdl-32348120

ABSTRACT

Advances in three-dimensional nanofabrication techniques have enabled the development of lightweight solids, such as hollow nanolattices, with record values of specific stiffness and strength, albeit at low production throughput. At the length scales of the structural elements of these solids, which are often tens of nanometers or smaller, the forces required for elastic deformation can be comparable to adhesive forces, raising the possibility of tailoring bulk mechanical properties through the relative balance of these forces. Herein, we study this interplay via the mechanics of ultralight ceramic-coated carbon nanotube (CNT) structures. We show that ceramic-CNT foams surpass other architected nanomaterials in density-normalized strength and that, when the structures are designed to minimize internal adhesive interactions between CNTs, more than 97% of the strain after compression beyond densification is recovered. Via experiments and modeling, we study the dependence of recovery and dissipation on coating thickness, demonstrate that internal adhesive contacts impede recovery, and identify design guidelines for ultralight materials to achieve maximum recovery. The combination of high recovery and dissipation in ceramic-CNT foams may be useful in structural damping and shock absorption, and the general principles could be broadly applied to both architected and stochastic nanofoams.

18.
Heredity (Edinb) ; 124(5): 658-674, 2020 05.
Article in English | MEDLINE | ID: mdl-32127659

ABSTRACT

This study evaluated the use of multiomics data for classifying rheumatoid arthritis (RA). Three approaches were compared in terms of prediction accuracy: (1) whole-genome prediction (WGP) using SNP marker information only, (2) whole-methylome prediction (WMP) using methylation profiles only, and (3) whole-genome/methylome prediction (WGMP) combining both omics layers. The numbers of SNPs and methylation sites varied in each scenario, with 1, 10, or 50% of them preselected using one of four approaches: randomly, evenly spaced, lowest p value (genome-wide or epigenome-wide association study), and estimated effect size from a Bayesian ridge regression (BRR) model. To remove the effects of high pairwise linkage disequilibrium (LD), SNPs were also preselected with an LD-pruning method. Five Bayesian regression models were studied for classification: BRR, Bayes-A, Bayes-B, Bayes-C, and the Bayesian LASSO. Adjusting methylation profiles for cellular heterogeneity within whole blood samples had a detrimental effect on the classification ability of the models. Overall, WGMP with the Bayes-B model performed best. In particular, selecting SNPs by LD pruning, including 1% of the methylation sites selected by BRR in the model, and fitting the most significant SNP as a fixed effect was the best method for predicting disease risk, with a classification accuracy of 0.975. Our results show that multiomics data can be used to effectively predict the risk of RA and to identify cases at early stages, so that disease progression can be prevented or altered through appropriate interventions.
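The LD-pruning step mentioned above can be sketched as a simplified greedy filter, similar in spirit to window-based pruning in common genotype tools but not a reproduction of any specific tool or of the authors' settings. The window size (counted in SNPs) and the r² threshold are hypothetical parameters, and SNPs are assumed to be supplied in map order as genotype columns coded 0/1/2.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_prune(snps: List[List[int]], window: int, r2_max: float) -> List[int]:
    """Greedy LD pruning over SNP columns given in map order.

    A SNP is kept only if its r^2 with every already-kept SNP at most `window`
    positions upstream is <= r2_max. Returns the indices of kept SNPs.
    """
    kept: List[int] = []
    for j, genotype in enumerate(snps):
        in_ld = False
        for k in reversed(kept):  # nearest kept SNPs first
            if j - k > window:
                break  # all remaining kept SNPs are even further upstream
            if r_squared(genotype, snps[k]) > r2_max:
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return kept
```

Because r² ignores the sign of the correlation, a SNP in perfect negative LD with a kept SNP is pruned just like an identical one.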


Subject(s)
Arthritis, Rheumatoid , DNA Methylation , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Arthritis, Rheumatoid/genetics , Bayes Theorem , Humans
19.
Genet Sel Evol ; 52(1): 12, 2020 Feb 24.
Article in English | MEDLINE | ID: mdl-32093611

ABSTRACT

BACKGROUND: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement about machine learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets. METHODS: The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records, genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as a template, assuming a heritability of 0.30 with either additive or non-additive gene effects and two different numbers of quantitative trait nucleotides (100 and 1000). RESULTS: In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same trend was observed when using the mean squared error of prediction. The simulation indicated that when gene action was purely additive, parametric methods outperformed the other methods. When gene action was a combination of additive, dominance and two-locus epistatic effects, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size.
In fact, with a large dataset including 80k individuals, the predictive performance of deep learning methods was similar or slightly better than that of parametric methods for traits with non-additive gene action. CONCLUSIONS: For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable.
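The simulation design above fixes heritability at 0.30. A minimal sketch of how phenotypes with a target heritability can be generated from genetic values by scaling the residual variance, using h² = Vg / (Vg + Ve). It covers only the purely additive case, draws hypothetical allele frequencies and effect sizes instead of using real genotypes as a template, and is not the authors' simulation code; dominance and epistatic terms would be added to the genetic values before the residual is drawn.

```python
import math
import random
import statistics
from typing import List


def residual_variance(genetic_variance: float, h2: float) -> float:
    """Residual variance that gives heritability h2 = Vg / (Vg + Ve)."""
    if not 0.0 < h2 < 1.0:
        raise ValueError("h2 must be strictly between 0 and 1")
    return genetic_variance * (1.0 - h2) / h2


def simulate_phenotypes(genetic_values: List[float], h2: float, seed: int = 0) -> List[float]:
    """Add Gaussian noise to genetic values so the expected heritability is h2."""
    rng = random.Random(seed)
    sd_e = math.sqrt(residual_variance(statistics.pvariance(genetic_values), h2))
    return [g + rng.gauss(0.0, sd_e) for g in genetic_values]


def additive_genetic_values(n_ind: int, n_qtn: int, seed: int = 0) -> List[float]:
    """Purely additive genetic values from random QTN genotypes (0/1/2) and N(0, 1) effects."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.95) for _ in range(n_qtn)]  # hypothetical allele frequencies
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_qtn)]     # hypothetical QTN effects
    values = []
    for _ in range(n_ind):
        # Each genotype is the sum of two Bernoulli(p) allele draws
        genotype = [(rng.random() < p) + (rng.random() < p) for p in freqs]
        values.append(sum(x * a for x, a in zip(genotype, effects)))
    return values


if __name__ == "__main__":
    g = additive_genetic_values(5000, 100, seed=1)
    y = simulate_phenotypes(g, 0.30, seed=2)
    print("realized h2:", statistics.pvariance(g) / statistics.pvariance(y))
```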


Subject(s)
Cattle/genetics , Deep Learning , Genomics , Animals , Bayes Theorem , Genotype , Models, Genetic , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable
20.
Front Genet ; 11: 567818, 2020.
Article in English | MEDLINE | ID: mdl-33391339

ABSTRACT

This research assessed the ability of a Support Vector Machine (SVM) regression model to predict pig crossbred (CB) performance from various sources of phenotypic and genotypic information, with the aim of improving crossbreeding performance at reduced genotyping cost. Data consisted of average daily gain (ADG) and residual feed intake (RFI) records and genotypes of 5,708 purebred (PB) boars and 5,007 CB pigs. Prediction models were fitted using individual PB genotypes and phenotypes (trn.1); genotypes of PB sires and the average of CB records per PB sire (trn.2); and individual CB genotypes and phenotypes (trn.3). The average of CB offspring records was the trait to be predicted from the PB sire's genotype using cross-validation. Single nucleotide polymorphisms (SNPs) were ranked by their Spearman rank correlation with the trait. Subsets with an increasing number (from 50 to 2,000) of the most informative SNPs were used as predictor variables in SVM. Prediction performance was measured as the median Spearman correlation (SC; interquartile range in brackets) between observed and predicted phenotypes in the testing set. The best predictive performances were obtained when sire phenotypic information was included in trn.1 (0.22 [0.03] for RFI with SVM and 250 SNPs, and 0.12 [0.05] for ADG with SVM and 500-1,000 SNPs) or when trn.3 was used (0.29 [0.16] with genomic best linear unbiased prediction (GBLUP) for RFI, and 0.15 [0.09] for ADG with just 50 SNPs). Animals from the last two generations were assigned to the testing set and the remaining animals to the training set. The PB individual's own phenotype and genotype improved the prediction of CB offspring performance of young animals for ADG but not for RFI. The highest SC was 0.34 [0.21] and 0.36 [0.22] for RFI and ADG, respectively, with SVM and 50 SNPs. Using CB data for training led to an SC of 0.34 [0.19] with GBLUP and 0.28 [0.18] with SVM and 250 SNPs for RFI, and 0.34 [0.15] with SVM and 500 SNPs for ADG.
Results suggest that PB candidates could be evaluated for CB performance with SVM and low-density SNP chip panels after collecting their own RFI or ADG performances or even earlier, after being genotyped using a reference population of CB animals.
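Results in this abstract are reported as the median Spearman correlation across testing replicates, with the interquartile range in brackets. A small sketch of that summary using Python's standard `statistics` module; it assumes quartiles are computed with the module's default "exclusive" method, and other software may interpolate quartiles differently, giving slightly different IQRs. Function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def median_iqr(values: Sequence[float]) -> Tuple[float, float]:
    """Median and interquartile range (Q3 - Q1) of replicate correlations."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return statistics.median(values), q3 - q1


def format_median_iqr(values: Sequence[float]) -> str:
    """Report as 'median [IQR]', the style used in the abstract."""
    med, iqr = median_iqr(values)
    return f"{med:.2f} [{iqr:.2f}]"


if __name__ == "__main__":
    # Hypothetical Spearman correlations from repeated training/testing splits
    replicate_sc = [0.31, 0.36, 0.29, 0.40, 0.34, 0.27, 0.38]
    print(format_median_iqr(replicate_sc))
```

The median and IQR are less sensitive than the mean and standard deviation to a few unusually good or bad partitions, which suits small numbers of cross-validation replicates.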
