1.
Front Genet ; 15: 1380643, 2024.
Article in English | MEDLINE | ID: mdl-38894723

ABSTRACT

Background: To address the limitations of commonly used cross-validation methods, the linear regression method (LR) was proposed to estimate population accuracy of predictions based on the implicit assumption that the fitted model is correct. This method also provides two statistics to determine the adequacy of the fitted model. The validity and behavior of the LR method have been provided and studied for linear predictions but not for non-linear predictions. The objectives of this study were to 1) provide a mathematical proof for the validity of the LR method when predictions are based on conditional means, regardless of whether the predictions are linear or non-linear, 2) investigate the ability of the LR method to detect whether the fitted model is adequate or inadequate, and 3) provide guidelines on how to appropriately partition the data into training and validation such that the LR method can identify an inadequate model. Results: We present a mathematical proof for the validity of the LR method to estimate population accuracy and to determine whether the fitted model is adequate or inadequate when the predictor is the conditional mean, which may be a non-linear function of the phenotype. Using three partitioning scenarios of simulated data, we show that one of the LR statistics can detect an inadequate model only when the data are partitioned such that the values of relevant predictor variables differ between the training and validation sets. In contrast, we observed that the other LR statistic was able to detect an inadequate model for all three scenarios. Conclusion: The LR method has been proposed to address some limitations of the traditional approach of cross-validation in genetic evaluation. In this paper, we showed that the LR method is valid when the model is adequate and the conditional mean is the predictor, even when it is a non-linear function of the phenotype.
We found one of the two LR statistics is superior because it was able to detect an inadequate model for all three partitioning scenarios (i.e., between animals, by age within animals, and between animals and by age) that were studied.

2.
Genet Sel Evol ; 54(1): 78, 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36460973

ABSTRACT

BACKGROUND: Selection schemes distort inference when estimating differences between treatments or genetic associations between traits, and may degrade prediction of outcomes, e.g., the expected performance of the progeny of an individual with a certain genotype. If input and output measurements are not collected on random samples, inferences and predictions must be biased to some degree. Our paper revisits inference in quantitative genetics when using samples stemming from some selection process. The approach used integrates the classical notion of fitness with that of missing data. Treatment is fully Bayesian, with inference and prediction dealt with in a unified manner. While focus is on animal and plant breeding, concepts apply to natural selection as well. Examples based on real data and stylized models illustrate how selection can be accounted for in four different situations, and sometimes without success. RESULTS: Our flexible "soft selection" setting helps to diagnose the extent to which selection can be ignored. The clear connection between probability of missingness and the concept of fitness in stylized selection scenarios is highlighted. It is not realistic to assume that a fixed selection threshold t holds in conceptual replication, as the chance of selection depends on observed and unobserved data, and on unequal amounts of information over individuals, aspects that a "soft" selection representation addresses explicitly. There does not seem to be a general prescription to accommodate potential distortions due to selection. In structures that combine cross-sectional, longitudinal and multi-trait data such as in animal breeding, balance is the exception rather than the rule. The Bayesian approach provides an integrated answer to inference, prediction and model choice under selection that goes beyond the likelihood-based approach, where breeding values are inferred indirectly.
CONCLUSIONS: The approach used here for inference and prediction under selection may or may not yield the best possible answers. One may believe that selection has been accounted for diligently, but the central problem of whether statistical inferences are good or bad does not have an unambiguous solution. On the other hand, the quality of predictions can be gauged empirically via appropriate training-testing of competing methods.


Subjects
Genomics , Animals , Bayes Theorem , Cross-Sectional Studies , Likelihood Functions , Phenotype
3.
Genet Sel Evol ; 54(1): 72, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36316629

ABSTRACT

BACKGROUND: Single-step genomic best linear unbiased prediction (GBLUP) involves a joint analysis of individuals with genotype information, and their ancestors, descendants, or contemporaries, without recorded genotypes. Livestock applications typically represent populations with fewer individuals with genotypes relative to the number not genotyped. Most breeding programmes are structured, consisting of a nucleus tier in which selection drives genetic gains that are propagated through descendants that represent parents in multiplier and commercial tiers. In some cases, the genotypes in the nucleus tier are proprietary to a breeding company, and not publicly available for a whole industry analysis. Bayesian inference involves combining a defined description of prior information with new information to generate a posterior distribution that contains all available information on parameters of interest. A natural extension of Bayesian analysis would be to use information from the posterior distribution to define the prior distribution in a subsequent analysis. METHODS: We derive the mixed model equations for inference on breeding values for non-genotyped individuals in that subset of the population that is of current interest, using only data on the performance of current individuals and their immediate pedigree, along with prior information defined from the posterior distribution of an external BLUP or single-step GBLUP analysis of the ancestors of the current population. DISCUSSION: Identical estimates of breeding values and their prediction error covariances for current animals of interest in the multiplier or commercial tier can be obtained without requiring either the genomic relationship matrix or genotypes of any of their ancestors in the nucleus tier, as can be obtained from a single analysis using pedigree, performance, and genomic information from all tiers.
The Bayesian analysis of the current population does not require explicit information on unselected genotyped animals in the external population.


Subjects
Genome , Genomics , Animals , Bayes Theorem , Genotype , Genomics/methods , Pedigree , Genetic Models , Phenotype
4.
Genet Sel Evol ; 54(1): 12, 2022 Feb 08.
Article in English | MEDLINE | ID: mdl-35135468

ABSTRACT

BACKGROUND: Linkage disequilibrium (LD) is commonly measured based on the squared coefficient of correlation [Formula: see text] between the alleles at two loci that are carried by haplotypes. LD can also be estimated as the [Formula: see text] between unphased genotype dosage at two loci when the allele frequencies and inbreeding coefficients at both loci are identical for the parental lines. Here, we investigated whether [Formula: see text] for a crossbred population (F1) can be estimated using genotype data. The parental lines of the crossbred (F1) can be purebred or crossbred. METHODS: We approached this by first showing that inbreeding coefficients for an F1 crossbred population are negative, and typically differ in size between loci. Then, we proved that the [Formula: see text] computed from unphased genotype data is expected to be identical to the [Formula: see text] computed from haplotype data for an F1 crossbred population, regardless of the inbreeding coefficients at the two loci. Finally, we investigated the bias and precision of the [Formula: see text] estimated using unphased genotype versus haplotype data in stochastic simulation. RESULTS: Our findings show that estimates of [Formula: see text] based on haplotype and unphased genotype data are both unbiased for different combinations of allele frequencies, sample sizes (900, 1800, and 2700), and levels of LD. In general, for any allele frequency combination and [Formula: see text] value scenarios considered, and for both methods to estimate [Formula: see text], the precision of the estimates increased, and the bias of the estimates decreased as sample size increased, indicating that both estimators are consistent. For a given scenario, the [Formula: see text] estimates were more precise and less biased using haplotype data than using unphased genotype data.
As sample size increased, the difference in precision and bias between the [Formula: see text] estimates using haplotype data and unphased genotype data decreased. CONCLUSIONS: Our theoretical derivations showed that estimates of LD between loci based on unphased genotypes and haplotypes in F1 crossbreds have identical expectations. Based on our simulation results, we conclude that the LD for an F1 crossbred population can be accurately estimated from unphased genotype data. The results also apply for other crosses (F2, F3, Fn, BC1, BC2, and BCn), as long as (selected) individuals from the two parental lines mate randomly.


Subjects
Genetic Models , Single Nucleotide Polymorphism , Gene Frequency , Genotype , Haplotypes , Humans , Linkage Disequilibrium
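
Illustrative note on entry 4 (not taken from the paper): the two estimators compared in the abstract can be sketched as squared Pearson correlations, one computed over the alleles carried by haplotypes and one over unphased genotype dosages. The function names, the toy parental haplotype frequencies, and the simulation design below are assumptions chosen only for illustration; the sample size of 900 echoes the smallest one mentioned in the abstract.

```python
import random

def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a locus is monomorphic in this sample")
    return sxy * sxy / (sxx * syy)

def r2_haplotype(individuals):
    """r^2 from phased data: every individual contributes two haplotypes."""
    haps = [hap for ind in individuals for hap in ind]
    return squared_correlation([h[0] for h in haps], [h[1] for h in haps])

def r2_genotype(individuals):
    """r^2 from unphased data: only allele dosages (0, 1, 2) per locus are used."""
    dose1 = [h1[0] + h2[0] for h1, h2 in individuals]
    dose2 = [h1[1] + h2[1] for h1, h2 in individuals]
    return squared_correlation(dose1, dose2)

def simulate_f1(n, freq_a, freq_b, seed=1):
    """Hypothetical F1: one haplotype drawn from line A, one from line B.

    freq_a and freq_b are assumed frequencies of haplotypes 00, 01, 10, 11.
    """
    rng = random.Random(seed)
    haplotypes = [(0, 0), (0, 1), (1, 0), (1, 1)]
    from_a = rng.choices(haplotypes, weights=freq_a, k=n)
    from_b = rng.choices(haplotypes, weights=freq_b, k=n)
    return list(zip(from_a, from_b))

f1 = simulate_f1(900, [0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.3, 0.2])
print("haplotype-based r2:", r2_haplotype(f1))
print("genotype-based r2: ", r2_genotype(f1))
```

In a single simulated sample the two values generally differ; the abstract's claim concerns their expectations, so a comparison would need averaging over many replicates (different seeds).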
5.
Genet Sel Evol ; 53(1): 91, 2021 Dec 07.
Article in English | MEDLINE | ID: mdl-34875996

ABSTRACT

BACKGROUND: The possibility of using antibody response (S/P ratio) to PRRSV vaccination measured in crossbred commercial gilts as a genetic indicator for reproductive performance in vaccinated crossbred sows has motivated further studies of the genomic basis of this trait. In this study, we investigated the association of haplotypes and runs of homozygosity (ROH) and heterozygosity (ROHet) with S/P ratio and their impact on reproductive performance. RESULTS: There was no association (P-value ≥ 0.18) of S/P ratio with the percentage of ROH or ROHet, or with the percentage of heterozygosity across the whole genome or in the major histocompatibility complex (MHC) region. However, specific ROH and ROHet regions were significantly associated (P-value ≤ 0.01) with S/P ratio on chromosomes 1, 4, 5, 7, 10, 11, 13, and 17 but not (P-value ≥ 0.10) with reproductive performance. With the haplotype-based genome-wide association study (GWAS), additional genomic regions associated with S/P ratio were identified on chromosomes 4, 7, and 9. These regions harbor immune-related genes, such as SLA-DOB, TAP2, TAPBP, TMIGD3, and ADORA. Four haplotypes at the identified region on chromosome 7 were also associated with multiple reproductive traits. A haplotype significantly associated with S/P ratio that is located in the MHC region may be in stronger linkage disequilibrium (LD) with the quantitative trait loci (QTL) than the previously identified single nucleotide polymorphism (SNP) (H3GA0020505) given the larger estimate of genetic variance explained by the haplotype than by the SNP. CONCLUSIONS: Specific ROH and ROHet regions were significantly associated with S/P ratio. The haplotype-based GWAS identified novel QTL for S/P ratio on chromosomes 4, 7, and 9 and confirmed the presence of at least one QTL in the MHC region. The chromosome 7 region was also associated with reproductive performance. 
These results narrow the search for causal genes in this region and suggest SLA-DOB and TAP2 as potential candidate genes associated with S/P ratio on chromosome 7. These results provide additional opportunities for marker-assisted selection and genomic selection for S/P ratio as a genetic indicator for litter size in commercial pig populations.


Subjects
Porcine Respiratory and Reproductive Syndrome Virus , Animals , Antibody Formation , Female , Genome-Wide Association Study , Genomics , Haplotypes , Quantitative Trait Loci , Sus scrofa/genetics , Swine/genetics , Vaccination
6.
J Anim Sci ; 99(5)2021 May 01.
Article in English | MEDLINE | ID: mdl-33782709

ABSTRACT

Antibody response, measured as sample-to-positive (S/P) ratio, to porcine reproductive and respiratory syndrome virus (PRRSV) following a PRRSV-outbreak (S/POutbreak) in a purebred nucleus and following a PRRSV-vaccination (S/PVx) in commercial crossbred herds have been proposed as genetic indicator traits for improved reproductive performance in PRRSV-infected purebred and PRRSV-vaccinated crossbred sows, respectively. In this study, we investigated the genetic relationships of S/POutbreak and S/PVx with performance at the commercial (vaccinated crossbred sows) and nucleus level (non-infected and PRRSV-infected purebred sows), respectively, and tested the effect of previously identified SNP for these indicator traits. Antibody response was measured on 541 Landrace sows ~54 d after the start of a PRRSV outbreak, and on 906 F1 (Landrace × Large White) gilts ~50 d after vaccination with a commercial PRRSV vaccine. Reproductive performance was recorded for 711 and 428 Landrace sows before and during the PRRSV outbreak, respectively, and for 811 vaccinated F1 animals. The estimate of the genetic correlation (rg) of S/POutbreak with S/PVx was 0.72 ± 0.18. The estimates of rg of S/POutbreak with reproductive performance in vaccinated crossbred sows were low to moderate, ranging from 0.05 ± 0.23 to 0.30 ± 0.20. The estimate of rg of S/PVx with reproductive performance in non-infected purebred sows was moderate and favorable with number born alive (0.50 ± 0.23) but low (0 ± 0.23 to -0.11 ± 0.23) with piglet mortality traits. The estimates of rg of S/PVx were moderate and negative (-0.38 ± 0.21) with number of mummies in PRRSV-infected purebred sows and low with other traits (-0.30 ± 0.18 to 0.05 ± 0.18). Several significant associations (P0 > 0.90) of previously reported SNP for S/P ratio (ASGA0032063 and H3GA0020505) were identified for S/P ratio and performance in non-infected purebred and PRRSV-exposed purebred and crossbred sows. 
Genomic regions harboring the major histocompatibility complex class II region significantly contributed to the genetic correlation of antibody response to PRRSV with most of the traits analyzed. These results indicate that selection for antibody response in purebred sows following a PRRSV outbreak in the nucleus and for antibody response to PRRSV vaccination measured in commercial crossbred sows are expected to increase litter size in purebred and commercial sows.


Subjects
Porcine Reproductive and Respiratory Syndrome , Porcine Respiratory and Reproductive Syndrome Virus , Swine Diseases , Viral Vaccines , Animals , Antibody Formation , Female , Genomics , Porcine Reproductive and Respiratory Syndrome/genetics , Porcine Reproductive and Respiratory Syndrome/prevention & control , Pregnancy , Swine , Vaccination/veterinary
7.
J Anim Breed Genet ; 138(5): 519-527, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33729622

ABSTRACT

Empirical estimates of the accuracy of estimates of breeding values (EBV) can be obtained by cross-validation. Leave-one-out cross-validation (LOOCV) is an extreme case of k-fold cross-validation. Efficient strategies for LOOCV of predictions of phenotypes have been developed for a simple model with an overall mean and random marker or animal genetic effects. The objective here was to develop and evaluate an efficient LOOCV method for prediction of breeding values and other random effects under a general mixed linear model with multiple random effects. Conventional LOOCV of EBV requires inverting an (n-1)×(n-1) covariance matrix for each of n (= number of observations) data sets. Our efficient LOOCV obtains the required inverses from the inverse of the covariance matrix for all n observations. The efficient method can be applied to complex models with multiple fixed and random effects, but requires fixed effects to be treated as random, with large variances. An alternative is to precorrect observations using estimates of fixed effects obtained from the complete data, but this can lead to biases. The efficient LOOCV method was compared to conventional LOOCV of predictions of breeding values in terms of computational demands and accuracy. For a data set with 3,205 observations and a model with multiple random and fixed effects, the efficient LOOCV method was 962 times faster than the conventional LOOCV with precorrection for fixed effects based on each training data set but resulted in identical EBV. A computationally efficient LOOCV for prediction of breeding values for single- and multiple-trait mixed models with multiple fixed and random effects was successfully developed. The method enables cross-validation of predictions of breeding values and of any linear combination of random and/or fixed effects, along with leave-one-out precorrection of validation phenotypes.


Subjects
Breeding , Genetic Models , Animals , Genotype , Linear Models , Phenotype
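
Illustrative note on entry 7 (a simplified sketch, not the paper's algorithm): the idea of obtaining every leave-one-out result from a single inverse of the full covariance matrix can be shown with a standard identity for a zero-mean Gaussian model, where the leave-one-out prediction of observation i equals y_i − (V⁻¹y)_i / (V⁻¹)_ii. The sketch assumes no fixed effects and predicts phenotypes rather than breeding values; the matrix `V`, the data `y`, and all function names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (pure Python, small matrices)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [vr - factor * vc for vr, vc in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]

def loo_efficient(V, y):
    """All n leave-one-out predictions from one inverse of the full n x n matrix."""
    n = len(y)
    V_inv = invert(V)
    V_inv_y = [sum(V_inv[i][j] * y[j] for j in range(n)) for i in range(n)]
    return [y[i] - V_inv_y[i] / V_inv[i][i] for i in range(n)]

def loo_conventional(V, y):
    """Reference version: invert an (n-1) x (n-1) matrix for each left-out record."""
    n = len(y)
    predictions = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        sub_inv = invert([[V[r][c] for c in keep] for r in keep])
        y_sub = [y[j] for j in keep]
        weights = [sum(sub_inv[r][c] * y_sub[c] for c in range(n - 1))
                   for r in range(n - 1)]
        predictions.append(sum(V[i][keep[r]] * weights[r] for r in range(n - 1)))
    return predictions

# Hypothetical covariance: relationship-like matrix plus residual variance on the diagonal.
V = [[2.0, 0.5, 0.25],
     [0.5, 2.0, 0.5],
     [0.25, 0.5, 2.0]]
y = [1.0, -0.5, 2.0]
print(loo_efficient(V, y))
print(loo_conventional(V, y))
```

The two functions should agree up to rounding error; the efficient version replaces n inversions of order n−1 with a single inversion of order n, which is the kind of saving the abstract reports.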
8.
Front Genet ; 11: 1011, 2020.
Article in English | MEDLINE | ID: mdl-33024439

ABSTRACT

We proposed to investigate the genomic basis of antibody response to porcine reproductive and respiratory syndrome (PRRS) virus (PRRSV) vaccination and its relationship to reproductive performance in non-PRRSV-infected commercial sows. Nine hundred and six F1 replacement gilts (139 ± 17 days old) from two commercial farms were vaccinated with a commercial modified live PRRSV vaccine. Blood samples were collected about 52 days after vaccination to measure antibody response to PRRSV as sample-to-positive (S/P) ratio and for single-nucleotide polymorphism (SNP) genotyping. Reproductive performance was recorded for up to 807 sows for number born alive (NBA), number of piglets weaned, number born mummified (MUM), number of stillborn (NSB), and number of pre-weaning mortality (PWM) at parities (P) 1-3 and per sow per year (PSY). Fertility traits such as farrowing rate and age at first service were also analyzed. BayesC0 was used to estimate heritability and genetic correlations of S/P ratio with reproductive performance. Genome-wide association study (GWAS) and genomic prediction were performed using BayesB. The heritability estimate of S/P ratio was 0.34 ± 0.05. High genetic correlations (rg) of S/P ratio with farrowing performance were identified for NBA P1 (0.61), PWM P2 (-0.70), NSB P3 (-0.83), MUM P3 (-0.84), and NSB PSY (-0.90), indicating that genetic selection for increased S/P ratio would result in improved performance of these traits. A quantitative trait locus was identified on chromosome 7 (∼25 Mb), at the major histocompatibility complex (MHC) region, explaining ∼30% of the genetic variance for S/P ratio, mainly by SNPs ASGA0032113, H3GA0020505, and M1GA0009777. This same region was identified in the bivariate GWAS of S/P ratio and reproductive traits, with SNP H3GA0020505 explaining up to 10% (for NBA P1) of the genetic variance of reproductive performance.
The heterozygote genotype at H3GA0020505 was associated with greater S/P ratio and NBA P1 (P = 0.06), and lower MUM P3 and NSB P3 (P = 0.07). Genomic prediction accuracy for S/P ratio was high when using all SNPs (0.67) and when using only those in the MHC region (0.59) and moderate to low when using all SNPs excluding those in the MHC region (0.39). These results suggest that there is great potential to use antibody response to PRRSV vaccination as an indicator trait to improve reproductive performance in commercial pigs.

10.
Front Genet ; 11: 362, 2020.
Article in English | MEDLINE | ID: mdl-32425975

ABSTRACT

Linkage disequilibrium (LD), often expressed in terms of the squared correlation (r²) between allelic values at two loci, is an important concept in many branches of genetics and genomics. Genetic drift and recombination have opposite effects on LD, and thus r² will keep changing until the effects of these two forces are counterbalanced. Several approximations have been used to determine the expected value of r² at equilibrium in the presence or absence of mutation. In this paper, we propose a probability-based approach to compute the exact distribution of allele frequencies at two loci in a finite population at any generation t conditional on the distribution at generation t - 1. As r² is a function of this distribution of allele frequencies, this approach can be used to examine the distribution of r² over generations as it approaches equilibrium. The exact distribution of LD from our method is used to describe, quantify, and compare LD at different equilibria, including equilibrium in the absence or presence of mutation, selection, and filtering by minor allele frequency. We also propose a deterministic formula for expected LD in the presence of mutation at equilibrium based on the exact distribution of LD.

11.
Genetics ; 214(2): 305-331, 2020 02.
Article in English | MEDLINE | ID: mdl-31879318

ABSTRACT

A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on possibly a huge number of molecular markers, and with a Gaussian residual distribution posed. Each (one per marker) of the [Formula: see text] vectors of regression coefficients (T: number of traits) is assigned the same T-variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when [Formula: see text]. The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior, and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed using a scale-mixture of normal distributions representation. MBL is demonstrated in a bivariate context employing two publicly available data sets using a bivariate genomic best linear unbiased prediction model (GBLUP) for benchmarking results. The first data set is one where wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits: rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially, i.e., "short" vectors are more strongly shrunk toward the origin than in GBLUP; conversely, "long" vectors are shrunk less. A predictive comparison was carried out as well in wheat, where the comparators of MBL were bivariate GBLUP and bivariate Bayes Cπ, a variable selection procedure. A training-testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions. In Pinus, MBL gave better predictions than either a Bayesian bivariate GBLUP or the single trait Bayesian LASSO.
MBL has been implemented in the Julia language package JWAS, and is now available for the scientific community to explore with different traits, species, and environments. It is well known that there is no universally best prediction machine, and MBL represents a new resource in the armamentarium for genome-enabled analysis and prediction of complex traits.


Subjects
Genome-Wide Association Study/methods , Genomics/methods , Multifactorial Inheritance/genetics , Bayes Theorem , Genotype , Genetic Models , Statistical Models , Phenotype , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci/genetics , Heritable Quantitative Trait , Genetic Selection/genetics , Triticum/genetics
12.
Theor Popul Biol ; 132: 47-59, 2020 04.
Article in English | MEDLINE | ID: mdl-31830483

ABSTRACT

Modeling covariance structure based on genetic similarity between pairs of relatives plays an important role in evolutionary, quantitative and statistical genetics. Historically, genetic similarity between individuals has been quantified from pedigrees via the probability that randomly chosen homologous alleles between individuals are identical by descent (IBD). At present, however, many genetic analyses rely on molecular markers, with realized measures of genomic similarity replacing IBD-based expected similarities. Animal and plant breeders, for example, now employ marker-based genomic relationship matrices between individuals in prediction models and in estimation of genome-based heritability coefficients. Phenotypes convey information about genetic similarity as well. For instance, if phenotypic values are at least partially the result of the action of quantitative trait loci, one would expect the former to inform about the latter, as in genome-wide association studies. Statistically, a non-trivial conditional distribution of unknown genetic similarities, given phenotypes, is to be expected. A Bayesian formalism is presented here that applies to whole-genome regression methods where some genetic similarity matrix, e.g., a genomic relationship matrix, can be defined. Our Bayesian approach, based on phenotypes and markers, converts prior (markers only) expected similarity into trait-specific posterior similarity. A simulation illustrates situations under which effective Bayesian learning from phenotypes occurs. Pinus and wheat data sets were used to demonstrate applicability of the concept in practice. The methodology applies to a wide class of Bayesian linear regression models, it extends to the multiple-trait domain, and can also be used to develop phenotype-guided similarity kernels in prediction problems.


Subjects
Genome-Wide Association Study , Genetic Models , Quantitative Trait Loci , Bayes Theorem , Genotype , Phenotype , Pinus/genetics , Single Nucleotide Polymorphism , Triticum/genetics
13.
J Dairy Sci ; 102(11): 10039-10055, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31477308

ABSTRACT

Vitamin A is essential for human health, but current intake levels in many developing countries such as India are too low due to malnutrition. According to the World Health Organization, an estimated 250 million preschool children are vitamin A deficient globally. This number excludes pregnant women and nursing mothers, who are particularly vulnerable. Efforts to improve access to vitamin A are key because supplementation can reduce mortality rates in young children in developing countries by around 23%. Three key genes, BCMO1, BCO2, and SCARB1, have been shown to be associated with the amount of β-carotene (BC) in milk. Whole-genome sequencing reads from the coordinates of these 3 genes in 202 non-Indian cattle (141 Bos taurus, 61 Bos indicus) and 35 non-Indian buffalo (Bubalus bubalis) animals from several breeds were collected from data repositories. The number of SNP detected in the coding regions of these 3 genes ranged from 16 to 26 in the 3 species, with 5 overlapping SNP between B. taurus and B. indicus. All these SNP together with 2 SNP in the upstream part of the gene but already present in dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP/) were used to build a custom Sequenom array. Blood for DNA and milk samples for BC were obtained from 2,291 Indian cows of 5 different breeds (Gir, Holstein cross, Jersey Cross, Tharparkar, and Sahiwal) and 2,242 Indian buffaloes (Jafarabadi, Murrah, Pandharpuri, and Surti breeds). The DNA was extracted and genotyped with the Sequenom array. For each individual breed and the combined breeds, SNP with an association that had a P-value <0.3 in the first round of linear analysis were included in a second step of regression analyses to determine allele substitution effects to increase the content of BC in milk. Additionally, an F-test for all SNP within gene was performed with the objective of determining if overall the gene had a significant effect on the content of BC in milk.
The analyses were repeated using a Bayesian approach to compare and validate the previous frequentist results. Multiple significant SNP were found using both methodologies with allele substitution effects ranging from 6.21 (3.13) to 9.10 (5.43) µg of BC per 100 mL of milk. Total gene effects exceeded the mean BC value for all breeds with both analysis approaches. The custom panel designed for genes related to BC production demonstrated applicability in genotyping of cattle and buffalo in India and may be used for cattle or buffalo from other developing countries. Moreover, the recommendation of selection for significant specific alleles of some gene markers provides a route to effectively increase the BC content in milk in the Indian cattle and buffalo populations.


Subjects
Buffaloes/genetics , Cattle/genetics , Genetic Markers , Milk/chemistry , beta Carotene/analysis , Alleles , Animals , Female , Genotype , India , Single Nucleotide Polymorphism , Pregnancy , Species Specificity , beta Carotene/genetics
14.
J Anim Sci Biotechnol ; 10: 20, 2019.
Article in English | MEDLINE | ID: mdl-30891237

ABSTRACT

BACKGROUND: The frequency of recombination events varies across the genome and between individuals, which may be related to some genomic features. The objective of this study was to assess the frequency of recombination events and to identify QTL (quantitative trait loci) for recombination rate in two purebred layer chicken lines. METHODS: A total of 1200 white-egg layers (WL) were genotyped with 580 K SNPs and 5108 brown-egg layers (BL) were genotyped with 42 K SNPs (single nucleotide polymorphisms). Recombination events were identified within half-sib families and both the number of recombination events and the recombination rate were calculated within each 0.5 Mb window of the genome. The 10% of windows with the highest recombination rate on each chromosome were considered to be recombination hotspots. A BayesB model was used separately for each line to identify genomic regions associated with the genome-wide number of recombination events per meiosis. Regions that explained more than 0.8% of genetic variance of recombination rate were considered to harbor QTL. RESULTS: Heritability of recombination rate was estimated at 0.17 in WL and 0.16 in BL. On average, 11.3 and 23.2 recombination events were detected per individual across the genome in 1301 and 9292 meioses in the WL and BL, respectively. The estimated recombination rates differed significantly between the lines, which could be due to differences in inbreeding levels and haplotype structures. Dams had about 5% to 20% higher recombination rates per meiosis than sires in both lines. Recombination rate per 0.5 Mb window had a strong negative correlation with chromosome size and a strong positive correlation with GC content and with CpG island density across the genome in both lines. Different QTL for recombination rate were identified in the two lines. There were 190 and 199 non-overlapping recombination hotspots detected in WL and BL, respectively, 28 of which were common to both lines.
CONCLUSIONS: Differences in recombination rates, hotspot locations, and QTL regions associated with genome-wide recombination were observed between lines, indicating that the detected recombination events are breed-specific and that the control of recombination events is a complex polygenic trait.
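
Illustrative note on entry 14 (a sketch under stated assumptions, not the authors' code): the hotspot rule in the abstract, taking the 10% of windows with the highest recombination rate on each chromosome, can be expressed as a per-chromosome ranking. The function name, the input layout (one list of window rates per chromosome, in genomic order), and the rounding rule (floor of 10% of the window count, with at least one window) are assumptions; the abstract does not say how fractional counts or ties were handled.

```python
def recombination_hotspots(rates_by_chromosome, percent=10):
    """Return, per chromosome, the sorted indices of the top `percent`% windows.

    rates_by_chromosome maps a chromosome label to a list of recombination
    rates, one per consecutive window (0.5 Mb windows in the abstract).
    """
    hotspots = {}
    for chromosome, rates in rates_by_chromosome.items():
        # Integer arithmetic avoids floating-point surprises in the cut-off count.
        n_top = max(1, len(rates) * percent // 100)
        ranked = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
        hotspots[chromosome] = sorted(ranked[:n_top])
    return hotspots

# Hypothetical rates for two small chromosomes.
example = {
    "1": list(range(20)),                 # 20 windows, rate increases along the chromosome
    "2": [5, 1, 9, 2, 3, 4, 8, 7, 6, 0],  # 10 windows
}
for chromosome, windows in recombination_hotspots(example).items():
    starts_mb = [index * 0.5 for index in windows]  # window start positions in Mb
    print(chromosome, windows, starts_mb)
```

Ranking within each chromosome, rather than genome-wide, means every chromosome contributes hotspots regardless of its overall recombination level, which matters given the reported negative correlation between recombination rate and chromosome size.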

15.
J Anim Breed Genet ; 136(2): 113-117, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30614572

ABSTRACT

A curious result from mixed linear models applied to genome-wide association studies was expanded. In particular, a model in which one or more markers are considered as fixed but are allowed to contribute to the covariance structure by treating such markers as random as well was examined. The best linear unbiased estimator of marker effects is invariant with respect to whether those markers are employed in constructing a genomic relationship matrix or are ignored, provided marker effects are uncorrelated with those not being tested. Also, the implications of regarding some marker effects as fixed when, in fact, these possess a non-trivial covariance structure with those declared as random were examined.


Subjects
Genome-Wide Association Study/statistics & numerical data , Linear Models , Genetic Models , Statistical Models , Animals , Breeding , Genome/genetics , Genomics , Single Nucleotide Polymorphism
16.
G3 (Bethesda) ; 8(11): 3567-3575, 2018 11 06.
Article in English | MEDLINE | ID: mdl-30213868

ABSTRACT

Advances in next-generation sequencing technologies and statistical approaches enable genome-wide dissection of phenotypic traits via genome-wide association studies (GWAS). Although multiple statistical approaches for conducting GWAS are available, the power and cross-validation rates of many approaches have mostly been tested using simulated data. Empirical comparisons of single-variant (SV) and multi-variant (MV) GWAS approaches have not been conducted to test whether a single approach or a combination of SV and MV is effective in identifying and cross-validating trait-associated loci. In this study, kernel row number (KRN) data were collected from a set of 6,230 entries derived from the Nested Association Mapping (NAM) population and related populations. Three different types of GWAS analyses were performed: 1) single-variant (SV), 2) stepwise regression (STR) and 3) a Bayesian-based multi-variant (BMV) model. Using the SV, STR, and BMV models, 257, 300, and 442 KRN-associated variants (KAVs), respectively, were identified in the initial GWAS analyses. Of these, 231 KAVs were subjected to genetic validation using three unrelated populations that were not included in the initial GWAS. Genetic validation results suggest that the three GWAS approaches are complementary. Interestingly, KAVs in low-recombination regions were more likely to exhibit associations in independent populations than KAVs in recombinationally active regions, probably as a consequence of linkage disequilibrium. The KAVs identified in this study have the potential to enhance our understanding of the genetic basis of ear development.


Subjects
Statistical Models , Zea mays/genetics , Genome-Wide Association Study , Phenotype
17.
Genet Sel Evol ; 50(1): 32, 2018 06 19.
Article in English | MEDLINE | ID: mdl-29914353

ABSTRACT

BACKGROUND: Population stratification and cryptic relationships have been the main sources of excessive false-positives and false-negatives in population-based association studies. Many methods have been developed to model these confounding factors and minimize their impact on the results of genome-wide association studies. In most of these methods, a two-stage approach is applied where: (1) methods are used to determine if there is a population structure in the sample dataset and (2) the effects of population structure are corrected either by modeling it or by running a separate analysis within each sub-population. The objective of this study was to evaluate the impact of population structure on the accuracy and power of genome-wide association studies using a Bayesian multiple regression method. METHODS: We conducted a genome-wide association study in a stochastically simulated admixed population. The genome was composed of six chromosomes, each with 1000 markers. Fifteen segregating quantitative trait loci contributed to the genetic variation of a quantitative trait with heritability of 0.30. The impact of genetic relationships and breed composition (BC) on three analysis methods was evaluated: single marker simple regression (SMR), single marker mixed linear model (MLM) and Bayesian multiple-regression analysis (BMR). Each method was fitted with and without BC. Accuracy, power, false-positive rate and the positive predictive value of each method were calculated and used for comparison. RESULTS: SMR and BMR, both without BC, were ranked as the worst and the best performing approaches, respectively. Our results showed that, while explicit modeling of genetic relationships and BC is essential for models SMR and MLM, BMR can disregard them and yet result in a higher power without compromising its false-positive rate. CONCLUSIONS: This study showed that the Bayesian multiple-regression analysis is robust to population structure and to relationships among study subjects and performs better than a single marker mixed linear model approach.
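
The comparison criteria named in the abstract (power, false-positive rate, positive predictive value) can be illustrated with their textbook definitions. The sketch below assumes each locus is simply either causal or not and either declared significant or not; the function name `gwas_metrics` and the toy declared set are hypothetical, and the paper may have scored detections differently (for example, by windows around each QTL).

```python
def gwas_metrics(declared, causal, all_loci):
    """Power, false-positive rate and PPV from sets of locus identifiers."""
    declared, causal, all_loci = set(declared), set(causal), set(all_loci)
    tp = len(declared & causal)              # causal loci that were detected
    fp = len(declared - causal)              # non-causal loci declared significant
    fn = len(causal - declared)              # causal loci that were missed
    tn = len(all_loci - declared - causal)   # non-causal loci correctly ignored
    return {
        "power": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "ppv": tp / (tp + fp) if declared else float("nan"),
    }


# Toy layout loosely mirroring the abstract: 6 chromosomes x 1000 markers, 15 QTL.
all_loci = range(6000)
causal = set(range(0, 6000, 400))    # 15 hypothetical causal positions
declared = {0, 400, 800, 1, 2}       # hypothetical significant hits

metrics = gwas_metrics(declared, causal, all_loci)
```

With these toy inputs, three of the fifteen causal loci are detected and two of the five declared hits are false.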


Subjects
Chromosome Mapping/veterinary , Genetic Variation , Genome-Wide Association Study/methods , Heritable Quantitative Trait , Animals , Bayes Theorem , Breeding , Population Genetics , Genome Size , Linear Models , Genetic Models , Population Density
18.
G3 (Bethesda) ; 7(8): 2685-2694, 2017 08 07.
Article in English | MEDLINE | ID: mdl-28642364

ABSTRACT

In single-step analyses, missing genotypes are explicitly or implicitly imputed, and this requires centering the observed genotypes using the means of the unselected founders. If genotypes are only available for selected individuals, centering on the unselected founder mean is not straightforward. Here, computer simulation is used to study an alternative analysis that does not require centering genotypes but fits the mean [Formula: see text] of unselected individuals as a fixed effect. Starting with observed diplotypes from 721 cattle, a five-generation population was simulated with sire selection to produce 40,000 individuals with phenotypes, of which the 1000 sires had genotypes. The next generation of 8000 genotyped individuals was used for validation. Evaluations were undertaken with (J) or without (N) [Formula: see text] when marker covariates were not centered; and with (JC) or without (C) [Formula: see text] when all observed and imputed marker covariates were centered. Centering did not influence accuracy of genomic prediction, but fitting [Formula: see text] did. Accuracies were improved when the panel comprised only quantitative trait loci (QTL); models JC and J had accuracies of 99.4%, whereas models C and N had accuracies of 90.2%. When only markers were in the panel, the 4 models had accuracies of 80.4%. In panels that included QTL, fitting [Formula: see text] in the model improved accuracy, but had little impact when the panel contained only markers. In populations undergoing selection, fitting [Formula: see text] in the model is recommended to avoid bias and reduction in prediction accuracy due to selection.


Subjects
Population Genetics , Genomics , Genetic Models , Genetic Selection , Animals , Bayes Theorem , Breeding , Cattle , Female , Genetic Markers , Genotype , Inheritance Patterns/genetics , Male , Quantitative Trait Loci/genetics , Regression Analysis
19.
Article in English | MEDLINE | ID: mdl-28469846

ABSTRACT

BACKGROUND: A random multiple-regression model that simultaneously fits all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction, using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model. METHODS: Naive application of leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis. RESULTS: The efficient leave-one-out cross validation strategies are 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with increases in the number of observations. CONCLUSIONS: Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.
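
One well-known way to get leave-one-out prediction errors from a single fit is the hat-matrix identity for ridge-type regression with a fixed shrinkage parameter: the leave-one-out residual equals the ordinary residual divided by (1 - h_ii). The sketch below compares that shortcut with naive refitting on simulated data. It is offered as an illustration of the general idea under these assumptions; the abstract does not state the authors' exact strategies, and the names `ridge_fit`, `loo_naive`, `loo_fast` and the value of `lam` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Marker effects from (X'X + lam*I)^-1 X'y with a fixed lam."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def loo_naive(X, y, lam):
    """Refit n times, each time leaving one observation out."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b = ridge_fit(X[keep], y[keep], lam)
        errs[i] = y[i] - X[i] @ b
    return errs


def loo_fast(X, y, lam):
    """Single fit: leave-one-out residual = residual / (1 - h_ii)."""
    p = X.shape[1]
    A = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # p x n
    h = np.einsum("ij,ji->i", X, A)                       # diagonal of the hat matrix
    resid = y - X @ (A @ y)
    return resid / (1.0 - h)


# Simulated toy data: 30 observations, 8 markers coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(30, 8)).astype(float)
y = X @ rng.normal(size=8) + rng.normal(size=30)

naive = loo_naive(X, y, lam=2.0)
fast = loo_fast(X, y, lam=2.0)
```

The naive version solves n systems while the shortcut solves one, which is why the advantage grows with the number of observations.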

20.
Genet Sel Evol ; 48(1): 96, 2016 12 08.
Article in English | MEDLINE | ID: mdl-27931187

ABSTRACT

BACKGROUND: Two types of models have been used for single-step genomic prediction and genome-wide association studies that include phenotypes from both genotyped animals and their non-genotyped relatives. The two types are breeding value models (BVM) that fit breeding values explicitly and marker effects models (MEM) that express the breeding values in terms of the effects of observed or imputed genotypes. MEM can accommodate a wider class of analyses, including variable selection or mixture model analyses. The order of the equations that need to be solved and the inverses required in their construction vary widely, and thus the computational effort required depends upon the size of the pedigree, the number of genotyped animals and the number of loci. THEORY: We present computational strategies to avoid storing large, dense blocks of the MME that involve imputed genotypes. Furthermore, we present a hybrid model that fits a MEM for animals with observed genotypes and a BVM for those without genotypes. The hybrid model is computationally attractive for pedigree files containing millions of animals with a large proportion of those being genotyped. APPLICATION: We demonstrate the practicality of both the original MEM and the hybrid model using real data with 6,179,960 animals in the pedigree with 4,934,101 phenotypes and 31,453 animals genotyped at 40,214 informative loci. Completing a single-trait analysis on a desktop computer with four graphics cards required about 3 h using the hybrid model to obtain both preconditioned conjugate gradient solutions and 42,000 Markov chain Monte-Carlo (MCMC) samples of breeding values, which allowed making inferences from posterior means, variances and covariances. The MCMC sampling required one quarter of the effort when the hybrid model was used compared to the published MEM. CONCLUSIONS: We present a hybrid model that fits a MEM for animals with genotypes and a BVM for those without genotypes. Its practicality and considerable reduction in computing effort were demonstrated. This model can readily be extended to accommodate multiple traits, multiple breeds, maternal effects, and additional random effects such as polygenic residual effects.


Subjects
Bayes Theorem , Computational Biology , Genetic Models , Regression Analysis , Algorithms , Animals , Computer Simulation