Results 1 - 16 of 16
1.
Front Genet ; 14: 1220408, 2023.
Article in English | MEDLINE | ID: mdl-37662837

ABSTRACT

In the last decade, a number of methods have been suggested to deal with large amounts of genetic data in genomic predictions. Yet, steadily growing population sizes and the suboptimal use of computational resources are pushing the practical application of these approaches to their limits. As an extension to the C/CUDA library miraculix, we have developed tailored solutions for the computation of genotype matrix multiplications, which are a critical bottleneck in the empirical evaluation of many statistical models. We demonstrate the benefits of our solutions using the example of single-step models, which make repeated use of this kind of multiplication. Targeting modern Nvidia® GPUs as well as a broad range of CPU architectures, our implementation significantly reduces the time required for the estimation of breeding values in large populations. miraculix is released under the Apache 2.0 license and is freely available at https://github.com/alexfreudenberg/miraculix.
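As a rough illustration of why compact genotype storage matters for these multiplications, the sketch below packs 0/1/2 genotype codes four to a byte and multiplies the packed matrix by a vector. It is a NumPy toy under stated assumptions, not the miraculix API: the function names `pack_genotypes` and `packed_matvec` and the 2-bit layout are hypothetical choices made for this example, and the product simply unpacks on the fly rather than using the tailored CPU/GPU kernels the abstract refers to.

```python
import numpy as np

# Bit offsets of the four 2-bit genotype codes stored in one byte (assumed layout).
_SHIFTS = np.array([0, 2, 4, 6], dtype=np.uint8)


def pack_genotypes(G):
    """Pack an (individuals x SNPs) matrix of 0/1/2 codes, four codes per byte."""
    G = np.asarray(G, dtype=np.uint8)
    n, m = G.shape
    pad = (-m) % 4                       # pad the SNP axis to a multiple of 4
    Gp = np.pad(G, ((0, 0), (0, pad))).reshape(n, -1, 4)
    packed = np.bitwise_or.reduce(Gp << _SHIFTS, axis=2)
    return packed, m


def packed_matvec(packed, m, v):
    """Compute G @ v from the packed representation by unpacking on the fly."""
    codes = (packed[:, :, None] >> _SHIFTS) & 3   # back to 0/1/2 codes
    G = codes.reshape(packed.shape[0], -1)[:, :m]  # drop the padding columns
    return G @ np.asarray(v, dtype=float)
```

The packed matrix needs a quarter of the memory of a one-byte-per-genotype layout; a production kernel would avoid materialising the unpacked matrix at all.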

2.
BMC Genomics ; 21(1): 308, 2020 Apr 16.
Article in English | MEDLINE | ID: mdl-32299342

ABSTRACT

BACKGROUND: Göttingen Minipigs (GMP) is the smallest commercially available minipig breed under a controlled breeding scheme and is globally bred in five isolated colonies. The genetic isolation harbors the risk of stratification which might compromise the identity of the breed and its usability as an animal model for biomedical and human disease. We conducted whole genome re-sequencing of two DNA-pools per colony to assess genomic differentiation within and between colonies. We added publicly available samples from 13 different pig breeds and discovered overall about 32 M loci, ~16 M thereof variable in GMPs. Individual samples were virtually pooled breed-wise. FST between virtual and DNA pools, a phylogenetic tree, principal component analysis (PCA) and evaluation of functional SNP classes were conducted. An F-test was performed to reveal significantly differentiated allele frequencies between colonies. Variation within a colony was quantified as expected heterozygosity. RESULTS: Phylogeny and PCA showed that the GMP is easily discriminable from all other breeds, but that there is also differentiation between the GMP colonies. Dependent on the contrast between GMP colonies, 4 to 8% of all loci had significantly different allele frequencies. Functional annotation revealed that functionally non-neutral loci are less prone to differentiation. Annotation of highly differentiated loci revealed a couple of deleterious mutations in genes with putative effects in the GMPs. CONCLUSION: Differentiation and annotation results suggest that the underlying mechanisms are drift events rather than directed selection and limited to neutral genome regions. Animal exchange seems not yet necessary. The Relliehausen colony appears to be the genetically most unique GMP sub-population and could be a valuable resource if animal exchange is required to maintain uniformity of the GMP.


Subjects
Breeding; Polymorphism, Single Nucleotide; Swine, Miniature/classification; Swine, Miniature/genetics; Animals; Gene Frequency; Phylogeny; Quantitative Trait Loci; Sequence Analysis, DNA; Swine; Whole Genome Sequencing
3.
G3 (Bethesda) ; 10(6): 1915-1918, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32229505

ABSTRACT

The R-package MoBPS provides a computationally efficient and flexible framework to simulate complex breeding programs and compare their economic and genetic impact. Simulations are performed at the level of individuals. MoBPS utilizes a highly efficient implementation with bit-wise data storage and matrix multiplications from the associated R-package miraculix, which allows large-scale populations to be handled. Individual haplotypes are not stored but instead automatically derived based on points of recombination and mutations. The modular structure of MoBPS allows rather coarse simulations, as needed to generate founder populations, to be combined with a very detailed modeling of today's complex breeding programs, making use of all available biotechnologies. MoBPS provides pre-implemented functions for common breeding practices such as optimum genetic contributions and single-step GBLUP but also allows the user to replace certain steps with personalized and/or self-written solutions.


Subjects
Breeding; Models, Genetic; Humans; Myelin Proteins
4.
Genetics ; 213(2): 379-394, 2019 10.
Article in English | MEDLINE | ID: mdl-31383770

ABSTRACT

The additive genomic variance in linear models with random marker effects can be defined as a random variable that is in accordance with classical quantitative genetics theory. Common approaches to estimate the genomic variance in random-effects linear models based on genomic marker data can be regarded as estimating the unconditional (or prior) expectation of this random additive genomic variance, and thus neglect the contribution of linkage disequilibrium (LD). We introduce a novel best prediction (BP) approach for the additive genomic variance in both the current and the base population in the framework of genomic prediction using the genomic best linear unbiased prediction (gBLUP) method. The resulting best predictor is the conditional (or posterior) expectation of the additive genomic variance when using the additional information given by the phenotypic data, and is structurally in accordance with the genomic equivalent of the classical additive genetic variance in random-effects models. In particular, the best predictor includes the contribution of (marker) LD to the additive genomic variance and possibly fully eliminates the missing contribution of LD that is caused by the assumptions of statistical frameworks such as the random-effects model. We derive an empirical best predictor (eBP) and compare its performance with common approaches to estimate the additive genomic variance in random-effects models on commonly used genomic datasets.


Subjects
Genomics/statistics & numerical data; Linkage Disequilibrium/genetics; Quantitative Trait Loci/genetics; Selection, Genetic/genetics; Animals; Breeding; Linear Models; Models, Genetic; Polymorphism, Single Nucleotide/genetics
5.
Genetics ; 212(4): 1045-1061, 2019 08.
Article in English | MEDLINE | ID: mdl-31152070

ABSTRACT

The concept of haplotype blocks has been shown to be useful in genetics. Fields of application range from the detection of regions under positive selection to statistical methods that make use of dimension reduction. We propose a novel approach ("HaploBlocker") for defining and inferring haplotype blocks that focuses on linkage instead of the commonly used population-wide measures of linkage disequilibrium. We define a haplotype block as a sequence of genetic markers that has a predefined minimum frequency in the population, and only haplotypes with a similar sequence of markers are considered to carry that block, effectively screening a dataset for group-wise identity-by-descent. From these haplotype blocks, we construct a haplotype library that represents a large proportion of genetic variability with a limited number of blocks. Our method is implemented in the associated R-package HaploBlocker, and provides flexibility not only to optimize the structure of the obtained haplotype library for subsequent analyses, but also to handle datasets of different marker density and genetic diversity. By using haplotype blocks instead of single nucleotide polymorphisms (SNPs), local epistatic interactions can be naturally modeled, and the reduced number of parameters enables a wide variety of new methods for further genomic analyses such as genomic prediction and the detection of selection signatures. We illustrate our methodology with a dataset comprising 501 doubled haploid lines in a European maize landrace genotyped at 501,124 SNPs. With the suggested approach, we identified 2991 haplotype blocks with an average length of 2685 SNPs that together represent 94% of the dataset.


Subjects
Gene Library; Haplotypes; Algorithms; Animals; Computational Biology; Datasets as Topic; Genetic Linkage; Genetic Markers; Humans; Models, Genetic; Polymorphism, Single Nucleotide; Zea mays/genetics
6.
Bull Math Biol ; 78(5): 1039-57, 2016 05.
Article in English | MEDLINE | ID: mdl-27230608

ABSTRACT

Different variants of a mathematical model for carrier-mediated signal transduction are introduced with focus on the odor dose-electrophysiological response curve of insect olfaction. The latter offers a unique opportunity to observe experimentally the effect of an alteration in the carrier molecule composition on the signal molecule-dependent response curve. Our work highlights the role of involved carrier molecules, which have largely been ignored in mathematical models for response curves in the past. The resulting model explains how the involvement of more than one carrier molecule in signal molecule transport can cause dose-response curves as observed in experiments, without the need of more than one receptor per neuron. In particular, the model has the following features: (1) An extended sensitivity range of neuronal response is implemented by a system consisting of only one receptor but several carrier molecules with different affinities for the signal molecule. (2) Given that the sensitivity range is extended by the involvement of different carrier molecules, the model implies that a strong difference in the expression levels of the carrier molecules is absolutely essential for wide range responses. (3) Complex changes in dose-response curves which can be observed when the expression levels of carrier molecules are altered experimentally can be explained by interactions between different carrier molecules. The principles we demonstrate here for electrophysiological responses can also be applied to any other carrier-mediated biological signal transduction process. The presented concept provides a framework for modeling and statistical analysis of signal transduction processes if sufficient information on the underlying biology is available.


Subjects
Models, Biological; Signal Transduction/physiology; Animals; Insecta/physiology; Ligands; Mathematical Concepts; Odorants; Receptors, Odorant/physiology; Smell/physiology
7.
PLoS One ; 10(10): e0141216, 2015.
Article in English | MEDLINE | ID: mdl-26517830

ABSTRACT

The understanding of non-random association between loci, termed linkage disequilibrium (LD), plays a central role in genomic research. Since causal mutations are generally not included in genomic marker data, LD between those and available markers is essential for capturing the effects of causal loci on localizing genes responsible for traits. Thus, the interpretation of association studies requires a detailed knowledge of LD patterns. It is well known that most LD measures depend on minor allele frequencies (MAF) of the considered loci and the magnitude of LD is influenced by the physical distances between loci. In the present study, a procedure to compare the LD structure between genomic regions comprising several markers each is suggested. The approach accounts for different scaling factors, namely the distribution of MAF, the distribution of pair-wise differences in MAF, and the physical extent of compared regions, reflected by the distribution of pair-wise physical distances. In the first step, genomic regions are matched based on similarity in these scaling factors. In the second step, chromosome- and genome-wide significance tests for differences in medians of LD measures in each pair are performed. The proposed framework was applied to test the hypothesis that the average LD is different in genic and non-genic regions. This was tested with a genome-wide approach with data sets for humans (Homo sapiens), a highly selected chicken line (Gallus gallus domesticus) and the model plant Arabidopsis thaliana. In all three data sets we found a significantly higher level of LD in genic regions compared to non-genic regions. About 31% more LD was detected genome-wide in genic compared to non-genic regions in Arabidopsis thaliana, followed by 13.6% in human and 6% in chicken. Chromosome-wide comparison discovered significant differences on all 5 chromosomes in Arabidopsis thaliana and on one third of the human and of the chicken chromosomes.


Subjects
Chickens/genetics; Genomics/methods; Linkage Disequilibrium; Animals; Chromosome Mapping; Gene Frequency; Genome, Human; Genome, Plant; Humans
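To make the dependence of LD on allele frequencies concrete, here is a minimal sketch of one widely used pairwise LD measure, the squared correlation r² between two biallelic loci. The abstract does not say which LD measures the study compared, so the choice of r², the 0/1 haplotype coding, and the function name `ld_r2` are assumptions for illustration only.

```python
import numpy as np


def ld_r2(h1, h2):
    """Squared correlation r^2 between two loci from phased 0/1 haplotypes."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    pA, pB = h1.mean(), h2.mean()          # allele frequencies at each locus
    D = (h1 * h2).mean() - pA * pB          # classical disequilibrium coefficient
    denom = pA * (1.0 - pA) * pB * (1.0 - pB)
    if denom == 0.0:                        # a monomorphic locus has no defined r^2
        return float("nan")
    return D * D / denom
```

Because the denominator is built from the allele frequencies, regions with different MAF distributions are not directly comparable, which is the motivation for matching regions on such scaling factors before testing.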
8.
PLoS One ; 10(7): e0132980, 2015.
Article in English | MEDLINE | ID: mdl-26167869

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0126880.].

9.
PLoS One ; 10(5): e0126880, 2015.
Article in English | MEDLINE | ID: mdl-25950439

ABSTRACT

The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.


Subjects
Drosophila/genetics; Drosophila/physiology; Genes, Insect; Animals
10.
PLoS One ; 10(3): e0122325, 2015.
Article in English | MEDLINE | ID: mdl-25789767

ABSTRACT

The metabolic adaptation of dairy cows during the transition period has been studied intensively in the last decades. However, until now, only a few studies have paid attention to the genetic aspects of this process. Here, we present the results of a gene-based mapping and pathway analysis with the measurements of three key metabolites, (1) non-esterified fatty acids (NEFA), (2) beta-hydroxybutyrate (BHBA) and (3) glucose, characterizing the metabolic adaptability of dairy cows before and after calving. In contrast to the conventional single-marker approach, we identify 99 significant and biologically sensible genes associated with at least one of the considered phenotypes, thus providing evidence for a genetic basis of the metabolic adaptability. Moreover, our results strongly suggest that three pathways involved in the metabolism of steroids and lipids are potential candidates for the adaptive regulation of dairy cows in their early lactation. From our perspective, a closer investigation of our findings will lead to a step forward in understanding the variability in the metabolic adaptability of dairy cows in their early lactation.


Subjects
Cattle/genetics; Cattle/metabolism; Dairying; Metabolomics/methods; 3-Hydroxybutyric Acid/metabolism; Adaptation, Physiological/genetics; Animals; Cattle/physiology; Fatty Acids, Nonesterified/metabolism; Female; Glucose/metabolism; Phenotype
11.
PLoS Curr ; 6, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24818065

ABSTRACT

The key challenge during food-borne disease outbreaks, e.g. the 2011 EHEC/HUS outbreak in Germany, is the design of efficient mitigation strategies based on a timely identification of the outbreak's spatial origin. Standard public health procedures typically use case-control studies and tracings along food shipping chains. These methods are time-consuming and suffer from biased data collected slowly in patient interviews. Here we apply a recently developed, network-theoretical method to identify the spatial origin of food-borne disease outbreaks. Thereby, the network captures the transportation routes of contaminated foods. The technique only requires spatial information on case reports regularly collected by public health institutions and a model for the underlying food distribution network. The approach is based on the idea of replacing the conventional geographic distance with an effective distance that is derived from the topological structure of the underlying food distribution network. We show that this approach can efficiently identify most probable epicenters of food-borne disease outbreaks. We assess and discuss the method in the context of the 2011 EHEC epidemic. Based on plausible assumptions on the structure of the national food distribution network, the approach can correctly localize the origin of the 2011 German EHEC/HUS outbreak.
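The idea of replacing geographic distance with a network-derived effective distance can be sketched as follows. The abstract does not give the formula; this example assumes the commonly cited definition in which a link carrying a fraction p of the flux leaving a node has length 1 - log(p), and the effective distance between two nodes is the shortest path under those lengths. The function name `effective_distances` and the dict-of-dicts network format are hypothetical.

```python
import heapq
import math


def effective_distances(P, source):
    """Shortest-path effective distances from `source`.

    P[i][j] is the assumed fraction of shipments leaving node i that go to
    node j; each link gets length 1 - log(P[i][j]) (an assumption of this sketch).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue                      # stale heap entry, already improved
        for v, p in P.get(u, {}).items():
            if p <= 0.0:
                continue                  # no flux means no link
            nd = d + 1.0 - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Under this definition a high-traffic link is short even if the two locations are geographically far apart, so candidate origins can be compared by how regularly case locations are arranged around them in effective distance.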

12.
Hum Hered ; 76(2): 64-75, 2013.
Article in English | MEDLINE | ID: mdl-24434848

ABSTRACT

Biological pathways provide rich information and biological context on the genetic causes of complex diseases. The logistic kernel machine test integrates prior knowledge on pathways in order to analyze data from genome-wide association studies (GWAS). In this study, the kernel converts the genomic information of 2 individuals into a quantitative value reflecting their genetic similarity. With the selection of the kernel, one implicitly chooses a genetic effect model. Like many other pathway methods, none of the available kernels accounts for the topological structure of the pathway or gene-gene interaction types. However, evidence indicates that connectivity and neighborhood of genes are crucial in the context of GWAS, because genes associated with a disease often interact. Thus, we propose a novel kernel that incorporates the topology of pathways and information on interactions. Using simulation studies, we demonstrate that the proposed method maintains the type I error correctly and can be more effective in the identification of pathways associated with a disease than non-network-based methods. We apply our approach to genome-wide association case-control data on lung cancer and rheumatoid arthritis. We identify some promising new pathways associated with these diseases, which may improve our current understanding of the genetic mechanisms.


Subjects
Algorithms; Epistasis, Genetic/genetics; Gene Regulatory Networks/genetics; Genome-Wide Association Study/methods; Metabolic Networks and Pathways/genetics; Models, Genetic; Signal Transduction/genetics; Arthritis, Rheumatoid/genetics; Computer Simulation; Humans; Linear Models; Lung Neoplasms/genetics; Risk Factors
13.
PLoS Genet ; 8(5): e1002685, 2012.
Article in English | MEDLINE | ID: mdl-22570636

ABSTRACT

Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ∼2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 (0.230±0.012) for starvation resistance (startle response). The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5% SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.


Subjects
Drosophila melanogaster/genetics; Genome, Insect; Genotype; Quantitative Trait Loci; Animals; Bayes Theorem; Chromosome Mapping; Genetics, Population; Linkage Disequilibrium; Models, Genetic; Models, Theoretical; Phenotype; Polymorphism, Single Nucleotide; Selection, Genetic; Sequence Analysis, DNA
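The GBLUP workflow described in this abstract (build a genomic relationship matrix from SNPs, then predict genetic values for held-out lines) can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: it assumes a VanRaden-style relationship matrix, a fixed heritability `h2` in place of estimated variance components, and the training mean as the only fixed effect. The function names are hypothetical.

```python
import numpy as np


def vanraden_grm(M):
    """Genomic relationship matrix from an (individuals x SNPs) 0/1/2 matrix."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                 # estimated allele frequencies
    Z = M - 2.0 * p                          # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y, train, test, h2=0.5):
    """Predict phenotypes of `test` individuals from `train` individuals.

    h2 is an assumed heritability; in practice the variance components
    would be estimated (e.g. by REML) rather than fixed.
    """
    lam = (1.0 - h2) / h2                    # residual-to-genetic variance ratio
    mu = y[train].mean()                     # simple stand-in for the fixed effect
    G_tt = G[np.ix_(train, train)]
    G_st = G[np.ix_(test, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G_st @ alpha
```

Predictive ability in the sense used above would then be the correlation between `gblup_predict(...)` and the observed phenotypes of the held-out individuals, averaged over cross-validation folds.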
14.
Hum Hered ; 74(2): 97-108, 2012.
Article in English | MEDLINE | ID: mdl-23466369

ABSTRACT

OBJECTIVES: The logistic kernel machine test (LKMT) is a testing procedure tailored towards high-dimensional genetic data. Its use in pathway analyses of case-control genome-wide association studies results from its computational efficiency and flexibility in incorporating additional information via the kernel. The kernel can be any positive definite function; unfortunately, its form strongly influences the test's power and bias. Most authors have recommended the use of a simple linear kernel. We demonstrate via a simulation that the probability of rejecting the null hypothesis of no association just by chance increases with the number of SNPs or genes in the pathway when applying a simple linear kernel. METHODS: We propose a novel kernel that includes an appropriate standardization in order to protect against any inflation of false positive results. Moreover, our novel kernel contains information on gene membership of SNPs in the pathway. RESULTS: When applying the novel kernel to data from the North American Rheumatoid Arthritis Consortium, we find that even this basic genomic structure can improve the ability of the LKMT to identify meaningful associations. We also demonstrate that the standardization effectively eliminates problems of size bias. CONCLUSION: We recommend the use of our standardized kernel and urge caution when using non-adjusted kernels in the LKMT to conduct pathway analyses.


Subjects
Arthritis, Rheumatoid/genetics; Models, Genetic; Arthritis, Rheumatoid/epidemiology; Bias; Computer Simulation; Genes; Genome-Wide Association Study; Humans; Polymorphism, Single Nucleotide; Sample Size
15.
Genetics ; 188(3): 695-708, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21515573

ABSTRACT

Genomic data provide a valuable source of information for modeling covariance structures, allowing a more accurate prediction of total genetic values (GVs). We apply the kriging concept, originally developed in the geostatistical context for predictions in the low-dimensional space, to the high-dimensional space spanned by genomic single nucleotide polymorphism (SNP) vectors and study its properties in different gene-action scenarios. Two different kriging methods ["universal kriging" (UK) and "simple kriging" (SK)] are presented. As a novelty, we suggest use of the family of Matérn covariance functions to model the covariance structure of SNP vectors. A genomic best linear unbiased prediction (GBLUP) is applied as a reference method. The three approaches are compared in a whole-genome simulation study considering additive, additive-dominance, and epistatic gene-action models. Predictive performance is measured in terms of correlation between true and predicted GVs and average true GVs of the individuals ranked best by prediction. We show that UK outperforms GBLUP in the presence of dominance and epistatic effects. In a limiting case, it is shown that the genomic covariance structure proposed by VanRaden (2008) can be considered as a covariance function with corresponding quadratic variogram. We also prove theoretically that if a specific linear relationship exists between covariance matrices for two linear mixed models, the GVs resulting from BLUP are linked by a scaling factor. Finally, the relation of kriging to other models is discussed and further options for modeling the covariance structure, which might be more appropriate in the genomic context, are suggested.


Subjects
Breeding; Genetics, Population/methods; Models, Genetic; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Analysis of Variance; Animals; Bayes Theorem; Epistasis, Genetic; Genetics, Population/statistics & numerical data; Genome; Linear Models; Plants; Predictive Value of Tests; Quantitative Trait Loci/genetics; Software
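A Matérn covariance over SNP vectors, as suggested in this abstract, can be illustrated with the closed forms available for half-integer smoothness. The parametrization below (range `rho`, variance `sigma2`, smoothness `nu` restricted to 0.5, 1.5 or 2.5) is one common convention and may differ from the one used in the paper; Euclidean distance between genotype vectors and the function names are likewise assumptions of this sketch.

```python
import numpy as np


def pairwise_distances(M):
    """Euclidean distances between the rows (SNP vectors) of M."""
    M = np.asarray(M, dtype=float)
    sq = np.sum(M * M, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)
    return np.sqrt(np.maximum(d2, 0.0))      # clip tiny negatives from rounding


def matern_cov(d, nu=1.5, rho=1.0, sigma2=1.0):
    """Matérn covariance at distances d for nu in {0.5, 1.5, 2.5}."""
    s = np.asarray(d, dtype=float) / rho
    if nu == 0.5:                             # exponential covariance
        return sigma2 * np.exp(-s)
    if nu == 1.5:
        a = np.sqrt(3.0) * s
        return sigma2 * (1.0 + a) * np.exp(-a)
    if nu == 2.5:
        a = np.sqrt(5.0) * s
        return sigma2 * (1.0 + a + a * a / 3.0) * np.exp(-a)
    raise ValueError("this sketch only covers nu in {0.5, 1.5, 2.5}")
```

The resulting matrix `matern_cov(pairwise_distances(M))` can take the place of a genomic relationship matrix in a kriging or BLUP-type predictor, with `rho` and `nu` controlling how quickly covariance decays with genetic distance.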
16.
J Contam Hydrol ; 79(1-2): 25-44, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16095754

ABSTRACT

A novel risk index for the vulnerability of groundwater to pollutants is defined as the form parameter of the Pareto distribution and estimated from dye tracer experiments. The Pareto distribution appears as the limit distribution of the extreme value theory, which has been applied to an idealized model of drops that run along a path. The properties of the risk index are investigated by a Monte Carlo study, where the paths are modelled by means of Gaussian random fields. The method is applied to three profiles obtained from Brilliant Blue tracer experiments of the soil physics group at ETH Zurich. It is shown that a single profile can be rather well characterised by the risk index. However, due to the high variability of the dye tracer profiles, an estimated number of at least 15 profile pictures are necessary to characterise a soil.


Subjects
Soil Pollutants/analysis; Water Movements; Water Pollutants/analysis; Agriculture; Air Pollutants; Coloring Agents/analysis; Monte Carlo Method; Risk Assessment