Search | VHL Regional Portal

1.

Ghost QTL and hotspots in experimental crosses: novel approach for modeling polygenic effects.

Wallin, Jonas; Bogdan, Malgorzata; Szulc, Piotr A; Doerge, R W; Siegmund, David O.

Genetics ; 217(3)2021 03 31.

Article in English | MEDLINE | ID: mdl-33789342

ABSTRACT

Ghost quantitative trait loci (QTL) are false discoveries in QTL mapping that arise from the "accumulation" of polygenic effects uniformly distributed over the genome. The locations on the chromosome that are strongly correlated with the total of the polygenic effects depend on a specific sample correlation structure determined by the genotypes at all loci. The problem is particularly severe when the same genotypes are used to study multiple QTL, e.g. using recombinant inbred lines or studying expression QTL. In this case, the ghost QTL phenomenon can lead to false hotspots, where multiple QTL show apparent linkage to the same locus. We illustrate the problem using the classic backcross design and suggest that it can be solved by applying an extended mixed effect model, where the random effects are allowed to have a nonzero mean. We provide formulas for estimating the thresholds for the corresponding t-test statistics and use them in a stepwise selection strategy, which allows for simultaneous detection of several QTL. Extensive simulation studies illustrate that our approach eliminates ghost QTL/false hotspots while preserving high power of true QTL detection.

Subject(s)

Crosses, Genetic , Models, Genetic , Multifactorial Inheritance , Quantitative Trait Loci , Animals , Breeding/methods , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Plants/genetics

2.

A genome-wide approach for detecting novel insertion-deletion variants of mid-range size.

Xia, Li C; Sakshuwong, Sukolsak; Hopmans, Erik S; Bell, John M; Grimes, Susan M; Siegmund, David O; Ji, Hanlee P; Zhang, Nancy R.

Nucleic Acids Res ; 44(15): e126, 2016 09 06.

Article in English | MEDLINE | ID: mdl-27325742

ABSTRACT

We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data and an analysis of mid-range size insertions and deletions (<10 Kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one-end-mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions/deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.

Subject(s)

DNA Mutational Analysis/methods , Genome/genetics , Genomics/methods , INDEL Mutation/genetics , Adenoviridae/genetics , Algorithms , Animals , Benchmarking , Computer Simulation , Datasets as Topic , Pan troglodytes/virology , Poisson Distribution , Reproducibility of Results

3.

Joint testing of genotype and ancestry association in admixed families.

Tang, Hua; Siegmund, David O; Johnson, Nicholas A; Romieu, Isabelle; London, Stephanie J.

Genet Epidemiol ; 34(8): 783-91, 2010 Dec.

Article in English | MEDLINE | ID: mdl-21031451

ABSTRACT

Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure, which efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.

Subject(s)

Alleles , Chromosomes, Human , Genetics, Population/statistics & numerical data , Genome-Wide Association Study/methods , Mexican Americans/genetics , American Indian or Alaska Native/genetics , Asthma , Black People/genetics , Child , Chromosome Mapping/methods , Confidence Intervals , Genome, Human , Genotype , Humans , Parents , White People/genetics

4.

Detecting simultaneous changepoints in multiple sequences.

Zhang, Nancy R; Siegmund, David O; Ji, Hanlee; Li, Jun Z.

Biometrika ; 97(3): 631-645, 2010 Sep.

Article in English | MEDLINE | ID: mdl-22822250

ABSTRACT

We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.
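As an illustration of the scan described in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes a zero baseline mean and unit-variance noise in every sequence; the function name `sum_chisq_scan` and the `max_width` parameter are hypothetical. For each candidate window it standardizes the window sum within each sequence, squares it (a one-degree-of-freedom chi-squared statistic under the null), and sums across sequences.

```python
import numpy as np

def sum_chisq_scan(x, max_width):
    """Scan all windows up to max_width and return (statistic, start, end).

    x is an (n_sequences, length) array; the window is x[:, start:end].
    Assumes zero baseline mean and unit-variance noise (illustrative only).
    """
    n_seq, length = x.shape
    # Prefix sums give each window sum in O(1) per sequence.
    csum = np.concatenate([np.zeros((n_seq, 1)), np.cumsum(x, axis=1)], axis=1)
    best = (-np.inf, 0, 0)
    for start in range(length):
        for end in range(start + 1, min(start + max_width, length) + 1):
            window_sum = csum[:, end] - csum[:, start]
            z = window_sum / np.sqrt(end - start)   # standardized per sequence
            stat = float(np.sum(z ** 2))            # sum of chi-squared terms
            if stat > best[0]:
                best = (stat, start, end)
    return best

# Noiseless toy input: a shift of 2.0 in 3 of 10 sequences at positions 40..49.
x = np.zeros((10, 100))
x[:3, 40:50] = 2.0
stat, start, end = sum_chisq_scan(x, max_width=20)
```

On this toy input the maximizing window coincides with the planted interval. With real data the maximum would be compared against a significance threshold, which the abstract says is obtained from analytic approximations not reproduced here.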

5.

A unified framework for linkage and association analysis of quantitative traits.

Dupuis, Josée; Siegmund, David O; Yakir, Benjamin.

Proc Natl Acad Sci U S A ; 104(51): 20210-5, 2007 Dec 18.

Article in English | MEDLINE | ID: mdl-18077372

ABSTRACT

We give a unified treatment of the statistical foundations of population based association mapping and of family based linkage mapping of quantitative traits in humans. A central ingredient in the unification involves the efficient score statistic. The discussion focuses on generalized linear models with an additional illustration of the Cox (proportional hazards) model for age of onset data. We give analytic expressions for noncentrality parameters and show how they give qualitative insight into the loss of power that occurs if the scientist's assumed genetic model differs from nature's "true" genetic model. Issues to be studied in detail in the future development of this approach are discussed.

Subject(s)

Chromosome Mapping , Data Interpretation, Statistical , Models, Genetic , Quantitative Trait Loci , Humans , Pedigree

6.

Approximating the variance of the conditional probability of the state of a hidden Markov model.

Siegmund, David O; Yakir, Benjamin.

Stat Appl Genet Mol Biol ; 6: Article 18, 2007.

Article in English | MEDLINE | ID: mdl-17672820

ABSTRACT

In a hidden Markov model, one "estimates" the state of the hidden Markov chain at t by computing via the forwards-backwards algorithm the conditional distribution of the state vector given the observed data. The covariance matrix of this conditional distribution measures the information lost by failure to observe directly the state of the hidden process. In the case where changes of state occur slowly relative to the speed at which information about the underlying state accumulates in the observed data, we compute approximately these covariances in terms of functionals of Brownian motion that arise in change-point analysis. Applications in gene mapping, where these covariances play a role in standardizing the score statistic and in evaluating the loss of noncentrality due to incomplete information, are discussed. Numerical examples illustrate the range of validity and limitations of our results.

Subject(s)

Markov Chains , Models, Genetic , Probability Theory

7.

A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data.

Zhang, Nancy R; Siegmund, David O.

Biometrics ; 63(1): 22-32, 2007 Mar.

Article in English | MEDLINE | ID: mdl-17447926

ABSTRACT

In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.

Subject(s)

Bayes Theorem , Genome , Models, Genetic , Nucleic Acid Hybridization , Biometry , Cell Line , Computer Simulation , Humans , Monte Carlo Method , Oligonucleotide Array Sequence Analysis/methods

8.

Statistical corrections of linkage data suggest predominantly cis regulations of gene expression.

Shi, Jianxin; Siegmund, David O; Levinson, Douglas F.

BMC Proc ; 1 Suppl 1: S145, 2007.

Article in English | MEDLINE | ID: mdl-18466489

ABSTRACT

Morley et al. (Nature 2004, 430:743-747) detected significant linkages to the expression levels of 142 genes (of 3554) at a reported threshold of genome-wide p = 0.001 (LOD asymptotically equal to 5.3), using 14 three-generation Centre d'Etude du Polymorphisme Humain pedigrees. Most of the linkages (77%) were trans, i.e., more than 5 Mb from the expressed gene. However, the analysis did not account for the expected anti-conservative effect of the skewed distribution of score- or regression-based statistics in large sibships, or for the possible variance distortion due to correlations among tests. Therefore, we re-analyzed their data, using a robust score statistic for the entire pedigrees and correcting the p-values for skewness. We found that a LOD of 5.3 had a skewness-corrected genome-wide p-value of 0.016 instead of 0.001 (a result that we confirmed using simulation), with around 50 expected false positives. We then further corrected for correlation among the (skew-corrected) p-values by using Efron's method for obtaining the empirical null distribution. Setting a threshold of FDR = 10% (Z = 6.4, LOD = 8.9), we detected linkage for the expression levels of 22 genes, 19 of which are cis. Limiting the analysis to cis regions, linkage was detected to the expression levels of 46 genes with 4.6 expected false positives (FDR = 10%).
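The paired thresholds quoted in this abstract can be checked with a short calculation. The sketch below assumes the standard one-degree-of-freedom normal approximation, under which LOD = Z^2 / (2 ln 10); the helper names are hypothetical and the code is not taken from the article.

```python
import math

def lod_from_z(z):
    """LOD score corresponding to a normal Z statistic (1 df approximation)."""
    return z ** 2 / (2 * math.log(10))

def z_from_lod(lod):
    """Normal Z statistic corresponding to a LOD score (1 df approximation)."""
    return math.sqrt(2 * math.log(10) * lod)

# Z = 6.4 maps to a LOD of about 8.9, matching the FDR = 10% threshold quoted above.
lod_at_threshold = round(lod_from_z(6.4), 1)

# LOD = 5.3 maps to a Z of about 4.94.
z_at_reported_lod = round(z_from_lod(5.3), 2)

# Expected false positives at FDR = 10% among 46 detections.
expected_false_positives = round(0.10 * 46, 1)
```

Under these assumptions the three values are 8.9, 4.94, and 4.6, consistent with the figures reported in the abstract.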

9.

Spatial regulation and the rate of signal transduction activation.

Batada, Nizar N; Shepp, Larry A; Siegmund, David O; Levitt, Michael.

PLoS Comput Biol ; 2(5): e44, 2006 May.

Article in English | MEDLINE | ID: mdl-16699596

ABSTRACT

Of the many important signaling events that take place on the surface of a mammalian cell, activation of signal transduction pathways via interactions of cell surface receptors is one of the most important. Evidence suggests that cell surface proteins are not as freely diffusible as implied by the classic fluid mosaic model and that their confinement to membrane domains is regulated. It is unknown whether these dynamic localization mechanisms function to enhance signal transduction activation rate or to minimize cross talk among pathways that share common intermediates. To determine which of these two possibilities is more likely, we derive an explicit equation for the rate at which cell surface membrane proteins interact based on a Brownian motion model in the presence of endocytosis and exocytosis. We find that in the absence of any diffusion constraints, cell surface protein interaction rate is extremely high relative to cytoplasmic protein interaction rate even in a large mammalian cell with a receptor abundance of a mere two hundred molecules. Since a larger number of downstream signaling events needs to take place, each occurring at a much slower rate than the initial activation via association of cell surface proteins, we conclude that the role of co-localization is most likely that of cross-talk reduction rather than coupling efficiency enhancement.

Subject(s)

Models, Biological , Signal Transduction , Animals , Cytoplasm/metabolism , Dimerization , Proteins/chemistry , Proteins/metabolism , Time Factors

10.

On the power for linkage detection using a test based on scan statistics.

Hernández, Sonia; Siegmund, David O; de Gunst, Mathisca.

Biostatistics ; 6(2): 259-69, 2005 Apr.

Article in English | MEDLINE | ID: mdl-15772104

ABSTRACT

We analyze some aspects of scan statistics, which have been proposed to aid the detection of weak signals in genetic linkage analysis. We derive approximate expressions for the power of a test based on moving averages of the identity-by-descent allele sharing proportions for pairs of relatives at several contiguous markers. We confirm these approximate formulae by simulation. The results show that when there is a single trait-locus on a chromosome, the test based on the scan statistic is slightly less powerful than that based on the customary allele sharing statistic. On the other hand, if two genes having a moderate effect on a trait lie close to each other on the same chromosome, scan statistics improve power to detect linkage.

Subject(s)

Chromosome Mapping/methods , Data Interpretation, Statistical , Genetic Linkage , Models, Genetic , Computer Simulation , Humans

11.

Stochastic model of protein-protein interaction: why signaling proteins need to be colocalized.

Batada, Nizar N; Shepp, Larry A; Siegmund, David O.

Proc Natl Acad Sci U S A ; 101(17): 6445-9, 2004 Apr 27.

Article in English | MEDLINE | ID: mdl-15096590

ABSTRACT

Colocalization of proteins that are part of the same signal transduction pathway via compartmentalization, scaffold, or anchor proteins is an essential aspect of the signal transduction system in eukaryotic cells. If interaction must occur via free diffusion, then the spatial separation between the sources of the two interacting proteins and their degradation rates become primary determinants of the time required for interaction. To understand the role of such colocalization, we create a mathematical model of the diffusion based protein-protein interaction process. We assume that mRNAs, which serve as the sources of these proteins, are located at different positions in the cytoplasm. For large cells such as Drosophila oocytes we show that if the source mRNAs were at random locations in the cell rather than colocalized, the average rate of interactions would be extremely small, which suggests that localization is needed to facilitate protein interactions and not just to prevent cross-talk between different signaling modules.

Subject(s)

Models, Molecular , Proteins/metabolism , Proteins/chemistry

12.

Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition.

Tang, Hua; Siegmund, David O; Shen, Peidong; Oefner, Peter J; Feldman, Marcus W.

Genetics ; 161(1): 447-59, 2002 May.

Article in English | MEDLINE | ID: mdl-12019257

ABSTRACT

This article proposes a method of estimating the time to the most recent common ancestor (TMRCA) of a sample of DNA sequences. The method is based on the molecular clock hypothesis, but avoids assumptions about population structure. Simulations show that in a wide range of situations, the point estimate has small bias and the confidence interval has at least the nominal coverage probability. We discuss conditions that can lead to biased estimates. Performance of this estimator is compared with existing methods based on the coalescence theory. The method is applied to sequences of Y chromosomes and mtDNAs to estimate the coalescent times of human male and female populations.

Subject(s)

DNA , Evolution, Molecular , Phylogeny , Algorithms , Base Sequence , Computer Simulation , DNA, Mitochondrial , Models, Genetic , Molecular Sequence Data , Mutation , Selection Bias , Statistics as Topic , Y Chromosome
