Results 1 - 18 of 18
1.
Biometrics ; 79(4): 2961-2973, 2023 12.
Article in English | MEDLINE | ID: mdl-36629736

ABSTRACT

We consider the problem of combining data from observational and experimental sources to draw causal conclusions. To derive combined estimators with desirable properties, we extend results from the Stein shrinkage literature. Our contributions are threefold. First, we propose a generic procedure for deriving shrinkage estimators in this setting, making use of a generalized unbiased risk estimate. Second, we develop two new estimators, prove finite sample conditions under which they have lower risk than an estimator using only experimental data, and show that each achieves a notion of asymptotic optimality. Third, we draw connections between our approach and results in sensitivity analysis, including proposing a method for evaluating the feasibility of our estimators.


Subject(s)
Probability , Causality
2.
Stat Med ; 41(1): 65-86, 2022 01 15.
Article in English | MEDLINE | ID: mdl-34671998

ABSTRACT

We consider how to merge a limited amount of data from a randomized controlled trial (RCT) into a much larger set of data from an observational data base (ODB), to estimate an average causal treatment effect. Our methods are based on stratification. The strata are defined in terms of effect moderators as well as propensity scores estimated in the ODB. Data from the RCT are placed into the strata they would have occupied, had they been in the ODB instead. We assume that treatment differences are comparable in the two data sources. Our first "spiked-in" method simply inserts the RCT data into their corresponding ODB strata. We also consider a data-driven convex combination of the ODB and RCT treatment effect estimates within each stratum. Using the delta method and simulations, we identify a bias problem with the spiked-in estimator that is ameliorated by the convex combination estimator. We apply our methods to data from the Women's Health Initiative, a study of thousands of postmenopausal women which has both observational and experimental data on hormone therapy (HT). Using half of the RCT to define a gold standard, we find that a version of the spiked-in estimator yields lower-MSE estimates of the causal impact of HT on coronary heart disease than would be achieved using either a small RCT or the observational component on its own.
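The within-stratum convex combination described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names, the dictionary keys, and the MSE-minimizing weight rule are hypothetical and are not taken from the paper, whose data-driven weighting is not specified in the abstract. The weight rule assumes independent estimates, an unbiased RCT estimate, and a known squared bias for the ODB estimate.

```python
def convex_combine(est_odb, est_rct, lam):
    """Return lam * est_odb + (1 - lam) * est_rct for a weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * est_odb + (1.0 - lam) * est_rct


def mse_weight(var_odb, var_rct, bias_sq):
    """Weight on the ODB estimate that minimizes mean squared error.

    Assumption (not from the source): the two estimates are independent,
    the RCT estimate is unbiased, and the ODB estimate has squared bias
    bias_sq. Then MSE(lam) = lam^2 (var_odb + bias_sq) + (1 - lam)^2 var_rct,
    which is minimized at the value returned here.
    """
    return var_rct / (var_rct + var_odb + bias_sq)


def stratified_effect(strata):
    """Average the combined stratum estimates, weighting by stratum size.

    strata: list of dicts with hypothetical keys
            "n", "est_odb", "est_rct", "lam".
    """
    total = sum(s["n"] for s in strata)
    return sum(
        s["n"] / total * convex_combine(s["est_odb"], s["est_rct"], s["lam"])
        for s in strata
    )
```

Setting `lam = 1` in every stratum uses the ODB estimate alone and `lam = 0` uses the RCT estimate alone, so the sketch makes the two extremes that the abstract compares against easy to reproduce.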


Subject(s)
Research Design , Bias , Causality , Databases, Factual , Female , Humans , Propensity Score
3.
Ann Stat ; 45(5): 1863-1894, 2017 Oct.
Article in English | MEDLINE | ID: mdl-31439967

ABSTRACT

We consider large-scale studies in which thousands of significance tests are performed simultaneously. In some of these studies, the multiple testing procedure can be severely biased by latent confounding factors such as batch effects and unmeasured covariates that correlate with both primary variable(s) of interest (e.g., treatment variable, phenotype) and the outcome. Over the past decade, many statistical methods have been proposed to adjust for the confounders in hypothesis testing. We unify these methods in the same framework, generalize them to include multiple primary variables and multiple nuisance variables, and analyze their statistical properties. In particular, we provide theoretical guarantees for RUV-4 [Gagnon-Bartsch, Jacob and Speed (2013)] and LEAPP [Ann. Appl. Stat. 6 (2012) 1664-1688], which correspond to two different identification conditions in the framework: the first requires a set of "negative controls" that are known a priori to follow the null distribution; the second requires the true nonnulls to be sparse. Two different estimators which are based on RUV-4 and LEAPP are then applied to these two scenarios. We show that if the confounding factors are strong, the resulting estimators can be asymptotically as powerful as the oracle estimator which observes the latent confounding factors. For hypothesis testing, we show the asymptotic z-tests based on the estimators can control the type I error. Numerical experiments show that the false discovery rate is also controlled by the Benjamini-Hochberg procedure when the sample size is reasonably large.
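For readers unfamiliar with the Benjamini-Hochberg procedure mentioned at the end of this abstract, a minimal sketch of the standard step-up rule is given below. The function name and the default level `q` are illustrative choices; the confounder adjustment that the paper studies is not shown, and the p-values are assumed to have been computed already.

```python
def benjamini_hochberg(pvalues, q=0.1):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard step-up rule: sort the m p-values, find the largest rank k
    with p_(k) <= q * k / m, and reject the k smallest p-values.
    """
    m = len(pvalues)
    # Indices of the p-values in increasing order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected
```

For example, `benjamini_hochberg([0.001, 0.8, 0.02, 0.04], q=0.1)` rejects the first, third, and fourth hypotheses.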

4.
PLoS Genet ; 11(12): e1005728, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26677855

ABSTRACT

We developed a new statistical framework to find genetic variants associated with extreme longevity. The method, informed GWAS (iGWAS), takes advantage of knowledge from large studies of age-related disease in order to narrow the search for SNPs associated with longevity. To gain support for our approach, we first show there is an overlap between loci involved in disease and loci associated with extreme longevity. These results indicate that several disease variants may be depleted in centenarians versus the general population. Next, we used iGWAS to harness information from 14 meta-analyses of disease and trait GWAS to identify longevity loci in two studies of long-lived humans. In a standard GWAS analysis, only one locus in these studies is significant (APOE/TOMM40) when controlling the false discovery rate (FDR) at 10%. With iGWAS, we identify eight genetic loci to associate significantly with exceptional human longevity at FDR < 10%. We followed up the eight lead SNPs in independent cohorts, and found replication evidence of four loci and suggestive evidence for one more with exceptional longevity. The loci that replicated (FDR < 5%) included APOE/TOMM40 (associated with Alzheimer's disease), CDKN2B/ANRIL (implicated in the regulation of cellular senescence), ABO (tags the O blood group), and SH2B3/ATXN2 (a signaling gene that extends lifespan in Drosophila and a gene involved in neurological disease). Our results implicate new loci in longevity and reveal a genetic overlap between longevity and age-related diseases and traits, including coronary artery disease and Alzheimer's disease. iGWAS provides a new analytical strategy for uncovering SNPs that influence extreme longevity, and can be applied more broadly to boost power in other studies of complex phenotypes.


Subject(s)
Aging/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Longevity/genetics , Aging/pathology , Humans , Polymorphism, Single Nucleotide
5.
BMC Bioinformatics ; 16: 132, 2015 Apr 28.
Article in English | MEDLINE | ID: mdl-25928861

ABSTRACT

BACKGROUND: Permutation-based gene set tests are standard approaches for testing relationships between collections of related genes and an outcome of interest in high throughput expression analyses. Using M random permutations, one can attain p-values as small as 1/(M+1). When many gene sets are tested, we need smaller p-values, hence larger M, to achieve significance while accounting for the number of simultaneous tests being made. As a result, the number of permutations to be done rises along with the cost per permutation. To reduce this cost, we seek parametric approximations to the permutation distributions for gene set tests. RESULTS: We study two gene set methods based on sums and sums of squared correlations. The statistics we study are among the best performers in the extensive simulation of 261 gene set methods by Ackermann and Strimmer in 2009. Our approach calculates exact relevant moments of these statistics and uses them to fit parametric distributions. The computational cost of our algorithm for the linear case is on the order of doing |G| permutations, where |G| is the number of genes in set G. For the quadratic statistics, the cost is on the order of |G|^2 permutations, which can still be orders of magnitude faster than plain permutation sampling. We applied the permutation approximation method to three public Parkinson's Disease expression datasets and discovered enriched gene sets not previously discussed. We found that the moment-based gene set enrichment p-values closely approximate the permutation method p-values at a tiny fraction of their cost. They also gave nearly identical rankings to the gene sets being compared. CONCLUSIONS: We have developed a moment-based approximation to linear and quadratic gene set test statistics' permutation distribution. This allows approximate testing to be done orders of magnitude faster than one could do by sampling permutations. We have implemented our method as a publicly available Bioconductor package, npGSEA (www.bioconductor.org).
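The plain permutation baseline that this abstract seeks to avoid can be sketched as below, which also shows where the 1/(M+1) floor on the p-value comes from. This is a simplified stand-in, not the npGSEA implementation: the function names are hypothetical, the statistic is taken to be a plain sum of Pearson correlations between each gene in the set and the outcome, and a two-sided comparison is assumed.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation; x and y must be non-constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sum_statistic(gene_rows, outcome):
    """Linear gene set statistic: sum of gene-outcome correlations."""
    return sum(pearson(row, outcome) for row in gene_rows)


def permutation_pvalue(gene_rows, outcome, M=999, seed=0):
    """Two-sided permutation p-value using M random permutations.

    The observed labeling is counted as one of the M + 1 arrangements,
    so the result can never be smaller than 1 / (M + 1).
    """
    rng = random.Random(seed)
    observed = abs(sum_statistic(gene_rows, outcome))
    perm = list(outcome)
    count = 0
    for _ in range(M):
        rng.shuffle(perm)  # permute outcome labels across samples
        if abs(sum_statistic(gene_rows, perm)) >= observed:
            count += 1
    return (count + 1) / (M + 1)
```

Each call costs M evaluations of the statistic, which is the cost that grows when many gene sets are tested and smaller p-values are needed.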


Subject(s)
Algorithms , Biomarkers/metabolism , Gene Expression Profiling , Genomics/methods , Models, Statistical , Parkinson Disease/genetics , Data Interpretation, Statistical , Humans
6.
Biometrika ; 102(4): 753-766, 2015 Dec.
Article in English | MEDLINE | ID: mdl-27046938

ABSTRACT

We develop a new method for large-scale frequentist multiple testing with Bayesian prior information. We find optimal p-value weights that maximize the average power of the weighted Bonferroni method. Due to the nonconvexity of the optimization problem, previous methods that account for uncertain prior information are suitable for only a small number of tests. For a Gaussian prior on the effect sizes, we give an efficient algorithm that is guaranteed to find the optimal weights nearly exactly. Our method can discover new loci in genome-wide association studies and compares favourably to competitors. An open-source implementation is available.
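The weighted Bonferroni method referred to in this abstract can be sketched as below. The sketch takes the weights as given and does not attempt the paper's optimization of them; the function name and the rescaling step are illustrative assumptions.

```python
def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha * w_i / m.

    The weights are rescaled to average 1 (sum to m), which keeps the
    family-wise error rate at most alpha, as in ordinary Bonferroni.
    """
    m = len(pvalues)
    if len(weights) != m:
        raise ValueError("need one weight per p-value")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    scaled = [w * m / total for w in weights]
    return [p <= alpha * w / m for p, w in zip(pvalues, scaled)]
```

With equal weights this reduces to the usual Bonferroni threshold alpha / m. Shifting weight toward hypotheses that prior information marks as promising raises their thresholds at the expense of the others.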

7.
Hum Hered ; 73(1): 52-61, 2012.
Article in English | MEDLINE | ID: mdl-22398955

ABSTRACT

BACKGROUND: Linkage and association analysis based on haplotype transmission disequilibrium can be more informative than single marker analysis. Several works have been proposed in recent years to extend the transmission disequilibrium test (TDT) to haplotypes. Among them, a powerful approach called the evolutionary tree TDT (ET-TDT) incorporates information about the evolutionary relationship among haplotypes using the cladogram of the locus. METHODS: In this work we extend this approach by taking into consideration the sparsity of causal mutations in the evolutionary history. We first introduce the notion of a Bradley-Terry (BT) graph representation of a haplotype locus. The most important property of the BT graph is that sparsity of the edge set of the graph corresponds to small number of causal mutations in the evolution of the haplotypes. We then propose a method to test the null hypothesis of no linkage and association against sparse alternatives under which a small number of edges on the BT graph have non-nil effects. RESULTS AND CONCLUSION: We compare the performance of our approach to that of the ET-TDT through a power study, and show that incorporating sparsity of causal mutations can significantly improve the power of a haplotype-based TDT.


Subject(s)
Haplotypes , Linkage Disequilibrium , Models, Genetic , Algorithms , Computer Simulation , Gene Frequency , Genetic Linkage , Humans
8.
PLoS Genet ; 5(12): e1000776, 2009 Dec.
Article in English | MEDLINE | ID: mdl-20019809

ABSTRACT

In this work we present a method for the differential analysis of gene co-expression networks and apply this method to look for large-scale transcriptional changes in aging. We derived synonymous gene co-expression networks from AGEMAP expression data for 16-month-old and 24-month-old mice. We identified a number of functional gene groups that change co-expression with age. Among these changing groups we found a trend towards declining correlation with age. In particular, we identified a modular (as opposed to uniform) decline in general correlation with age. We identified potential transcriptional mechanisms that may aid in modular correlation decline. We found that computationally identified targets of the NF-κB transcription factor decrease expression correlation with age. Finally, we found that genes that are prone to declining co-expression tend to be co-located on the chromosome. We conclude that there is a modular decline in co-expression with age in mice. Our results also indicate that factors relating to both chromosome domains and specific transcription factors may contribute to the decline.


Subject(s)
Aging/genetics , Gene Expression , Gene Regulatory Networks , Animals , Chromosomes , Methods , Mice , NF-kappa B , Transcription Factors , Transcription, Genetic
9.
J Comput Biol ; 16(4): 625-38, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19361331

ABSTRACT

This paper takes a close look at balanced permutations, a recently developed sample reuse method with applications in bioinformatics. It turns out that balanced permutation reference distributions do not have the correct null behavior, which can be traced to their lack of a group structure. We find that they can give p-values that are too permissive to varying degrees. In particular the observed test statistic can be larger than that of all B balanced permutations of a data set with a probability much higher than 1/(B + 1), even under the null hypothesis.


Subject(s)
Genomics/methods , Models, Statistical , Computer Simulation , Genes , Numerical Analysis, Computer-Assisted
10.
PLoS Genet ; 3(11): e201, 2007 Nov.
Article in English | MEDLINE | ID: mdl-18081424

ABSTRACT

We present the AGEMAP (Atlas of Gene Expression in Mouse Aging Project) gene expression database, which is a resource that catalogs changes in gene expression as a function of age in mice. The AGEMAP database includes expression changes for 8,932 genes in 16 tissues as a function of age. We found great heterogeneity in the amount of transcriptional changes with age in different tissues. Some tissues displayed large transcriptional differences in old mice, suggesting that these tissues may contribute strongly to organismal decline. Other tissues showed few or no changes in expression with age, indicating strong levels of homeostasis throughout life. Based on the pattern of age-related transcriptional changes, we found that tissues could be classified into one of three aging processes: (1) a pattern common to neural tissues, (2) a pattern for vascular tissues, and (3) a pattern for steroid-responsive tissues. We observed that different tissues age in a coordinated fashion in individual mice, such that certain mice exhibit rapid aging, whereas others exhibit slow aging for multiple tissues. Finally, we compared the transcriptional profiles for aging in mice to those from humans, flies, and worms. We found that genes involved in the electron transport chain show common age regulation in all four species, indicating that these genes may be exceptionally good markers of aging. However, we saw no overall correlation of age regulation between mice and humans, suggesting that aging processes in mice and humans may be fundamentally different.


Subject(s)
Aging/genetics , Databases, Genetic , Gene Expression Regulation , Animals , Diptera/genetics , Gene Expression Profiling , Helminths/genetics , Humans , Mice , Organ Specificity , Species Specificity
11.
12.
PLoS Genet ; 2(7): e115, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16789832

ABSTRACT

We analyzed expression of 81 normal muscle samples from humans of varying ages, and have identified a molecular profile for aging consisting of 250 age-regulated genes. This molecular profile correlates not only with chronological age but also with a measure of physiological age. We compared the transcriptional profile of muscle aging to previous transcriptional profiles of aging in the kidney and the brain, and found a common signature for aging in these diverse human tissues. The common aging signature consists of six genetic pathways; four pathways increase expression with age (genes in the extracellular matrix, genes involved in cell growth, genes encoding factors involved in complement activation, and genes encoding components of the cytosolic ribosome), while two pathways decrease expression with age (genes involved in chloride transport and genes encoding subunits of the mitochondrial electron transport chain). We also compared transcriptional profiles of aging in humans to those of the mouse and fly, and found that the electron transport chain pathway decreases expression with age in all three organisms, suggesting that this may be a public marker for aging across species.


Subject(s)
Aging , Gene Expression Profiling , Muscles/pathology , Transcription, Genetic , Adolescent , Adult , Aged , Aged, 80 and over , Animals , Biomarkers/metabolism , Drosophila , Female , Humans , Male , Mice , Middle Aged
13.
Proc Natl Acad Sci U S A ; 102(25): 8844-9, 2005 Jun 21.
Article in English | MEDLINE | ID: mdl-15956207

ABSTRACT

This work presents a version of the Metropolis-Hastings algorithm using quasi-Monte Carlo inputs. We prove that the method yields consistent estimates in some problems with finite state spaces and completely uniformly distributed inputs. In some numerical examples, the proposed method is much more accurate than ordinary Metropolis-Hastings sampling.
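A minimal sketch of Metropolis-Hastings on a finite state space, written so that the driving sequence of numbers in [0, 1) is supplied by the caller, is shown below. With IID pseudo-random inputs this is ordinary Metropolis-Hastings; the variant in this abstract would instead supply a completely uniformly distributed quasi-Monte Carlo sequence, whose construction is not attempted here. The function name and the uniform independence proposal are illustrative assumptions, not details from the paper.

```python
import random


def metropolis_hastings(target, uniforms, n_steps, start=0):
    """Sample from a finite distribution proportional to `target`.

    target:   list of non-negative (possibly unnormalized) probabilities
    uniforms: iterable of numbers in [0, 1); two are consumed per step
    The proposal is uniform over all states, so the acceptance
    probability is min(1, target[y] / target[x]).
    """
    K = len(target)
    stream = iter(uniforms)
    x = start
    samples = []
    for _ in range(n_steps):
        # First input picks the proposed state.
        y = min(int(next(stream) * K), K - 1)
        # Second input decides acceptance; written without division
        # so that a zero-probability current state causes no error.
        if next(stream) * target[x] < target[y]:
            x = y
        samples.append(x)
    return samples
```

For instance, `metropolis_hastings([0.2, 0.3, 0.5], (rng.random() for _ in range(2000)), 1000)` with `rng = random.Random(0)` runs 1000 steps of the ordinary sampler. Swapping in a different input stream changes nothing else in the code, which is the point of passing the sequence as an argument.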


Subject(s)
Algorithms , Monte Carlo Method , Computing Methodologies , Markov Chains , Models, Statistical
14.
PLoS Biol ; 2(12): e427, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15562319

ABSTRACT

In this study, we found 985 genes that change expression in the cortex and the medulla of the kidney with age. Some of the genes whose transcripts increase in abundance with age are known to be specifically expressed in immune cells, suggesting that immune surveillance or inflammation increases with age. The age-regulated genes show a similar aging profile in the cortex and the medulla, suggesting a common underlying mechanism for aging. Expression profiles of these age-regulated genes mark not only age, but also the relative health and physiology of the kidney in older individuals. Finally, the set of aging-regulated kidney genes suggests specific mechanisms and pathways that may play a role in kidney degeneration with age.


Subject(s)
Aging , Gene Expression Regulation , Kidney/metabolism , Kidney/pathology , Transcription, Genetic , Adult , Age Factors , Aged , Aged, 80 and over , Animals , Biopsy , Female , Humans , Immune System/pathology , Inflammation , Kidney Cortex/pathology , Kidney Glomerulus/metabolism , Kidney Medulla/pathology , Male , Middle Aged , Models, Statistical , Muscles/metabolism , Oligonucleotide Array Sequence Analysis , RNA/metabolism , Regression Analysis , Sex Factors , Time Factors
16.
J Am Soc Nephrol ; 14(11): 2967-74, 2003 Nov.
Article in English | MEDLINE | ID: mdl-14569108

ABSTRACT

Delayed graft function (DGF) is the need for dialysis in the first week after transplantation. Risk factors for DGF in adult (age ≥16 yr) cadaveric renal transplant recipients were studied by means of a multivariable modeling procedure. Only donor and recipient factors known before transplantation were chosen so that the probabilities of DGF could be calculated before transplantation and appropriate preventative measures taken. Data on 19,706 recipients of cadaveric allografts were obtained from the United States Renal Data System registry (1995 to 1998). Graft losses within the first 24 h after surgery were excluded from the analysis (n = 89). Patients whose DGF information was missing or unknown (n = 2820) and patients missing one or more candidate predictors (n = 2951) were also excluded. By means of a multivariable logistic regression analysis, factors contributing to DGF in the remaining 13,846 patients were identified. After validating the logistic regression model, a nomogram was developed as a tool for identifying patients at risk for DGF. The incidence of DGF was 23.7%. Sixteen independent donor or recipient risk factors were found to predict DGF. A nomogram quantifying the relative contribution of each risk factor was created. This index can be used to calculate the risk of DGF for an individual by adding the points associated with each risk factor. The nomogram provides a useful tool for developing a pretransplantation index of the likelihood of DGF occurrence. With this index in hand, better informed treatment and allocation decisions can be made.


Subject(s)
Graft Survival , Kidney Failure, Chronic/surgery , Kidney Transplantation/adverse effects , Models, Theoretical , Adolescent , Adult , Cadaver , Female , Humans , Kidney/physiopathology , Kidney Failure, Chronic/physiopathology , Likelihood Functions , Male , Middle Aged , Oliguria/etiology , Predictive Value of Tests , Risk Factors , Time Factors , Treatment Failure , United States
17.
Genome Res ; 13(8): 1828-37, 2003 Aug.
Article in English | MEDLINE | ID: mdl-12902378

ABSTRACT

One of the most important uses of whole-genome expression data is for the discovery of new genes with similar function to a given list of genes (the query) already known to have closely related function. We have developed an algorithm, called the gene recommender, that ranks genes according to how strongly they correlate with a set of query genes in those experiments for which the query genes are most strongly coregulated. We used the gene recommender to find other genes coexpressed with several sets of query genes, including genes known to function in the retinoblastoma complex. Genetic experiments confirmed that one gene (JC8.6) identified by the gene recommender acts with lin-35 Rb to regulate vulval cell fates, and that another gene (wrm-1) acts antagonistically. We find that the gene recommender returns lists of genes with better precision, for fixed levels of recall, than lists generated using the C. elegans expression topomap.


Subject(s)
Algorithms , Caenorhabditis elegans/genetics , Gene Expression Profiling/methods , Gene Expression Regulation/genetics , Genes, Helminth/genetics , Animals , Computational Biology , DNA, Helminth/analysis , Databases, Genetic/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Genes, Retinoblastoma/genetics , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , RNA Interference , Software
18.
Proc Natl Acad Sci U S A ; 100(14): 8348-53, 2003 Jul 08.
Article in English | MEDLINE | ID: mdl-12826619

ABSTRACT

Genomic sequencing is no longer a novelty, but gene function annotation remains a key challenge in modern biology. A variety of functional genomics experimental techniques are available, from classic methods such as affinity precipitation to advanced high-throughput techniques such as gene expression microarrays. In the future, more disparate methods will be developed, further increasing the need for integrated computational analysis of data generated by these studies. We address this problem with MAGIC (Multisource Association of Genes by Integration of Clusters), a general framework that uses formal Bayesian reasoning to integrate heterogeneous types of high-throughput biological data (such as large-scale two-hybrid screens and multiple microarray analyses) for accurate gene function prediction. The system formally incorporates expert knowledge about relative accuracies of data sources to combine them within a normative framework. MAGIC provides a belief level with its output that allows the user to vary the stringency of predictions. We applied MAGIC to Saccharomyces cerevisiae genetic and physical interactions, microarray, and transcription factor binding sites data and assessed the biological relevance of gene groupings using Gene Ontology annotations produced by the Saccharomyces Genome Database. We found that by creating functional groupings based on heterogeneous data types, MAGIC improved accuracy of the groupings compared with microarray analysis alone. We describe several of the biological gene groupings identified.


Subject(s)
Algorithms , Bayes Theorem , Genes, Fungal , Saccharomyces cerevisiae Proteins/physiology , Saccharomyces cerevisiae/genetics , Software , Binding Sites , Genetic Techniques , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Saccharomyces cerevisiae Proteins/genetics , Transcription Factors/metabolism