Pesquisa | Portal Regional da BVS

1.

Doubly robust proximal synthetic controls.

Qiu, Hongxiang; Shi, Xu; Miao, Wang; Dobriban, Edgar; Tchetgen Tchetgen, Eric.

Biometrics ; 80(2)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38819307

RESUMO

To infer the treatment effect for a single treated unit using panel data, synthetic control (SC) methods construct a linear combination of control units' outcomes that mimics the treated unit's pre-treatment outcome trajectory. This linear combination is subsequently used to impute the counterfactual outcomes of the treated unit had it not been treated in the post-treatment period, and used to estimate the treatment effect. Existing SC methods rely on correctly modeling certain aspects of the counterfactual outcome generating mechanism and may require near-perfect matching of the pre-treatment trajectory. Inspired by proximal causal inference, we obtain two novel nonparametric identifying formulas for the average treatment effect for the treated unit: one is based on weighting, and the other combines models for the counterfactual outcome and the weighting function. We introduce the concept of covariate shift to SCs to obtain these identification results conditional on the treatment assignment. We also develop two treatment effect estimators based on these two formulas and generalized method of moments. One new estimator is doubly robust: it is consistent and asymptotically normal if at least one of the outcome and weighting models is correctly specified. We demonstrate the performance of the methods via simulations and apply them to evaluate the effectiveness of a pneumococcal conjugate vaccine on the risk of all-cause pneumonia in Brazil.

Assuntos

Simulação por Computador , Modelos Estatísticos , Vacinas Pneumocócicas , Humanos , Vacinas Pneumocócicas/uso terapêutico , Vacinas Pneumocócicas/administração & dosagem , Resultado do Tratamento , Biometria/métodos , Interpretação Estatística de Dados

2.

Prediction sets adaptive to unknown covariate shift.

Qiu, Hongxiang; Dobriban, Edgar; Tchetgen Tchetgen, Eric.

J R Stat Soc Series B Stat Methodol ; 85(5): 1680-1705, 2023 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-38312527

RESUMO

Predicting sets of outcomes-instead of unique outcomes-is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift-a prevalent issue in practice-poses a serious unsolved challenge. In this article, we show that prediction sets with finite-sample coverage guarantee are uninformative and propose a novel flexible distribution-free method, PredSet-1Step, to efficiently construct prediction sets with an asymptotic coverage guarantee under unknown covariate shift. We formally show that our method is asymptotically probably approximately correct, having well-calibrated coverage error with high confidence for large samples. We illustrate that it achieves nominal coverage in a number of experiments and a data set concerning HIV risk prediction in a South African cohort study. Our theory hinges on a new bound for the convergence rate of the coverage of Wald confidence intervals based on general asymptotically linear estimators.

3.

Group testing via hypergraph factorization applied to COVID-19.

Hong, David; Dey, Rounak; Lin, Xihong; Cleary, Brian; Dobriban, Edgar.

Nat Commun ; 13(1): 1837, 2022 04 05.

Artigo em Inglês | MEDLINE | ID: mdl-35383149

RESUMO

Large scale screening is a critical tool in the life sciences, but is often limited by reagents, samples, or cost. An important recent example is the challenge of achieving widespread COVID-19 testing in the face of substantial resource constraints. To tackle this challenge, screening methods must efficiently use testing resources. However, given the global nature of the pandemic, they must also be simple (to aid implementation) and flexible (to be tailored for each setting). Here we propose HYPER, a group testing method based on hypergraph factorization. We provide theoretical characterizations under a general statistical model, and carefully evaluate HYPER with alternatives proposed for COVID-19 under realistic simulations of epidemic spread and viral kinetics. We find that HYPER matches or outperforms the alternatives across a broad range of testing-constrained environments, while also being simpler and more flexible. We provide an online tool to aid lab implementation: http://hyper.covid19-analysis.org .

Assuntos

COVID-19 , Teste para COVID-19 , Humanos , Programas de Rastreamento , Pandemias/prevenção & controle , SARS-CoV-2

4.

How to reduce dimension with PCA and random projections?

Yang, Fan; Liu, Sifan; Dobriban, Edgar; Woodruff, David P.

IEEE Trans Inf Theory ; 67(12): 8154-8189, 2021 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-35695837

RESUMO

In our "big data" age, the size and complexity of data is steadily increasing. Methods for dimension reduction are ever more popular and useful. Two distinct types of dimension reduction are "data-oblivious" methods such as random projections and sketching, and "data-aware" methods such as principal component analysis (PCA). Both have their strengths, such as speed for random projections, and data-adaptivity for PCA. In this work, we study how to combine them to get the best of both. We study "sketch and solve" methods that take a random projection (or sketch) first, and compute PCA after. We compute the performance of several popular sketching methods (random iid projections, random sampling, subsampled Hadamard transform, CountSketch, etc) in a general "signal-plus-noise" (or spiked) data model. Compared to well-known works, our results (1) give asymptotically exact results, and (2) apply when the signal components are only slightly above the noise, but the projection dimension is non-negligible. We also study stronger signals allowing more general covariance structures. We find that (a) signal strength decreases under projection in a delicate way depending on the structure of the data and the sketching method, (b) orthogonal projections are slightly more accurate, (c) randomization does not hurt too much, due to concentration of measure, (d) CountSketch can be somewhat improved by a normalization method. Our results have implications for statistical learning and data analysis. We also illustrate that the results are highly accurate in simulations and in analyzing empirical data.

5.

Weighted mining of massive collections of [Formula: see text]-values by convex optimization.

Dobriban, Edgar.

Inf inference ; 7(2): 251-275, 2018 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-29930799

RESUMO

Researchers in data-rich disciplines-think of computational genomics and observational cosmology-often wish to mine large bodies of [Formula: see text]-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp, a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the [Formula: see text]-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous 'standard' methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).

6.

Regularity Properties for Sparse Regression.

Dobriban, Edgar; Fan, Jianqing.

Commun Math Stat ; 4(1): 1-19, 2016 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-27330929

RESUMO

Statistical and machine learning theory has developed several conditions ensuring that popular estimators such as the Lasso or the Dantzig selector perform well in high-dimensional sparse regression, including the restricted eigenvalue, compatibility, and [Formula: see text] sensitivity properties. However, some of the central aspects of these conditions are not well understood. For instance, it is unknown if these conditions can be checked efficiently on any given data set. This is problematic, because they are at the core of the theory of sparse regression. Here we provide a rigorous proof that these conditions are NP-hard to check. This shows that the conditions are computationally infeasible to verify, and raises some questions about their practical applications. However, by taking an average-case perspective instead of the worst-case view of NP-hardness, we show that a particular condition, [Formula: see text] sensitivity, has certain desirable properties. This condition is weaker and more general than the others. We show that it holds with high probability in models where the parent population is well behaved, and that it is robust to certain data processing steps. These results are desirable, as they provide guidance about when the condition, and more generally the theory of sparse regression, may be relevant in the analysis of high-dimensional correlated observational data.

7.

Genome-Wide Scan Informed by Age-Related Disease Identifies Loci for Exceptional Human Longevity.

Fortney, Kristen; Dobriban, Edgar; Garagnani, Paolo; Pirazzini, Chiara; Monti, Daniela; Mari, Daniela; Atzmon, Gil; Barzilai, Nir; Franceschi, Claudio; Owen, Art B; Kim, Stuart K.

PLoS Genet ; 11(12): e1005728, 2015 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-26677855

RESUMO

We developed a new statistical framework to find genetic variants associated with extreme longevity. The method, informed GWAS (iGWAS), takes advantage of knowledge from large studies of age-related disease in order to narrow the search for SNPs associated with longevity. To gain support for our approach, we first show there is an overlap between loci involved in disease and loci associated with extreme longevity. These results indicate that several disease variants may be depleted in centenarians versus the general population. Next, we used iGWAS to harness information from 14 meta-analyses of disease and trait GWAS to identify longevity loci in two studies of long-lived humans. In a standard GWAS analysis, only one locus in these studies is significant (APOE/TOMM40) when controlling the false discovery rate (FDR) at 10%. With iGWAS, we identify eight genetic loci to associate significantly with exceptional human longevity at FDR < 10%. We followed up the eight lead SNPs in independent cohorts, and found replication evidence of four loci and suggestive evidence for one more with exceptional longevity. The loci that replicated (FDR < 5%) included APOE/TOMM40 (associated with Alzheimer's disease), CDKN2B/ANRIL (implicated in the regulation of cellular senescence), ABO (tags the O blood group), and SH2B3/ATXN2 (a signaling gene that extends lifespan in Drosophila and a gene involved in neurological disease). Our results implicate new loci in longevity and reveal a genetic overlap between longevity and age-related diseases and traits, including coronary artery disease and Alzheimer's disease. iGWAS provides a new analytical strategy for uncovering SNPs that influence extreme longevity, and can be applied more broadly to boost power in other studies of complex phenotypes.

Assuntos

Envelhecimento/genética , Predisposição Genética para Doença , Estudo de Associação Genômica Ampla , Longevidade/genética , Envelhecimento/patologia , Humanos , Polimorfismo de Nucleotídeo Único

8.

Optimal multiple testing under a Gaussian prior on the effect sizes.

Dobriban, Edgar; Fortney, Kristen; Kim, Stuart K; Owen, Art B.

Biometrika ; 102(4): 753-766, 2015 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-27046938

RESUMO

We develop a new method for large-scale frequentist multiple testing with Bayesian prior information. We find optimal [Formula: see text]-value weights that maximize the average power of the weighted Bonferroni method. Due to the nonconvexity of the optimization problem, previous methods that account for uncertain prior information are suitable for only a small number of tests. For a Gaussian prior on the effect sizes, we give an efficient algorithm that is guaranteed to find the optimal weights nearly exactly. Our method can discover new loci in genome-wide association studies and compares favourably to competitors. An open-source implementation is available.

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA