Search | BVS Regional Portal

1.

Testing for a difference in means of a single feature after clustering.

Chen, Yiqun T; Gao, Lucy L.

ArXiv ; 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-38076519

ABSTRACT

For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test based on the proposed p-value controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.
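The inflation of the Type I error rate described in the abstract above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the authors' proposed test: a single feature is drawn from one Gaussian population (so no true clusters exist), the data are split by a hand-written 2-means routine, and a naive two-sample z-test with known variance is applied to the estimated clusters. The function names `two_means_1d`, `naive_p_value`, and `naive_rejection_rate`, and all parameter values, are hypothetical.

```python
import math
import random


def two_means_1d(x, iters=50):
    """Lloyd's algorithm with k=2 on a single feature; returns 0/1 labels."""
    c0, c1 = min(x), max(x)  # deterministic initialisation at the extremes
    labels = []
    for _ in range(iters):
        new_labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in x]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        g0 = [v for v, lab in zip(x, labels) if lab == 0]
        g1 = [v for v, lab in zip(x, labels) if lab == 1]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels


def naive_p_value(x, labels, sigma=1.0):
    """Two-sided z-test p-value that ignores how the groups were chosen."""
    g0 = [v for v, lab in zip(x, labels) if lab == 0]
    g1 = [v for v, lab in zip(x, labels) if lab == 1]
    diff = sum(g0) / len(g0) - sum(g1) / len(g1)
    se = sigma * math.sqrt(1.0 / len(g0) + 1.0 / len(g1))
    return math.erfc(abs(diff / se) / math.sqrt(2.0))


def naive_rejection_rate(n=30, reps=500, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no true clusters
        labels = two_means_1d(x)
        if naive_p_value(x, labels) < alpha:
            rejections += 1
    return rejections / reps


rate = naive_rejection_rate()
print(f"Naive rejection rate under the null: {rate:.2f} (nominal level 0.05)")
```

Because the same data are used both to form the clusters and to test them, the rejection rate in this sketch should come out far above the nominal 0.05 level, which is the problem a selective test is designed to correct.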

2.

Microbes with higher metabolic independence are enriched in human gut microbiomes under stress.

Veseli, Iva; Chen, Yiqun T; Schechter, Matthew S; Vanni, Chiara; Fogarty, Emily C; Watson, Andrea R; Jabri, Bana; Blekhman, Ran; Willis, Amy D; Yu, Michael K; Fernàndez-Guerra, Antonio; Füssel, Jessika; Eren, A Murat.

bioRxiv ; 2023 May 26.

Article in English | MEDLINE | ID: mdl-37293035

ABSTRACT

A wide variety of human diseases are associated with loss of microbial diversity in the human gut, inspiring a great interest in the diagnostic or therapeutic potential of the microbiota. However, the ecological forces that drive diversity reduction in disease states remain unclear, rendering it difficult to ascertain the role of the microbiota in disease emergence or severity. One hypothesis to explain this phenomenon is that microbial diversity is diminished as disease states select for microbial populations that are more fit to survive environmental stress caused by inflammation or other host factors. Here, we tested this hypothesis on a large scale, by developing a software framework to quantify the enrichment of microbial metabolisms in complex metagenomes as a function of microbial diversity. We applied this framework to over 400 gut metagenomes from individuals who are healthy or diagnosed with inflammatory bowel disease (IBD). We found that high metabolic independence (HMI) is a distinguishing characteristic of microbial communities associated with individuals diagnosed with IBD. A classifier we trained using the normalized copy numbers of 33 HMI-associated metabolic modules not only distinguished states of health versus IBD, but also tracked the recovery of the gut microbiome following antibiotic treatment, suggesting that HMI is a hallmark of microbial communities in stressed gut environments.

3.

Quantifying uncertainty in spikes estimated from calcium imaging data.

Chen, Yiqun T; Jewell, Sean W; Witten, Daniela M.

Biostatistics ; 24(2): 481-501, 2023 04 14.

Article in English | MEDLINE | ID: mdl-34654923

ABSTRACT

In recent years, a number of methods have been proposed to estimate the times at which a neuron spikes on the basis of calcium imaging data. However, quantifying the uncertainty associated with these estimated spikes remains an open problem. We consider a simple and well-studied model for calcium imaging data, which states that calcium decays exponentially in the absence of a spike, and instantaneously increases when a spike occurs. We wish to test the null hypothesis that the neuron did not spike (i.e., that there was no increase in calcium) at a particular timepoint at which a spike was estimated. In this setting, classical hypothesis tests lead to inflated Type I error, because the spike was estimated on the same data used for testing. To overcome this problem, we propose a selective inference approach. We describe an efficient algorithm to compute finite-sample $p$-values that control selective Type I error, and confidence intervals with correct selective coverage, for spikes estimated using a recent proposal from the literature. We apply our proposal in simulation and on calcium imaging data from the $\texttt{spikefinder}$ challenge.

Subjects

Calcium , Diagnostic Imaging , Humans , Uncertainty , Action Potentials/physiology , Computer Simulation , Algorithms
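The calcium model summarized in the abstract above (exponential decay between spikes, an instantaneous jump at each spike) can be sketched as a simple simulation. This is an illustrative sketch only: the decay rate `gamma`, the jump size `spike_size`, the Gaussian observation noise, and the function name `simulate_calcium` are assumptions made here and are not taken from the paper.

```python
import random


def simulate_calcium(n_steps, spike_times, gamma=0.95, spike_size=1.0,
                     noise_sd=0.1, seed=0):
    """Simulate calcium that decays by a factor gamma per step and jumps at spikes.

    Returns (calcium, fluorescence), where fluorescence is the calcium
    trace plus independent Gaussian noise (an assumed observation model).
    """
    rng = random.Random(seed)
    spikes = set(spike_times)
    calcium, fluorescence = [], []
    c = 0.0
    for t in range(n_steps):
        c = gamma * c            # exponential decay in the absence of a spike
        if t in spikes:
            c += spike_size      # instantaneous increase when a spike occurs
        calcium.append(c)
        fluorescence.append(c + rng.gauss(0.0, noise_sd))
    return calcium, fluorescence


calcium, fluorescence = simulate_calcium(n_steps=10, spike_times=[3])
print([round(c, 4) for c in calcium])
```

Under this model, the null hypothesis of "no spike at time t" corresponds to the calcium at time t being exactly `gamma` times the calcium at time t-1.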

4.

Selective inference for k-means clustering.

Chen, Yiqun T; Witten, Daniela M.

J Mach Learn Res ; 24, 2023 May.

Article in English | MEDLINE | ID: mdl-38264325

ABSTRACT

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. In recent work, Gao et al. (2022) considered a related problem in the context of hierarchical clustering. Unfortunately, their solution is highly tailored to the context of hierarchical clustering, and thus cannot be applied in the setting of k-means clustering. In this paper, we propose a p-value that conditions on all of the intermediate clustering assignments in the k-means algorithm. We show that the p-value controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using k-means clustering in finite samples, and can be efficiently computed. We apply our proposal to hand-written digits data and to single-cell RNA-sequencing data.

5.

Social Networks and HIV Care Outcomes in Rural Kenya and Uganda.

Chen, Yiqun T; Brown, Lillian; Chamie, Gabriel; Kwarisiima, Dalsone; Ayieko, James; Kabami, Jane; Charlebois, Edwin; Clark, Tamara; Kamya, Moses; Havlir, Diane V; Petersen, Maya L; Balzer, Laura B.

Epidemiology ; 32(4): 551-559, 2021 07 01.

Article in English | MEDLINE | ID: mdl-33767114

ABSTRACT

BACKGROUND: Social isolation among HIV-positive persons might be an important barrier to care. Using data from the SEARCH Study in rural Kenya and Uganda, we constructed 32 community-wide, sociocentric networks and evaluated whether less socially connected HIV-positive persons were less likely to know their status, have initiated treatment, and be virally suppressed. METHODS: Between 2013 and 2014, 168,720 adult residents in the SEARCH Study were census-enumerated, offered HIV testing, and asked to name social contacts. Social networks were constructed by matching named contacts to other residents. We characterized the resulting networks and estimated adjusted risk ratios (aRR) associated with poor HIV care outcomes, adjusting for sociodemographic factors and clustering by community with generalized estimating equations. RESULTS: The sociocentric networks contained 170,028 residents (nodes) and 362,965 social connections (edges). Among 11,239 HIV-positive persons who named ≥1 contact, 30.9% were previously undiagnosed, 43.7% had not initiated treatment, and 49.4% had viral nonsuppression. Lower social connectedness, measured by the number of persons naming an HIV-positive individual as a contact (in-degree), was associated with poorer outcomes in Uganda, but not Kenya. Specifically, HIV-positive persons in the lowest connectedness tercile were less likely to be previously diagnosed (Uganda-West aRR: 0.89 [95% confidence interval (CI): 0.83, 0.96]; Uganda-East aRR: 0.85 [95% CI: 0.76, 0.96]); on treatment (Uganda-West aRR: 0.88 [95% CI: 0.80, 0.98]; Uganda-East aRR: 0.81 [95% CI: 0.72, 0.92]); and suppressed (Uganda-West aRR: 0.84 [95% CI: 0.73, 0.96]; Uganda-East aRR: 0.74 [95% CI: 0.58, 0.94]) than those in the highest connectedness tercile. CONCLUSIONS: HIV-positive persons named as a contact by fewer people may be at higher risk for poor HIV care outcomes, suggesting opportunities for targeted interventions.

Subjects

HIV Infections , Adult , HIV Infections/drug therapy , HIV Infections/epidemiology , Humans , Kenya/epidemiology , Rural Population , Social Networking , Uganda/epidemiology
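The connectedness measure used in the abstract above, in-degree (the number of persons naming an individual as a contact), can be computed from a list of named contacts after matching them to enumerated residents. The sketch below is a toy illustration with made-up identifiers; the function name `in_degrees` and the matching rule (exact identifier match, self-nominations ignored) are assumptions and do not reproduce the study's matching procedure.

```python
from collections import Counter


def in_degrees(residents, named_contacts):
    """Count, for each resident, how many other residents named them.

    named_contacts is a list of (namer, named) pairs. Pairs are kept only
    when both people match an enumerated resident and differ from each other.
    """
    resident_set = set(residents)
    counts = Counter(
        named
        for namer, named in named_contacts
        if namer in resident_set and named in resident_set and namer != named
    )
    return {r: counts.get(r, 0) for r in residents}


# Hypothetical toy data: "Z" was named but is not an enumerated resident.
residents = ["A", "B", "C", "D"]
named_contacts = [("A", "B"), ("C", "B"), ("B", "A"), ("D", "Z")]
print(in_degrees(residents, named_contacts))
# prints {'A': 1, 'B': 2, 'C': 0, 'D': 0}
```

Residents could then be grouped into terciles of in-degree, as in the abstract, before comparing care outcomes between the lowest and highest groups.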
