Results 1 - 13 of 13
1.
Stat Med ; 42(29): 5419-5450, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-37759370

ABSTRACT

The pattern graph framework solves a wide range of missing data problems with nonignorable mechanisms. However, it faces two challenges of assessability and interpretability, particularly important in safety-critical problems such as clinical diagnosis: (i) How can one assess the validity of the framework's a priori assumption and make necessary adjustments to accommodate known information about the problem? (ii) How can one interpret the process of exponential tilting used for sensitivity analysis in the pattern graph framework and choose the tilt perturbations based on meaningful real-world quantities? In this paper, we introduce Informed Sensitivity Analysis, an extension of the pattern graph framework that enables us to incorporate substantive knowledge about the missingness mechanism into the pattern graph framework. Our extension allows us to examine the validity of assumptions underlying pattern graphs and interpret sensitivity analysis results in terms of realistic problem characteristics. We apply our method to a prevalent nonignorable missing data scenario in clinical research. We validate our method and compare its results with a number of widely used missing data methods, including Unweighted CCA, KNN Imputer, MICE, and MissForest. The validation uses both bootstrapped simulated experiments and real-world clinical observations in the MIMIC-III public dataset.


Subject(s)
Models, Statistical , Palliative Care , Humans , Triazoles
2.
Inf Inference ; 11(2): 739-780, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35721800

ABSTRACT

Estimation of density functions supported on general domains arises when the data are naturally restricted to a proper subset of the real space. This problem is complicated by typically intractable normalizing constants. Score matching provides a powerful tool for estimating densities with such intractable normalizing constants but as originally proposed is limited to densities on $\mathbb{R}^m$ and $\mathbb{R}_+^m$. In this paper, we offer a natural generalization of score matching that accommodates densities supported on a very general class of domains. We apply the framework to truncated graphical and pairwise interaction models and provide theoretical guarantees for the resulting estimators. We also generalize a recently proposed method from bounded to unbounded domains and empirically demonstrate the advantages of our method.

4.
BMC Bioinformatics ; 22(1): 486, 2021 Oct 09.
Article in English | MEDLINE | ID: mdl-34627139

ABSTRACT

BACKGROUND: Differential correlation networks are increasingly used to delineate changes in interactions among biomolecules. They characterize differences between omics networks under two different conditions, and can be used to delineate mechanisms of disease initiation and progression. RESULTS: We present a new R package, CorDiffViz, that facilitates the estimation and visualization of differential correlation networks using multiple correlation measures and inference methods. The software is implemented in R, HTML and Javascript, and is available at https://github.com/sqyu/CorDiffViz . Visualization has been tested for the Chrome and Firefox web browsers. A demo is available at https://diffcornet.github.io/CorDiffViz/demo.html . CONCLUSIONS: Our software offers considerable flexibility by allowing the user to interact with the visualization and choose from different estimation methods and visualizations. It also allows the user to easily toggle between correlation networks for samples under one condition and differential correlations between samples under two conditions. Moreover, the software facilitates integrative analysis of cross-correlation networks between two omics data sets.
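
The quantity behind a differential correlation network can be illustrated with a minimal sketch. This is not the CorDiffViz API (the package itself is written in R); the function names, the dictionary layout, and the toy data are all assumptions made for illustration, and only the Pearson correlation is used, without any of the inference methods the abstract mentions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def differential_correlation(cond_a, cond_b):
    """Map each pair of variables to (correlation under B) - (correlation under A).

    cond_a and cond_b are hypothetical dicts: variable name -> measurements.
    """
    names = sorted(cond_a)
    return {
        (u, v): pearson(cond_b[u], cond_b[v]) - pearson(cond_a[u], cond_a[v])
        for i, u in enumerate(names)
        for v in names[i + 1:]
    }

# Toy data: g1 and g2 move together under condition A, oppositely under B.
cond_a = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8]}
cond_b = {"g1": [1, 2, 3, 4], "g2": [8, 6, 4, 2]}
diff = differential_correlation(cond_a, cond_b)
```

Each entry of `diff` would correspond to one candidate edge of the differential network; deciding which differences are significant requires an inference step that this sketch leaves out.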


Subject(s)
Software , Web Browser
5.
PLoS Genet ; 16(7): e1008835, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32644988

ABSTRACT

In most organisms, dietary restriction (DR) increases lifespan. However, several studies have found that genotypes within the same species vary widely in how they respond to DR. To explore the mechanisms underlying this variation, we exposed 178 inbred Drosophila melanogaster lines to a DR or ad libitum (AL) diet, and measured a panel of 105 metabolites under both diets. Twenty-four out of 105 metabolites were associated with the magnitude of the lifespan response. These included proteinogenic amino acids and metabolites involved in α-ketoglutarate (α-KG)/glutamine metabolism. We confirm the role of α-KG/glutamine synthesis pathways in the DR response through genetic manipulations. We used covariance network analysis to investigate diet-dependent interactions between metabolites, identifying the essential amino acids threonine and arginine as "hub" metabolites in the DR response. Finally, we employ a novel metabolic and genetic bipartite network analysis to reveal multiple genes that influence DR lifespan response, some of which have not previously been implicated in DR regulation. One of these is CCHa2R, a gene that encodes a neuropeptide receptor that influences satiety response and insulin signaling. Across the lines, variation in an intronic single nucleotide variant of CCHa2R correlated with variation in levels of five metabolites, all of which in turn were correlated with DR lifespan response. Inhibition of adult CCHa2R expression extended DR lifespan of flies, confirming the role of CCHa2R in lifespan response. These results provide support for the power of combined genomic and metabolomic analysis to identify key pathways underlying variation in this complex quantitative trait.


Subject(s)
Aging/genetics , Drosophila Proteins/genetics , Longevity/genetics , Metabolome/genetics , Receptors, G-Protein-Coupled/genetics , Aging/metabolism , Aging/pathology , Animals , Caloric Restriction , Diet , Drosophila melanogaster/genetics , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Insulin/genetics , Metabolomics , Mutation/genetics , Signal Transduction/genetics
6.
FODS 20 (2020) ; 2020: 171-181, 2020 Oct.
Article in English | MEDLINE | ID: mdl-35497571

ABSTRACT

This paper concerns the development of an inferential framework for high-dimensional linear mixed effect models. These are suitable models, for instance, when we have n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (and may be larger than M), but the number of random effects q is small. Our framework is inspired by a recent line of work that proposes de-biasing penalized estimators to perform inference for high-dimensional linear models with fixed effects only. In particular, we demonstrate how to correct a 'naive' ridge estimator in extension of work by Bühlmann (2013) to build asymptotically valid confidence intervals for mixed effect models. We validate our theoretical results with numerical experiments, in which we show our method outperforms those that fail to account for correlation induced by the random effects. For a practical demonstration we consider a riboflavin production dataset that exhibits group structure, and show that conclusions drawn using our method are consistent with those obtained on a similar dataset without group structure.

7.
Ann Appl Stat ; 13(2): 848-873, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31388390

ABSTRACT

Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene co-regulatory networks from such data, we propose a multivariate Hurdle model. It comprises a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. The proposed method is more sensitive than existing approaches in simulations, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells, and a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods or in bulk data sets. An R implementation is available at https://github.com/amcdavid/HurdleNormal.

8.
J Mach Learn Res ; 20, 2019 Apr.
Article in English | MEDLINE | ID: mdl-34290571

ABSTRACT

A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. The score matching method of Hyvärinen (2005) avoids direct calculation of the normalizing constant and yields closed-form estimates for exponential families of continuous distributions over $\mathbb{R}^m$. Hyvärinen (2007) extended the approach to distributions supported on the non-negative orthant, $\mathbb{R}_+^m$. In this paper, we give a generalized form of score matching for non-negative data that improves estimation efficiency. As an example, we consider a general class of pairwise interaction models. Addressing an overlooked inexistence problem, we generalize the regularized score matching method of Lin et al. (2016) and improve its theoretical guarantees for non-negative Gaussian graphical models.
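
To make the closed-form property concrete, here is a minimal sketch of the original (Hyvärinen, 2005) score matching loss in the simplest case the author of this note could think of: a univariate, zero-mean Gaussian parameterized by its precision k. The model choice, function names, and simulated data are assumptions for illustration; the sketch does not implement the generalized or regularized estimators proposed in the paper.

```python
import random

def score_matching_loss(k, xs):
    """Empirical score matching loss for a N(0, 1/k) model.

    With log p(x) = -k * x**2 / 2 + const, the score is -k * x and its
    derivative is -k, so the loss is mean(0.5 * (k * x)**2) - k.
    The normalizing constant never appears.
    """
    second_moment = sum(x * x for x in xs) / len(xs)
    return 0.5 * k * k * second_moment - k

def score_matching_precision(xs):
    """Closed-form minimizer of the loss above: 1 / mean(x**2)."""
    return len(xs) / sum(x * x for x in xs)

random.seed(0)
true_sd = 0.5  # hypothetical ground truth, i.e. a true precision of 4
xs = [random.gauss(0.0, true_sd) for _ in range(5000)]
k_hat = score_matching_precision(xs)
```

Because the loss is quadratic in k, setting its derivative to zero gives the estimate directly, with no numerical integration. In this Gaussian case the estimate coincides with the maximum likelihood estimate of the precision for a known zero mean.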

9.
Ann Appl Stat ; 11(1): 93-113, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28572869

ABSTRACT

Cohort studies in air pollution epidemiology aim to establish associations between health outcomes and air pollution exposures. Statistical analysis of such associations is complicated by the multivariate nature of the pollutant exposure data as well as the spatial misalignment that arises from the fact that exposure data are collected at regulatory monitoring network locations distinct from cohort locations. We present a novel clustering approach for addressing this challenge. Specifically, we present a method that uses geographic covariate information to cluster multi-pollutant observations and predict cluster membership at cohort locations. Our predictive k-means procedure identifies centers using a mixture model and is followed by multi-class spatial prediction. In simulations, we demonstrate that predictive k-means can reduce misclassification error by over 50% compared to ordinary k-means, with minimal loss in cluster representativeness. The improved prediction accuracy results in large gains of 30% or more in power for detecting effect modification by cluster in a simulated health analysis. In an analysis of the NIEHS Sister Study cohort using predictive k-means, we find that the association between systolic blood pressure (SBP) and long-term fine particulate matter (PM2.5) exposure varies significantly between different clusters of PM2.5 component profiles. Our cluster-based analysis shows that for subjects assigned to a cluster located in the Midwestern U.S., a 10 µg/m³ difference in exposure is associated with 4.37 mmHg (95% CI, 2.38, 6.35) higher SBP.

10.
Electron J Stat ; 10(1): 806-854, 2016.
Article in English | MEDLINE | ID: mdl-28638498

ABSTRACT

Graphical models are widely used to model stochastic dependences among large collections of variables. We introduce a new method of estimating undirected conditional independence graphs based on the score matching loss, introduced by Hyvärinen (2005), and subsequently extended in Hyvärinen (2007). The regularized score matching method we propose applies to settings with continuous observations and allows for computationally efficient treatment of possibly non-Gaussian exponential family models. In the well-explored Gaussian setting, regularized score matching avoids issues of asymmetry that arise when applying the technique of neighborhood selection, and compared to existing methods that directly yield symmetric estimates, the score matching approach has the advantage that the considered loss is quadratic and gives piecewise linear solution paths under ℓ1 regularization. Under suitable irrepresentability conditions, we show that ℓ1-regularized score matching is consistent for graph estimation in sparse high-dimensional settings. Through numerical experiments and an application to RNAseq data, we confirm that regularized score matching achieves state-of-the-art performance in the Gaussian case and provides a valuable tool for computationally efficient estimation in non-Gaussian graphical models.

11.
Resuscitation ; 91: 26-31, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25805433

ABSTRACT

OBJECTIVE: The accuracy of methods that classify the cardiac rhythm despite CPR artifact could potentially be improved by utilizing continuous ECG data. Our objective is to compare three approaches which use identical ECG features and differ only in their degree of temporal integration: (1) static classification, which analyzes 4-s ECG frames in isolation; (2) "best-of-three averaging," which takes the average of three consecutive static classifications successively; and (3) "adaptive rhythm sequencing," which uses hidden Markov models to model ECG segments as rhythm sequences. METHODS: Defibrillator recordings from 95 out-of-hospital cardiac arrests were divided into training and test sets. Each method classified the rhythm as asystole, organized rhythm or shockable rhythm throughout the recordings. Classifications were compared to the gold standard of physician review. The primary outcome was accuracy during CPR, which was estimated using a generalized linear mixed-effects model. RESULTS: In the training set, accuracies during CPR were 0.89 (95% CI 0.85, 0.92), 0.92 (95% CI 0.89, 0.94) and 0.97 (95% CI 0.95, 0.98) for the static, best-of-three averaging and adaptive rhythm sequencing methods, respectively. The corresponding results in the test set were 0.92 (95% CI 0.86, 0.96), 0.94 (95% CI 0.89, 0.97), and 0.97 (95% CI 0.94, 0.99). Of the dynamic methods, only adaptive rhythm sequencing was significantly more accurate than static classification in the training (p < 0.001) and test (p = 0.03) sets. CONCLUSION: In a continuous monitoring setting, adaptive rhythm sequencing was significantly more accurate than static rhythm classification during CPR.
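
The "best-of-three averaging" idea can be sketched as a sliding window over per-frame outputs. The abstract does not say exactly what is averaged, so this sketch assumes each 4-s frame yields one score per rhythm class; the function name, the score layout, and the example values are hypothetical and are not taken from the study.

```python
from typing import List, Sequence

RHYTHMS = ("asystole", "organized", "shockable")

def best_of_three(frame_scores: Sequence[Sequence[float]]) -> List[str]:
    """Average each run of three consecutive frame scores and pick the top class."""
    labels = []
    for i in range(2, len(frame_scores)):
        window = frame_scores[i - 2 : i + 1]
        # Column-wise mean over the three frames in the window.
        mean = [sum(col) / 3 for col in zip(*window)]
        labels.append(RHYTHMS[mean.index(max(mean))])
    return labels

# Hypothetical static-classifier scores for four consecutive frames.
scores = [
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],  # an isolated frame that favors "organized"
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]
smoothed = best_of_three(scores)
```

In this toy input the second frame on its own would be labeled "organized", while both three-frame windows favor "shockable", which is the kind of temporal smoothing the method relies on. Adaptive rhythm sequencing replaces this fixed window with a hidden Markov model and is not shown here.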


Subject(s)
Cardiopulmonary Resuscitation/methods , Electrocardiography/methods , Out-of-Hospital Cardiac Arrest/therapy , Aged , Defibrillators , Female , Heart Rate , Humans , Male , Middle Aged
12.
Biostatistics ; 8(1): 53-71, 2007 Jan.
Article in English | MEDLINE | ID: mdl-16569743

ABSTRACT

RNA viruses provide prominent examples of measurably evolving populations. In human immunodeficiency virus (HIV) infection, the development of drug resistance is of particular interest because precise predictions of the outcome of this evolutionary process are a prerequisite for the rational design of antiretroviral treatment protocols. We present a mutagenetic tree hidden Markov model for the analysis of longitudinal clonal sequence data. Using HIV mutation data from clinical trials, we estimate the order and rate of occurrence of seven amino acid changes that are associated with resistance to the reverse transcriptase inhibitor efavirenz.


Subject(s)
HIV Infections/drug therapy , HIV Infections/virology , HIV/genetics , Markov Chains , Models, Genetic , Alkynes , Benzoxazines , Clone Cells , Cyclopropanes , DNA, Viral/chemistry , DNA, Viral/genetics , Drug Resistance, Viral/genetics , Humans , Longitudinal Studies , Oxazines/therapeutic use , Point Mutation , Polymerase Chain Reaction , Reverse Transcriptase Inhibitors/therapeutic use , Sequence Analysis, DNA
13.
J Speech Lang Hear Res ; 47(3): 610-23, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15212572

ABSTRACT

Discussion abounds in the literature as to whether aphasia is a deficit of linguistic competence or linguistic performance and, if it is a performance deficit, what are its precise mechanisms. Considerable evidence suggests that alteration of nonlinguistic factors can affect language performance in aphasia, a finding that raises questions about the modularity of language and the purity of linguistic mechanisms underlying the putative language deficits in persons with aphasia. This study investigated whether temporal stress plus additional cognitive demands placed on non-brain-damaged adults would produce aphasic-like performance on a picture naming task. Two groups of non-brain-damaged participants completed a picture naming task with additional cognitive demands (use of low frequency words and making semantic judgments about the stimuli). A control group performed this task at their own pace, and an experimental group was placed under time constraints. Naming errors were identified and coded by error type. Errors made by individuals with aphasia from a previous study (S. E. Kohn and H. Goodglass, 1985) were recoded with the coding system used in the present study and were then compared with the types of errors produced by the 2 non-brain-damaged groups. Results generally support the hypothesis that the language performance deficits seen in persons with aphasia exist on a continuum with the language performance of non-brain-damaged individuals. Some error type differences between groups warrant further investigation.


Subject(s)
Aphasia/physiopathology , Semantics , Stress, Physiological/complications , Adult , Aphasia/etiology , Cognition/physiology , Dominance, Cerebral , Female , Humans , Male , Photic Stimulation , Regression Analysis , Time Factors