Search | VHL Regional Portal

Mechanism-aware imputation: a two-step approach in handling missing values in metabolomics.

Dekermanjian, Jonathan P; Shaddox, Elin; Nandy, Debmalya; Ghosh, Debashis; Kechris, Katerina.

BMC Bioinformatics ; 23(1): 179, 2022 May 16.

Article in English | MEDLINE | ID: mdl-35578165

ABSTRACT

When analyzing large datasets from high-throughput technologies, researchers often encounter missing quantitative measurements, which are particularly frequent in metabolomics datasets. Metabolomics, the comprehensive profiling of metabolite abundances, typically relies on mass spectrometry technologies, which often introduce missingness via multiple mechanisms: (1) the metabolite signal may be smaller than the instrument limit of detection; (2) the conditions under which the data are collected and processed may lead to missing values; (3) missing values can be introduced randomly. Missingness resulting from mechanism (1) would be classified as Missing Not At Random (MNAR), that from mechanism (2) as Missing At Random (MAR), and that from mechanism (3) as Missing Completely At Random (MCAR). Two common approaches for handling missing data are the following: (1) omit missing data from the analysis; (2) impute the missing values. Both approaches may introduce bias and reduce statistical power in downstream analyses such as testing metabolite associations with clinical variables. Further, standard imputation methods in metabolomics often ignore the mechanisms causing missingness and inaccurately estimate missing values within a data set. We propose a mechanism-aware imputation algorithm that leverages a two-step approach in imputing missing values. First, we use a random forest classifier to classify the missingness mechanism for each missing value in the data set. Second, we impute each missing value using imputation algorithms that are specific to the predicted missingness mechanism (i.e., MAR/MCAR or MNAR). Using complete data, we conducted simulations in which we imposed different missingness patterns within the data and tested the performance of combinations of imputation algorithms. Our proposed algorithm provided imputations closer to the original data than those using only one imputation algorithm for all the missing values. Consequently, our two-step approach was able to reduce bias for improved downstream analyses.
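The simulation design and the second (imputation) step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the log-normal toy data, the detection limit, the 10% random-missingness rate, and the choice of half-minimum imputation for MNAR values and mean imputation for MAR/MCAR values. The classification step is not implemented; the simulated labels stand in for the output of the random forest classifier.

```python
import math
import random
import statistics

MNAR, MAR_MCAR = "MNAR", "MAR/MCAR"


def impose_missingness(values, lod, random_rate, rng):
    """Mask complete data: values below `lod` go missing (MNAR); the rest
    go missing independently with probability `random_rate` (MAR/MCAR)."""
    masked, labels = [], []
    for value in values:
        if value < lod:
            masked.append(None)
            labels.append(MNAR)
        elif rng.random() < random_rate:
            masked.append(None)
            labels.append(MAR_MCAR)
        else:
            masked.append(value)
            labels.append(None)
    return masked, labels


def impute_by_mechanism(masked, labels):
    """Impute each missing value with a rule chosen by its mechanism label.
    Assumed rules: half of the observed minimum for MNAR, observed mean otherwise."""
    observed = [v for v in masked if v is not None]
    half_min = min(observed) / 2.0
    mean = statistics.fmean(observed)
    return [
        v if v is not None else (half_min if label == MNAR else mean)
        for v, label in zip(masked, labels)
    ]


def impute_mean_only(masked):
    """Single-method baseline: every missing value gets the observed mean."""
    mean = statistics.fmean(v for v in masked if v is not None)
    return [mean if v is None else v for v in masked]


def rmse_on_missing(truth, imputed, masked):
    """Root mean squared error, computed only over the masked entries."""
    errors = [(t - i) ** 2 for t, i, m in zip(truth, imputed, masked) if m is None]
    return math.sqrt(statistics.fmean(errors)) if errors else 0.0


rng = random.Random(0)
truth = [rng.lognormvariate(2.0, 0.75) for _ in range(200)]  # toy abundances
lod = sorted(truth)[20]  # hypothetical detection limit: the 20 lowest values fall below it
masked, labels = impose_missingness(truth, lod, 0.10, rng)

# The simulated labels stand in for the classifier's predictions (an oracle).
two_step = impute_by_mechanism(masked, labels)
single = impute_mean_only(masked)

print("RMSE, mechanism-specific imputation:", rmse_on_missing(truth, two_step, masked))
print("RMSE, mean imputation for all values:", rmse_on_missing(truth, single, masked))
```

In a real analysis the labels would be predicted rather than known, so the benefit of mechanism-specific imputation would depend on how accurately the classifier separates MNAR from MAR/MCAR values.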

Subject(s)

Algorithms, Metabolomics, Bias, Mass Spectrometry/methods, Metabolomics/methods

Bayesian inference of networks across multiple sample groups and data types.

Shaddox, Elin; Peterson, Christine B; Stingo, Francesco C; Hanania, Nicola A; Cruickshank-Quinn, Charmion; Kechris, Katerina; Bowler, Russell; Vannucci, Marina.

Biostatistics ; 21(3): 561-576, 2020 Jul 01.

Article in English | MEDLINE | ID: mdl-30590505

ABSTRACT

In this article, we develop a graphical modeling framework for the inference of networks across multiple sample groups and data types. In medical studies, this setting arises whenever a set of subjects, which may be heterogeneous due to differing disease stage or subtype, is profiled across multiple platforms, such as metabolomics, proteomics, or transcriptomics data. Our proposed Bayesian hierarchical model first links the network structures within each platform using a Markov random field prior to relate edge selection across sample groups, and then links the network similarity parameters across platforms. This enables joint estimation in a flexible manner, as we make no assumptions on the directionality of influence across the data types or the extent of network similarity across the sample groups and platforms. In addition, our model formulation allows the number of variables and number of subjects to differ across the data types, and only requires that we have data for the same set of groups. We illustrate the proposed approach through both simulation studies and an application to gene expression levels and metabolite abundances on subjects with varying severity levels of chronic obstructive pulmonary disease.

Keywords: Bayesian inference; Chronic obstructive pulmonary disease (COPD); Data integration; Gaussian graphical model; Markov random field prior; Spike and slab prior.

Subject(s)

Biomedical Research/methods, Biostatistics/methods, Data Interpretation, Statistical, Models, Statistical, Bayes Theorem, Computer Simulation, Datasets as Topic, Gene Expression/physiology, Humans, Markov Chains, Metabolome/physiology, Pulmonary Disease, Chronic Obstructive/genetics, Pulmonary Disease, Chronic Obstructive/metabolism, Severity of Illness Index

A Bayesian Approach for Learning Gene Networks Underlying Disease Severity in COPD.

Shaddox, Elin; Stingo, Francesco C; Peterson, Christine B; Jacobson, Sean; Cruickshank-Quinn, Charmion; Kechris, Katerina; Bowler, Russell; Vannucci, Marina.

Stat Biosci ; 10(1): 59-85, 2018.

Article in English | MEDLINE | ID: mdl-33912251

ABSTRACT

In this paper, we propose a Bayesian hierarchical approach to infer network structures across multiple sample groups where both shared and differential edges may exist across the groups. In our approach, we link graphs through a Markov random field prior. This prior on network similarity provides a measure of pairwise relatedness that borrows strength only between related groups. We incorporate the computational efficiency of continuous shrinkage priors, improving scalability for network estimation in cases of larger dimensionality. Our model is applied to patient groups with increasing levels of chronic obstructive pulmonary disease severity, with the goal of better understanding the breakdown of gene pathways as the disease progresses. Our approach is able to identify critical hub genes for four targeted pathways. Furthermore, it identifies gene connections that are disrupted with increased disease severity and that characterize the disease evolution. We also demonstrate the superior performance of our approach with respect to competing methods, using simulated data.
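The statement that a Markov random field prior "borrows strength only between related groups" can be made concrete with a small numerical sketch. The abstract does not give the formula, so the form used here is an assumption: for one candidate edge, the vector g of inclusion indicators across K groups has prior probability proportional to exp(nu * sum(g) + g' Theta g), where Theta is a symmetric matrix with zero diagonal whose entries measure pairwise group relatedness. The function names and the numeric values of `nu` and `theta` are hypothetical.

```python
import itertools
import math


def mrf_edge_prior(nu, theta):
    """Prior over inclusion vectors g in {0,1}^K for a single edge, assuming
    p(g) proportional to exp(nu * sum(g) + g' Theta g). Enumerates all 2^K states."""
    k = len(theta)
    weights = {}
    for g in itertools.product((0, 1), repeat=k):
        quad = sum(theta[a][b] * g[a] * g[b] for a in range(k) for b in range(k))
        weights[g] = math.exp(nu * sum(g) + quad)
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def marginal_inclusion(prior, target):
    """P(edge present in group `target`)."""
    return sum(p for g, p in prior.items() if g[target] == 1)


def conditional_inclusion(prior, target, given):
    """P(edge present in group `target` | edge present in group `given`)."""
    joint = sum(p for g, p in prior.items() if g[target] == 1 and g[given] == 1)
    return joint / marginal_inclusion(prior, given)


# Three groups: groups 0 and 1 are related (theta = 1.5), group 2 is related to neither.
theta = [
    [0.0, 1.5, 0.0],
    [1.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
prior = mrf_edge_prior(-2.0, theta)  # negative nu favours sparse graphs

print("P(edge in group 1):", marginal_inclusion(prior, 1))
print("P(edge in group 1 | edge in group 0):", conditional_inclusion(prior, 1, 0))
print("P(edge in group 2):", marginal_inclusion(prior, 2))
print("P(edge in group 2 | edge in group 0):", conditional_inclusion(prior, 2, 0))
```

Under this assumed form, knowing that the edge is present in group 0 raises its prior probability in the related group 1, while the probability for the unrelated group 2 is unchanged because the corresponding entries of Theta are zero.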
