Results 1 - 20 of 105
1.
Stat Med ; 2024 Jun 16.
Article in English | MEDLINE | ID: mdl-38880963

ABSTRACT

In cancer and other medical studies, time-to-event (eg, death) data are common. A major task in analyzing time-to-event (or survival) data is usually to compare two medical interventions (eg, a treatment and a control) regarding their effect on patients' hazard of experiencing the event of concern. In such cases, we need to compare the hazard curves of the two related patient groups. In practice, a medical treatment often has a time-lag effect, that is, the treatment effect can only be observed some time after the treatment is applied. In such cases, the two hazard curves would be similar in an initial time period, and traditional testing procedures, such as the log-rank test, would be ineffective in detecting the treatment effect because the similarity between the two hazard curves in the initial time period would attenuate the difference between them that is reflected in the related test statistics. In this paper, we suggest a new method for comparing two hazard curves when there is a potential treatment time-lag effect, based on a weighted log-rank test with a flexible weighting scheme. The new method is shown to be more effective than some representative existing methods in various cases when a treatment time-lag effect is present.
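
To make the idea of a weighted log-rank test concrete, here is a minimal Python sketch of the generic statistic with a user-supplied weight function. It is not the weighting scheme proposed in the paper, which the abstract does not specify. The function names `weighted_logrank` and `lag_weight`, the zero-one lag weight, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def weighted_logrank(time, event, group, weight_fn=lambda t: 1.0):
    """Standardized weighted log-rank statistic Z for two groups.

    time: observed times; event: 1 = event, 0 = censored;
    group: 0 = control, 1 = treatment; weight_fn: weight w(t) >= 0.
    With a constant weight this reduces to the ordinary log-rank test.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group, dtype=int)
    num, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                       # at risk, both groups
        n1 = (at_risk & (group == 1)).sum()     # at risk, treatment group
        d = ((time == t) & (event == 1)).sum()  # events at t, both groups
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        w = weight_fn(t)
        num += w * (d1 - d * n1 / n)            # observed minus expected
        if n > 1:                               # hypergeometric variance
            var += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num / np.sqrt(var) if var > 0 else 0.0


def lag_weight(lag):
    """Hypothetical weight: ignore events before `lag`, count later ones."""
    return lambda t: 0.0 if t < lag else 1.0


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    time = [2, 3, 4, 5, 6, 8, 9, 11, 12, 15]
    event = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
    group = [0, 1, 0, 1, 0, 0, 1, 0, 1, 1]
    print("log-rank Z:", weighted_logrank(time, event, group))
    print("lag-weighted Z:", weighted_logrank(time, event, group, lag_weight(5)))
```

Down-weighting early event times keeps the initial period, where the two hazards are similar, from diluting the statistic. In practice the lag is unknown, which is presumably why the abstract describes a flexible weighting scheme rather than a fixed cutoff.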

2.
Stat Methods Med Res ; : 9622802241254196, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38767219

ABSTRACT

In many cluster-correlated data analyses, informative cluster size poses a challenge that can potentially introduce bias in statistical analyses. Different methodologies have been introduced in statistical literature to address this bias. In this study, we consider a complex form of informativeness where the number of observations corresponding to latent levels of a unit-level continuous covariate within a cluster is associated with the response variable. This type of informativeness has not been explored in prior research. We present a novel test statistic designed to evaluate the effect of the continuous covariate while accounting for the presence of informativeness. The covariate induces a continuum of latent subgroups within the clusters, and our test statistic is formulated by aggregating values from an established statistic that accounts for informative subgroup sizes when comparing group-specific marginal distributions. Through carefully designed simulations, we compare our test with four traditional methods commonly employed in the analysis of cluster-correlated data. Only our test maintains the size across all data-generating scenarios with informativeness. We illustrate the proposed method to test for marginal associations in periodontal data with this distinctive form of informativeness.

3.
Stat Med ; 43(13): 2527-2546, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38618705

ABSTRACT

Urban environments, characterized by bustling mass transit systems and high population density, host a complex web of microorganisms that impact microbial interactions. These urban microbiomes, influenced by diverse demographics and constant human movement, are vital for understanding microbial dynamics. We explore urban metagenomics, utilizing an extensive dataset from the Metagenomics & Metadesign of Subways & Urban Biomes (MetaSUB) consortium, and investigate antimicrobial resistance (AMR) patterns. In this pioneering research, we delve into the role of bacteriophages, or "phages", viruses that prey on bacteria and can facilitate the exchange of antibiotic resistance genes (ARGs) through mechanisms like horizontal gene transfer (HGT). Despite their potential significance, existing literature lacks a consensus on their role in ARG dissemination. We argue that they are an important consideration. We find that environmental variables, such as those on climate, demographics, and landscape, can obscure phage-resistome relationships. We adjust for these potential confounders and clarify these relationships across specific and overall antibiotic classes with precision, identifying several key phages. Leveraging machine learning tools and validating findings through clinical literature, we uncover novel associations, adding valuable insights to our comprehension of AMR development.


Subject(s)
Bacteriophages , Bacteriophages/genetics , Humans , Least-Squares Analysis , Metagenomics/methods , Drug Resistance, Bacterial/genetics , Gene Transfer, Horizontal , Drug Resistance, Microbial/genetics , Confounding Factors, Epidemiologic , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/therapeutic use , Microbiota/drug effects
4.
BMC Bioinformatics ; 25(1): 117, 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38500042

ABSTRACT

BACKGROUND: A recent breakthrough in differential network (DN) analysis of microbiome data has been realized with the advent of next-generation sequencing technologies. The DN analysis disentangles the microbial co-abundance among taxa by comparing the network properties between two or more graphs under different biological conditions. However, the existing methods for DN analysis of microbiome data do not adjust for other clinical differences between subjects. RESULTS: We propose a Statistical Approach via Pseudo-value Information and Estimation for Differential Network Analysis (SOHPIE-DNA) that incorporates additional covariates such as continuous age and categorical BMI. SOHPIE-DNA is a regression technique adopting jackknife pseudo-values that can be implemented readily for the analysis. We demonstrate through simulations that SOHPIE-DNA consistently reaches higher recall and F1-score, while maintaining similar precision and accuracy to existing methods (NetCoMi and MDiNE). Lastly, we apply SOHPIE-DNA to two real datasets from the American Gut Project and the Diet Exchange Study to showcase its utility. The analysis of the Diet Exchange Study shows that SOHPIE-DNA can also be used to incorporate the temporal change of connectivity of taxa with the inclusion of additional covariates. As a result, our method has found taxa that are related to the prevention of intestinal inflammation and severity of fatigue in advanced metastatic cancer patients. CONCLUSION: SOHPIE-DNA is the first attempt at introducing a regression framework for DN analysis of microbiome data. This enables the prediction of connectivity characteristics of a network in the presence of additional covariate information in the regression. The R package with a vignette of our methodology is available through the CRAN repository ( https://CRAN.R-project.org/package=SOHPIE ), named SOHPIE (pronounced as Sofie).
The source code and user manual can be found at https://github.com/sjahnn/SOHPIE-DNA .


Subject(s)
Microbiota , Humans , Microbiota/genetics , Software , Regression Analysis , DNA
5.
J Appl Stat ; 51(5): 891-912, 2024.
Article in English | MEDLINE | ID: mdl-38524800

ABSTRACT

We propose a novel personalized concept for optimal treatment selection in situations where the response is a multivariate vector that could contain right-censored variables such as survival time. The proposed method can be applied with any number of treatments and outcome variables, under a broad set of models. Following a working semiparametric Single Index Model that relates covariates and responses, we first define a patient-specific composite score, constructed from individual covariates. We then estimate the conditional means of each response, given the patient score, corresponding to each treatment, using a nonparametric smooth estimator. Next, a rank aggregation technique is applied to estimate an ordering of treatments based on ranked lists of treatment performance measures given by the conditional means. We handle the right-censored data by incorporating inverse probability of censoring weighting into the corresponding estimators. An empirical study illustrates the performance of the proposed method in finite sample problems. To show the applicability of the proposed procedure to real data, we also present a data analysis using HIV clinical trial data, which contained a right-censored survival event as one of the endpoints.

6.
Obes Surg ; 34(1): 1-14, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38040984

ABSTRACT

INTRODUCTION: Obesity affects millions of Americans. The vagal nerves convey the degree of stomach fullness to the brain via afferent visceral fibers. Studies have found that vagal nerve stimulation (VNS) promotes reduced food intake, causes weight loss, and reduces cravings and appetite. METHODS: Here, we evaluate the efficacy of a novel stimulus waveform applied bilaterally to the subdiaphragmatic vagal nerve stimulation (sVNS) for almost 13 weeks. A stimulating cuff electrode was implanted in obesity-prone Sprague Dawley rats maintained on a high-fat diet. Body weight, food consumption, and daily movement were tracked over time and compared against three control groups: sham rats on a high-fat diet that were implanted with non-operational cuffs, rats on a high-fat diet that were not implanted, and rats on a standard diet that were not implanted. RESULTS: Results showed that rats on a high-fat diet that received sVNS attained a similar weight to rats on a standard diet due primarily to a reduction in daily caloric intake. Rats on a high-fat diet that received sVNS had significantly less body fat than other high-fat controls. Rats receiving sVNS also began moving a similar amount to rats on the standard diet. CONCLUSION: Results from this study suggest that bilateral subdiaphragmatic vagal nerve stimulation can alter the rate of growth of rats maintained on a high-fat diet through a reduction in daily caloric intake, returning their body weight to that which is similar to rats on a standard diet over approximately 13 weeks.


Subject(s)
Obesity, Morbid , Vagus Nerve Stimulation , Humans , Rats , Animals , Body Weight/physiology , Adiposity , Vagus Nerve Stimulation/adverse effects , Rats, Sprague-Dawley , Obesity, Morbid/surgery , Obesity/therapy , Obesity/etiology , Diet, High-Fat , Vagus Nerve/physiology
7.
Bioinformatics ; 40(1), 2024 01 02.
Article in English | MEDLINE | ID: mdl-38134422

ABSTRACT

SUMMARY: The SOHPIE R package implements a novel functionality for "multivariable" differential co-abundance network (DN, hereafter) analyses of microbiome data. It incorporates a regression approach that adjusts for additional covariates in DN analyses. This distinguishes it from previous prominent approaches to DN analysis, such as MDiNE and NetCoMi, which do not offer covariate adjustment when finding taxa that are differentially connected (DC, hereafter) between individuals with different clinical and phenotypic characteristics. AVAILABILITY AND IMPLEMENTATION: SOHPIE with a vignette is available on the CRAN repository https://CRAN.R-project.org/package=SOHPIE and published under the General Public License (GPL) version 3.


Subject(s)
Microbiota , Software , Humans
8.
BMC Genomics ; 24(1): 687, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37974076

ABSTRACT

BACKGROUND: Advances in sequencing technology and cost reduction have enabled the emergence of various statistical methods for RNA-sequencing data, including differential co-expression network analysis (or differential network analysis). A key benefit of this method is that it takes into consideration the interactions between or among genes and does not require established knowledge of biological pathways. As of now, no existing software can incorporate covariates that should be adjusted for if they are confounding factors while performing a differential network analysis. RESULTS: We develop an R package, PRANA, in which a user can easily include multiple covariates. The main R function in this package leverages a novel pseudo-value regression approach for differential network analysis of RNA-sequencing data. The software also includes complementary R functions for extracting adjusted p-values and coefficient estimates of all or specific variables for each gene, as well as for identifying the names of genes that are differentially connected (DC, hereafter) between subjects under biologically different conditions from the output. CONCLUSION: We demonstrate the application of this package to a real dataset on chronic obstructive pulmonary disease. PRANA is available through the CRAN repositories under the GPL-3 license: https://cran.r-project.org/web/packages/PRANA/index.html .


Subject(s)
RNA , Software , Humans , Base Sequence , Sequence Analysis, RNA
9.
Stat Methods Med Res ; 32(12): 2285-2298, 2023 12.
Article in English | MEDLINE | ID: mdl-37886856

ABSTRACT

We present a nonparametric method for estimating the conditional future state entry probabilities and distributions of state entry time conditional on a past state visit when data are subject to dependent censorings in a progressive multistate model where Markovianity of the system is not assumed. These estimators are constructed using the competing risk techniques with risk sets consisting of fractional observations and inverse probability of censoring weights. The fractional observations correspond to estimates of the number of persons who ultimately enter a state from which the future state in question can be reached in one step. We then address the corresponding regression problem by combining these marginal estimators with the pseudo-value approach. The performance of our regression scheme is studied using a comprehensive simulation study. An analysis of existing data on graft-versus-host disease for bone marrow transplant individuals is presented using our novel methodology. A second analysis of another well-known data set on burn patients is also included.


Subject(s)
Models, Statistical , Humans , Regression Analysis , Probability , Computer Simulation
10.
Front Genet ; 14: 1235927, 2023.
Article in English | MEDLINE | ID: mdl-37662846

ABSTRACT

The COVID-19 pandemic caused by SARS-CoV-2 has resulted in millions of confirmed cases and deaths worldwide. Understanding the biological mechanisms of SARS-CoV-2 infection is crucial for the development of effective therapies. This study conducts differential expression (DE) analysis, pathway analysis, and differential network (DN) analysis on RNA-seq data of four lung cell lines, NHBE, A549, A549.ACE2, and Calu3, to identify their common and unique biological features in response to SARS-CoV-2 infection. DE analysis shows that cell line A549.ACE2 has the highest number of DE genes, while cell line NHBE has the lowest. Among the DE genes identified for the four cell lines, 12 genes are overlapped, associated with various health conditions. The most significant signaling pathways varied among the four cell lines. Only one pathway, "cytokine-cytokine receptor interaction", is found to be significant among all four cell lines and is related to inflammation and immune response. The DN analysis reveals considerable variation in the differential connectivity of the most significant pathway shared among the four lung cell lines. These findings help to elucidate the mechanisms of SARS-CoV-2 infection and potential therapeutic targets.

11.
Front Rehabil Sci ; 4: 1189292, 2023.
Article in English | MEDLINE | ID: mdl-37484602

ABSTRACT

Objective: We tested Goal Management Training (GMT), which has been recommended as an executive training protocol that may improve the deficits in the complex tasks inherent in life role participation experienced by those with chronic mild traumatic brain injury and post-traumatic stress disorder (mTBI/PTSD). We assessed not only cognitive function but also life role participation (quality of life). Methods: We enrolled and treated 14 individuals, administered 10 GMT sessions in person, and provided the use of the Veterans Task Manager (VTM), a smartphone app designed to serve as a "practice-buddy" device to ensure translation of in-person learning to independent home and community practice of complex tasks. The pre-/post-treatment primary measure was the NIH EXAMINER Unstructured Task. Secondary measures were as follows: Tower of London time to complete (cTOL) and three subdomains of the Community Reintegration of Service Members (CRIS) [Extent of Participation; Limitations; Satisfaction of Life Role Participation (Satisfaction)]. We used pre-/post-treatment t-test models to explore change, and generated descriptive statistics to inspect individual patterns of change across measures. Results: There was statistically significant improvement for the NIH EXAMINER Unstructured Task (p < .02; effect size = .67) and cTOL (p < .01; effect size = .52). There was a statistically significant improvement for two CRIS subdomains: Extent of Participation (p < .01; effect size = .75) and Limitations (p < .05; effect size = .59). Individuals varied in their treatment response across measures. Conclusions and Clinical Significance: In Veterans with mTBI/PTSD, in response to GMT and the VTM learning support buddy, there was significant improvement in executive cognition processes, sufficiently robust to produce significant improvement in community life role participation. The individual variations support the need for precision neurorehabilitation.
The positive results occurred in response to treatment advantages afforded by the content of the combined GMT and the employment of the VTM learning support buddy, with advantages including the following: manualized content of the GMT; incremental complex task difficulty; GMT structure and flexibility to incorporate individualized functional goals; and the VTM capability of ensuring translation of in-person instruction to home and community practice, solidifying newly learned executive cognitive processes. Study results support future study, including a potential randomized controlled trial, the manualized GMT and availability of the VTM to ensure future clinical deployment of treatment, as warranted.

12.
Stat Methods Med Res ; 32(8): 1494-1510, 2023 08.
Article in English | MEDLINE | ID: mdl-37323013

ABSTRACT

Multistate current status data presents a more severe form of censoring due to the single observation of study participants transitioning through a sequence of well-defined disease states at random inspection times. Moreover, these data may be clustered within specified groups, and informativeness of the cluster sizes may arise due to the existing latent relationship between the transition outcomes and the cluster sizes. Failure to adjust for this informativeness may lead to a biased inference. Motivated by a clinical study of periodontal disease, we propose an extension of the pseudo-value approach to estimate covariate effects on the state occupation probabilities for these clustered multistate current status data with informative cluster or intra-cluster group sizes. In our approach, the proposed pseudo-value technique initially computes marginal estimators of the state occupation probabilities utilizing nonparametric regression. Next, the estimating equations based on the corresponding pseudo-values are reweighted by functions of the cluster sizes to adjust for informativeness. We perform a variety of simulation studies to study the properties of our pseudo-value regression based on the nonparametric marginal estimators under different scenarios of informativeness. For illustration, the method is applied to the motivating periodontal disease dataset, which encapsulates the complex data-generation mechanism.


Subject(s)
Models, Statistical , Periodontal Diseases , Humans , Cluster Analysis , Computer Simulation , Periodontal Diseases/epidemiology , Sample Size
13.
ArXiv ; 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36994149

ABSTRACT

A recent breakthrough in differential network (DN) analysis of microbiome data has been realized with the advent of next-generation sequencing technologies. The DN analysis disentangles the microbial co-abundance among taxa by comparing the network properties between two or more graphs under different biological conditions. However, the existing methods for DN analysis of microbiome data do not adjust for other clinical differences between subjects. We propose a Statistical Approach via Pseudo-value Information and Estimation for Differential Network Analysis (SOHPIE-DNA) that incorporates additional covariates such as continuous age and categorical BMI. SOHPIE-DNA is a regression technique adopting jackknife pseudo-values that can be implemented readily for the analysis. We demonstrate through simulations that SOHPIE-DNA consistently reaches higher recall and F1-score, while maintaining similar precision and accuracy to existing methods (NetCoMi and MDiNE). Lastly, we apply SOHPIE-DNA to two real datasets from the American Gut Project and the Diet Exchange Study to showcase its utility. The analysis of the Diet Exchange Study shows that SOHPIE-DNA can also be used to incorporate the temporal change of connectivity of taxa with the inclusion of additional covariates. As a result, our method has found taxa that are related to the prevention of intestinal inflammation and severity of fatigue in advanced metastatic cancer patients.
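
The jackknife pseudo-value construction that the abstract refers to can be sketched generically in Python. This is an illustration of the standard leave-one-out formula, not the SOHPIE package's own code: the estimator (a correlation between two simulated taxa), the covariate `age`, and the use of ordinary least squares are assumptions chosen for brevity, and the abstract does not say which connectivity measure or regression fit the method uses.

```python
import numpy as np


def jackknife_pseudo_values(data, estimator):
    """Return one pseudo-value per subject (row of `data`).

    pseudo_i = n * theta_hat - (n - 1) * theta_hat_without_i,
    where theta_hat is `estimator` applied to the full sample.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    theta_full = estimator(data)
    # Re-estimate with each subject left out in turn.
    theta_loo = np.array(
        [estimator(np.delete(data, i, axis=0)) for i in range(n)]
    )
    return n * theta_full - (n - 1) * theta_loo


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 40
    age = rng.uniform(20, 70, size=n)        # hypothetical covariate
    abundances = rng.normal(size=(n, 2))     # two hypothetical taxa

    def corr(x):
        # Association between the two taxa; stands in for a network measure.
        return np.corrcoef(x[:, 0], x[:, 1])[0, 1]

    pv = jackknife_pseudo_values(abundances, corr)
    # Regress the subject-level pseudo-values on the covariates (OLS here).
    X = np.column_stack([np.ones(n), age])
    coef, *_ = np.linalg.lstsq(X, pv, rcond=None)
    print("intercept, age slope:", coef)
```

The pseudo-values turn a single group-level estimate into one response per subject, which is what lets subject-level covariates enter an ordinary regression.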

14.
Stat Med ; 42(13): 2162-2178, 2023 06 15.
Article in English | MEDLINE | ID: mdl-36973919

ABSTRACT

Informative cluster size (ICS) arises in situations with clustered data where a latent relationship exists between the number of participants in a cluster and the outcome measures. Although this phenomenon has been sporadically reported in the statistical literature for nearly two decades now, further exploration is needed in certain statistical methodologies to avoid potentially misleading inferences. For inference about population quantities without covariates, inverse cluster size reweightings are often employed to adjust for ICS. Further, to study the effect of covariates on disease progression described by a multistate model, the pseudo-value regression technique has gained popularity in time-to-event data analysis. We seek to answer the question: "How to apply pseudo-value regression to clustered time-to-event data when cluster size is informative?" ICS adjustment by the reweighting method can be performed in two steps; estimation of marginal functions of the multistate model and fitting the estimating equations based on pseudo-value responses, leading to four possible strategies. We present theoretical arguments and thorough simulation experiments to ascertain the correct strategy for adjusting for ICS. A further extension of our methodology is implemented to include informativeness induced by the intracluster group size. We demonstrate the methods in two real-world applications: (i) to determine predictors of tooth survival in a periodontal study and (ii) to identify indicators of ambulatory recovery in spinal cord injury patients who participated in locomotor-training rehabilitation.


Subject(s)
Models, Statistical , Tooth , Humans , Cluster Analysis , Computer Simulation , Regression Analysis
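
As a small illustration of the inverse cluster size reweighting mentioned in the abstract above, the Python sketch below weights each observation by one over the size of its cluster, so that every cluster contributes equally to a marginal mean. The function names and the toy numbers are assumptions for illustration; the paper applies such weights within pseudo-value regression for multistate models, which this sketch does not attempt.

```python
import numpy as np


def inverse_cluster_size_weights(cluster_ids):
    """Weight 1 / n_i for every observation in a cluster of size n_i."""
    cluster_ids = np.asarray(cluster_ids)
    _, inverse, counts = np.unique(
        cluster_ids, return_inverse=True, return_counts=True
    )
    return 1.0 / counts[inverse]


def ics_weighted_mean(values, cluster_ids):
    """Mean in which each cluster, not each observation, counts once."""
    values = np.asarray(values, dtype=float)
    w = inverse_cluster_size_weights(cluster_ids)
    return np.sum(w * values) / np.sum(w)


if __name__ == "__main__":
    # Toy data: the larger cluster has systematically lower outcomes,
    # so cluster size carries information about the outcome.
    values = [1.0, 1.0, 1.0, 1.0, 5.0]
    clusters = ["a", "a", "a", "a", "b"]
    print("unweighted mean:", np.mean(values))
    print("cluster-size-adjusted mean:", ics_weighted_mean(values, clusters))
```

When cluster size is related to the outcome, the unweighted mean is pulled toward the large clusters, whereas the reweighted mean equals the average of the cluster means.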
15.
BMC Bioinformatics ; 24(1): 8, 2023 Jan 09.
Article in English | MEDLINE | ID: mdl-36624383

ABSTRACT

BACKGROUND: Differential network (DN) analysis identifies changes in measures of association among genes under two or more experimental conditions. In this article, we introduce a pseudo-value regression approach for network analysis (PRANA). This is a novel method of differential network analysis that also adjusts for additional clinical covariates. We start from mutual information criteria, followed by pseudo-value calculations, which are then entered into a robust regression model. RESULTS: This article assesses the model performance of PRANA in a multivariable setting, followed by a comparison to dnapath and DINGO in both univariable and multivariable settings through a variety of simulations. Performance in terms of precision, recall, and F1 score of differentially connected (DC) genes is assessed. By and large, PRANA outperformed dnapath and DINGO, neither of which is equipped to adjust for available covariates such as patient age. Lastly, we employ PRANA in a real data application from the Gene Expression Omnibus database to identify DC genes that are associated with chronic obstructive pulmonary disease to demonstrate its utility. CONCLUSION: To the best of our knowledge, this is the first attempt at utilizing regression modeling for DN analysis of collective gene expression levels between two or more groups with the inclusion of additional clinical covariates. By and large, adjusting for available covariates improves the accuracy of a DN analysis.


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Humans , Gene Expression Profiling/methods
16.
Stat Med ; 2022 Dec 27.
Article in English | MEDLINE | ID: mdl-36574753

ABSTRACT

We propose a Bayesian hurdle mixed-effects model to analyze longitudinal ordinal data under a complex multilevel structure. This research was motivated by the dataset gathered from the Iowa Fluoride Study (IFS) in order to establish the relationships between fluorosis status and potential risk/protective factors. Dental fluorosis is characterized by spots on tooth enamel and is due to excessive fluoride intake during enamel formation. Observations are collected from multiple surface zones on each tooth and on all available teeth of children from the studied cohort, who are longitudinally observed at ages 9, 13, and 17. The data not only exhibit a complex hierarchical structure, but also have a large proportion of zero values that are likely to follow different statistical patterns from non-zero categories. Therefore, we develop a hurdle model to consider the zero category separately, while a proportional odds model is used for the positive categories. The estimated parameters are obtained from a Gibbs sampler implemented in the OpenBUGS software. Our model is compared with two popular methods for ordinal data: the proportional odds model and the partial proportional odds model. We perform a comprehensive analysis of the IFS data and evaluate the accuracy and effectiveness of our methodology through simulation studies. Our discoveries provide novel insights to statisticians and dental practitioners about the associations between patient and clinical characteristics and dental fluorosis.

17.
NeuroRehabilitation ; 49(4): 573-584, 2021.
Article in English | MEDLINE | ID: mdl-34806625

ABSTRACT

BACKGROUND: Gait deficits and functional disability are persistent problems for many stroke survivors, even after standard neurorehabilitation. There is little quantified information regarding the trajectories of response to a long-dose, 12-month intervention. OBJECTIVE: We quantified treatment response to an intensive neurorehabilitation mobility and fitness program. METHODS: The 12-month neurorehabilitation program targeted impairments in balance, limb coordination, gait coordination, and functional mobility, for five chronic stroke survivors. We obtained measures of those variables every two months. RESULTS: We found statistically and clinically significant group improvement in measures of impairment and function. There was high variation across individuals in terms of the timing and the gains exhibited. CONCLUSIONS: Long-duration neurorehabilitation (12 months) for mobility/fitness produced clinically and/or statistically significant gains in impairment and function. There was a unique pattern of change for each individual. Gains exhibited late in the treatment support a 12-month intervention. Some measures for some subjects did not reach a plateau at 12 months, justifying further investigation of a longer program (>12 months) of rehabilitation and/or maintenance care for stroke survivors.


Subject(s)
Stroke Rehabilitation , Stroke , Exercise Therapy , Gait , Humans , Quality of Life , Recovery of Function , Stroke/complications , Survivors
18.
PLoS One ; 16(11): e0259193, 2021.
Article in English | MEDLINE | ID: mdl-34767561

ABSTRACT

MOTIVATION: Gene expression data provide an opportunity for reverse-engineering gene-gene associations using network inference methods. However, it is difficult to assess the performance of these methods because the true underlying network is unknown in real data. Current benchmarks address this problem by subsampling a known regulatory network to conduct simulations. But the topology of regulatory networks can vary greatly across organisms or tissues, and reference-based generators, such as GeneNetWeaver, are not designed to capture this heterogeneity. This means, for example, that benchmark results from the E. coli regulatory network will not carry over to other organisms or tissues. In contrast, probabilistic generators do not require a reference network, and they have the potential to capture a rich distribution of topologies. This makes probabilistic generators an ideal approach for obtaining a robust benchmarking of network inference methods. RESULTS: We propose a novel probabilistic network generator that (1) provides an alternative that addresses the inherent limitation of reference-based generators, (2) is able to create realistic gene association networks, and (3) captures the heterogeneity found across gold-standard networks better than existing generators used in practice. Eight organism-specific and 12 human tissue-specific gold-standard association networks are considered. Several measures of global topology are used to determine the similarity of generated networks to the gold standards. Along with demonstrating the variability of network structure across organisms and tissues, we show that the commonly used "scale-free" model is insufficient for replicating these structures. AVAILABILITY: This generator is implemented in the R package "SeqNet" and is available on CRAN (https://cran.r-project.org/web/packages/SeqNet/index.html).


Subject(s)
Algorithms , Gene Regulatory Networks/genetics , Animals , Gene Expression , Humans , Markov Chains , Software
19.
Front Genet ; 12: 642759, 2021.
Article in English | MEDLINE | ID: mdl-34497631

ABSTRACT

The tumor microenvironment is composed of tumor cells, stroma cells, immune cells, blood vessels, and other associated non-cancerous cells. Gene expression measurements on tumor samples are an average over cells in the microenvironment. However, research questions often seek answers about tumor cells rather than the surrounding non-tumor tissue. Previous studies have suggested that the tumor purity (TP), the proportion of tumor cells in a solid tumor sample, has a confounding effect on differential expression (DE) analysis of high vs. low survival groups. We investigate three ways of incorporating the TP information into the two statistical methods used for analyzing gene expression data, namely, differential network (DN) analysis and DE analysis. Analysis 1 ignores the TP information completely, Analysis 2 uses a truncated sample by removing the low TP samples, and Analysis 3 uses TP as a covariate in the underlying statistical models. We use three gene expression data sets related to three different cancers from the Cancer Genome Atlas (TCGA) for our investigation. The networks from Analysis 2 have a greater amount of differential connectivity in the two networks than those from Analysis 1 in all three cancer datasets. Similarly, Analysis 1 identified more differentially expressed genes than Analysis 2. Results of DN and DE analyses using Analysis 3 were mostly consistent with those of Analysis 1 across the three cancers. However, Analysis 3 identified additional cancer-related genes in both DN and DE analyses. Our findings suggest that using TP as a covariate in a linear model is appropriate for DE analysis, but a more robust model is needed for DN analysis. However, because true DN or DE patterns are not known for the empirical datasets, simulated datasets can be used to study the statistical properties of these methods in future studies.

20.
Stat Med ; 40(28): 6410-6420, 2021 12 10.
Article in English | MEDLINE | ID: mdl-34496070

ABSTRACT

In studies following selective sampling protocols for secondary outcomes, conventional analyses regarding their appearance could provide misguided information. In the large type 1 diabetes prevention and prediction (DIPP) cohort study monitoring type 1 diabetes-associated autoantibodies, we propose to model their appearance via a multivariate frailty model, which incorporates a correlation component that is important for unbiased estimation of the baseline hazards under the selective sampling mechanism. As further advantages, the frailty model allows for systematic evaluation of the association and the differences in regression parameters among the autoantibodies. We demonstrate the properties of the model by a simulation study and the analysis of the autoantibodies and their association with background factors in the DIPP study, in which we found that high genetic risk is associated with the appearance of all the autoantibodies, whereas the association with sex and urban municipality was evident for IA-2A and IAA autoantibodies.


Subject(s)
Diabetes Mellitus, Type 1 , Frailty , Autoantibodies/analysis , Cohort Studies , Humans , Risk Factors