Results 1 - 14 of 14
1.
BMC Bioinformatics ; 24(1): 432, 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37964243

ABSTRACT

BACKGROUND: Deep generative models naturally serve as nonlinear dimension reduction tools for visualizing large-scale datasets, such as single-cell RNA sequencing datasets, in order to reveal latent grouping patterns or identify outliers. The variational autoencoder (VAE) is a popular deep generative method equipped with an encoder/decoder structure. The encoder and decoder are useful for mapping a new sample into the latent space and for generating a data point from a point in the latent space. However, the VAE tends not to show grouping patterns clearly without additional annotation information. On the other hand, similarity-based dimension reduction methods such as t-SNE or UMAP present clear grouping patterns, even though these methods lack an encoder/decoder structure. RESULTS: To bridge this gap, we propose a new approach that incorporates similarity information into the VAE framework. In addition, for biological applications, we extend our approach to a conditional VAE to account for covariate effects in the dimension reduction step. In the simulation study and in analyses of real single-cell RNA sequencing data, our method performs favorably compared with existing state-of-the-art methods, producing clear grouping structures with an inferred encoder and decoder. Our method also successfully adjusts for covariate effects, resulting in more useful dimension reduction. CONCLUSIONS: Our method produces clearer grouping patterns than other regularized VAE methods by utilizing the similarity information encoded in the data through the widely used UMAP loss function.
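The abstract describes adding UMAP's similarity-based loss to a VAE objective. Below is a minimal plain-Python sketch of one plausible form of such an objective. It sums a reconstruction error, the Gaussian KL term, and lam times UMAP's fuzzy cross-entropy between precomputed input similarities p_ij and latent similarities q_ij = 1 / (1 + a * d_ij^(2b)). Several choices here are assumptions, not the authors' formulation: computing q_ij from latent means, the kernel constants a and b (UMAP normally fits them from min_dist), the weight lam, and all function names. A real implementation would live in an autodiff framework so the loss can be minimized by gradient descent.

```python
import math
from typing import Sequence

EPS = 1e-12  # clip similarities away from 0 and 1 so the logs stay finite


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def squared_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Gaussian reconstruction term, up to additive constants."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def low_dim_similarity(z_i, z_j, a: float = 1.0, b: float = 1.0) -> float:
    """UMAP-style latent similarity q_ij = 1 / (1 + a * d_ij^(2b)); a, b are assumed constants."""
    d2 = sum((u - v) ** 2 for u, v in zip(z_i, z_j))
    return 1.0 / (1.0 + a * d2 ** b)


def fuzzy_cross_entropy(p: float, q: float) -> float:
    """UMAP's binary cross-entropy between input similarity p and latent similarity q."""
    q = min(max(q, EPS), 1.0 - EPS)
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))


def similarity_regularized_vae_loss(xs, x_hats, mus, log_vars, p, lam: float = 1.0) -> float:
    """Reconstruction + KL + lam * UMAP cross-entropy over all pairs of latent means."""
    recon = sum(squared_error(x, xh) for x, xh in zip(xs, x_hats))
    kl = sum(gaussian_kl(m, lv) for m, lv in zip(mus, log_vars))
    ce = 0.0
    for i in range(len(mus)):
        for j in range(i + 1, len(mus)):
            ce += fuzzy_cross_entropy(p[i][j], low_dim_similarity(mus[i], mus[j]))
    return recon + kl + lam * ce
```

The similarity term pulls together latent points whose inputs are similar (large p_ij). This pull is what produces the clearer grouping the abstract attributes to UMAP.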


Subject(s)
Data Analysis , Computer Simulation , Sequence Analysis, RNA
2.
J Pharmacokinet Pharmacodyn ; 48(6): 851-860, 2021 12.
Article in English | MEDLINE | ID: mdl-34347231

ABSTRACT

In pharmacometrics, understanding the effect of a covariate on an outcome of interest is essential for assessing the importance of that covariate. Variance-based global sensitivity analysis (GSA) can simultaneously quantify the contribution of each covariate effect to the variability of the outcome of interest while accounting for random effects. The aim of this study was to apply GSA to pharmacometric models to assess covariate effects. Simulations were conducted with pharmacokinetic models, to characterize GSA for the assessment of covariate effects, and with an example quantitative systems pharmacology (QSP) model, to apply GSA to a complex model. In the simulations, covariates and random variables were generated and used to simulate outcomes from the models. The ratios of the variance explained by each factor (each covariate and random effect) to the overall variance of the outcome were used as sensitivity indices. The sensitivity indices were consistent with the effect size of each covariate. They identified the importance of creatinine clearance for the pharmacokinetic exposure of a renally excreted drug. The sensitivity indices could also be applied to plasma concentrations over time (outcomes measured repeatedly) as outcomes of interest. Using GSA, the contribution of every covariate effect could be identified efficiently, even in the complex QSP model. Variance-based GSA can provide insight into the importance of covariate effects by simultaneously and quantitatively assessing all covariate and random effects on outcomes of interest in pharmacometrics.
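The sensitivity index described above is the first-order Sobol index S_i = Var(E[Y | X_i]) / Var(Y). A minimal Monte Carlo sketch using the Saltelli (2010) estimator is shown below, assuming independent inputs. The function name, sample size, and sampler interface are illustrative choices, not the study's implementation. The example exposure model in the `__main__` block is hypothetical: AUC = dose / CL, with CL depending on creatinine clearance and a log-normal random effect.

```python
import math
import random
from typing import Callable, List, Sequence

Sampler = Callable[[random.Random], float]


def first_order_sobol(model: Callable[[List[float]], float],
                      samplers: Sequence[Sampler],
                      n: int = 20000, seed: int = 1) -> List[float]:
    """Monte Carlo first-order Sobol indices S_i = Var(E[Y | X_i]) / Var(Y).

    Saltelli (2010) estimator built from two independent sample matrices A and B;
    the inputs are assumed to be mutually independent.
    """
    rng = random.Random(seed)
    A = [[draw(rng) for draw in samplers] for _ in range(n)]
    B = [[draw(rng) for draw in samplers] for _ in range(n)]
    f_a = [model(row) for row in A]
    f_b = [model(row) for row in B]
    pooled = f_a + f_b
    mean_y = sum(pooled) / len(pooled)
    var_y = sum((y - mean_y) ** 2 for y in pooled) / (len(pooled) - 1)

    indices = []
    for i in range(len(samplers)):
        # A_B^i: rows of A with column i replaced by the matching column of B
        f_ab = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        v_i = sum(fb * (fab - fa) for fb, fab, fa in zip(f_b, f_ab, f_a)) / n
        indices.append(v_i / var_y)
    return indices


if __name__ == "__main__":
    # Hypothetical exposure model: AUC = 100 / CL, CL = 5 * (CrCL / 100) ** 0.75 * exp(eta)
    def auc(x: List[float]) -> float:
        return 100.0 / (5.0 * (x[0] / 100.0) ** 0.75 * math.exp(x[1]))

    print(first_order_sobol(auc, [lambda r: r.uniform(30.0, 130.0),
                                  lambda r: r.gauss(0.0, 0.3)]))
```

For an additive model such as y = 2*x1 + x2 with independent uniform inputs, the indices should come out close to 0.8 and 0.2. Correlated covariates would need a different estimator.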


Subject(s)
Analysis of Variance
3.
Educ Psychol Meas ; 81(1): 61-89, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33456062

ABSTRACT

Factor mixture modeling (FMM) has been increasingly used to investigate unobserved population heterogeneity. This study examined the issue of covariate effects with FMM in the context of measurement invariance testing. Specifically, the impact of excluding and misspecifying covariate effects on measurement invariance testing and class enumeration was investigated via Monte Carlo simulations. Data were generated based on FMM models with (1) a zero covariate effect, (2) a covariate effect on the latent class variable, and (3) covariate effects on both the latent class variable and the factor. For each population model, different analysis models that excluded or misspecified covariate effects were fitted. Results highlighted the importance of including proper covariates in measurement invariance testing and evidenced the utility of a model comparison approach in searching for the correct specification of covariate effects and the level of measurement invariance. This approach was demonstrated using an empirical data set. Implications for methodological and applied research are discussed.
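The three population models in this study differ only in where the covariate enters a factor mixture model. Below is a minimal data-generating sketch for a two-class, one-factor FMM with four items:

- scenario (1) sets gamma1 = beta_factor = 0;
- scenario (2) sets gamma1 != 0, a logistic effect on the latent class variable;
- scenario (3) also sets beta_factor != 0, an effect on the factor.

Class-specific item intercepts stand in for measurement noninvariance. Every numeric default and the function name are hypothetical, and model fitting is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate_fmm(n: int, gamma0: float = 0.0, gamma1: float = 1.0, beta_factor: float = 0.0,
                 loadings: Sequence[float] = (0.8, 0.8, 0.8, 0.8),
                 intercepts: Sequence[Sequence[float]] = ((0.0, 0.0, 0.0, 0.0),
                                                          (0.5, 0.5, 0.0, 0.0)),
                 class_means: Sequence[float] = (0.0, 0.5),
                 seed: int = 0) -> List[Tuple[float, int, List[float]]]:
    """Simulate (covariate, latent class, item responses) from a two-class, one-factor FMM.

    gamma1      : covariate effect on the latent class variable (logit scale)
    beta_factor : covariate effect on the factor
    intercepts  : class-specific item intercepts; unequal rows mean intercept noninvariance
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_class1 = 1.0 / (1.0 + math.exp(-(gamma0 + gamma1 * x)))
        c = 1 if rng.random() < p_class1 else 0
        eta = class_means[c] + beta_factor * x + rng.gauss(0.0, 1.0)
        items = [nu + lam * eta + rng.gauss(0.0, 0.5)
                 for nu, lam in zip(intercepts[c], loadings)]
        data.append((x, c, items))
    return data
```

Fitting an analysis model that omits gamma1 or beta_factor to such data is the kind of misspecification the simulation study examines.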

4.
Article in English | MEDLINE | ID: mdl-32863493

ABSTRACT

The finite mixture of regression (FMR) model is a popular tool for accommodating data heterogeneity. In the analysis of FMR models with high-dimensional covariates, it is necessary to conduct regularized estimation and to distinguish important covariates from noise. In the literature, little attention has been paid to the differences among important covariates, which reflect the underlying structure of covariate effects. Specifically, important covariates can be classified into two types: those that behave the same in different subpopulations and those that behave differently. It is of interest to conduct a structured analysis that identifies such structures, enabling researchers to better understand covariates and their associations with outcomes. Here, the FMR model with high-dimensional covariates is considered. A structured penalization approach is developed for regularized estimation, selection of important variables, and, equally importantly, identification of the underlying covariate effect structure. The proposed approach can be implemented effectively, and its statistical properties are rigorously established. Simulations demonstrate its superiority over alternatives. In the analysis of cancer gene expression data, interesting models/structures missed by existing analyses are identified.
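The abstract does not spell out its penalty. The sketch below shows one hypothetical way to encode the three covariate types (unimportant, common across subpopulations, subpopulation-specific). It splits each covariate's K subpopulation coefficients into a shared mean plus deviations. An L1 penalty on the mean shrinks unimportant covariates to zero, and a group (L2) penalty on the deviations shrinks covariates toward a common effect. The decomposition, the tuning parameters, and both function names are assumptions for illustration, not the authors' method, and the optimization itself is not shown.

```python
import math
from typing import List, Sequence


def structured_penalty(beta: Sequence[Sequence[float]], lam_common: float,
                       lam_diff: float) -> float:
    """beta[j][k] = effect of covariate j in subpopulation k (hypothetical decomposition).

    Penalty = lam_common * |mean_j| + lam_diff * ||beta_j - mean_j||_2, summed over j.
    """
    total = 0.0
    for row in beta:
        mean = sum(row) / len(row)
        dev_norm = math.sqrt(sum((b - mean) ** 2 for b in row))
        total += lam_common * abs(mean) + lam_diff * dev_norm
    return total


def effect_structure(beta: Sequence[Sequence[float]], tol: float = 1e-8) -> List[str]:
    """Read off the structure of an estimated coefficient matrix."""
    labels = []
    for row in beta:
        mean = sum(row) / len(row)
        if all(abs(b) <= tol for b in row):
            labels.append("unimportant")
        elif all(abs(b - mean) <= tol for b in row):
            labels.append("common")
        else:
            labels.append("subpopulation-specific")
    return labels
```

Because the penalty can set deviations exactly to zero, an estimate produced under it can be read off directly with a rule like `effect_structure`.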

5.
Front Public Health ; 7: 201, 2019.
Article in English | MEDLINE | ID: mdl-31403039

ABSTRACT

Objectives: Patient characteristics that influence graft survival may also exhibit non-constant effects over time, thereby violating a key assumption of the Cox proportional hazards (PH) model. We describe the effects of covariates on the hazard of graft failure in the presence of long follow-up. Study Design and Settings: We studied 915 adult patients who received a kidney transplant between 1984 and 2000, using the Cox PH model, a variant of the Aalen additive hazards model, and accelerated failure time (AFT) models. Important predictors were selected with the purposeful method of variable selection. Results: Of the 915 patients studied, 43% had graft failure by the end of the study. Graft survival rates were 81%, 66%, and 50% at 1, 5, and 10 years, respectively. Our models indicate that donor type, recipient age, donor-recipient gender match, delayed graft function, diabetes, and recipient ethnicity are significant predictors of graft survival. However, only recipient age and donor-recipient gender match exhibit constant effects in the models. Conclusion: Conclusions about predictors of graft survival drawn from the Cox PH model without adequate assessment of model fit could overestimate significant effects. The additive hazards and AFT models offer more flexibility in understanding covariates with non-constant effects on graft survival. Our results suggest that the follow-up period in this study is too long to support the proportionality assumption. Modeling graft survival at different time points may reduce the likelihood of important covariates showing time-varying effects in the Cox PH model.
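Survival rates at fixed horizons, such as the 1-, 5- and 10-year graft survival figures quoted above, are commonly read off a Kaplan-Meier curve. The abstract does not state how its rates were computed, so the following is only a generic minimal sketch of the estimator. An event code of 1 marks graft failure and 0 marks censored follow-up; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct failure time.

    events[i] = 1 for graft failure, 0 for a censored follow-up.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        failures = removed = 0
        # group tied times: failures and censorings at t leave the risk set together
        while i < len(pairs) and pairs[i][0] == t:
            failures += pairs[i][1]
            removed += 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve: Sequence[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s
```

Unlike the Cox, Aalen and AFT models compared in the study, this estimator uses no covariates. It only describes the marginal survival curve.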

6.
J Pharm Sci ; 108(1): 692-700, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30423341

ABSTRACT

Time-varying clearance (CL) has recently been recognized in U.S. Food and Drug Administration drug labels for oncology monoclonal antibodies. Pembrolizumab population CL at steady state decreased by about 20% from the first dose, and individual CL changes ranged from a 75% decrease to a 25% increase; these changes correlated with disease status. From a mechanism-of-action perspective, this research explored the longitudinal covariate effects on pembrolizumab CL using data from a phase II/III clinical trial in patients with non-small cell lung cancer. The time courses of the sum of the longest tumor dimensions, lymphocyte count, albumin, and lactate dehydrogenase were first characterized separately, and the post hoc parameters of each individual patient were fixed in the subsequent semimechanistically based modeling analysis. Pembrolizumab time-varying CL was assumed to be associated with the patient's sum of the longest tumor dimensions, lymphocyte count, albumin, and lactate dehydrogenase, and tumor-related pembrolizumab CL was assumed to be a fraction of total pembrolizumab CL in the semimechanistically based model.
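To illustrate how clearance can vary with longitudinal covariates, here is a minimal sketch of a hypothetical clearance function. It has two parts: a tumor-related fraction f of baseline clearance that scales with the sum of the longest tumor dimensions (SLD), and a power-model effect of albumin. The functional form, the parameter names, and the default albumin exponent are assumptions; the abstract does not give the published model. In that model, the covariate time courses would come from the separately fitted post hoc trajectories.

```python
from typing import Callable


def time_varying_cl(t: float, cl_base: float, tumor_fraction: float,
                    sld: Callable[[float], float], sld0: float,
                    albumin: Callable[[float], float], alb_ref: float,
                    theta_alb: float = -1.0) -> float:
    """Hypothetical time-varying clearance.

    CL(t) = cl_base * [(1 - f) + f * SLD(t) / SLD0] * (ALB(t) / ALB_ref) ** theta_alb
    where f is the tumor-related fraction of clearance at baseline and theta_alb < 0
    makes clearance rise as albumin falls.
    """
    tumor_term = (1.0 - tumor_fraction) + tumor_fraction * sld(t) / sld0
    return cl_base * tumor_term * (albumin(t) / alb_ref) ** theta_alb
```

With f = 0.4 and unchanged albumin, halving tumor size lowers CL by 20%. These numbers are illustrative only, not estimates from the trial.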


Subject(s)
Antibodies, Monoclonal, Humanized/pharmacokinetics , Antibodies, Monoclonal/pharmacokinetics , Antineoplastic Agents, Immunological/pharmacokinetics , Carcinoma, Non-Small-Cell Lung/metabolism , Lung Neoplasms/metabolism , Antibodies, Monoclonal/therapeutic use , Antibodies, Monoclonal, Humanized/therapeutic use , Antineoplastic Agents, Immunological/therapeutic use , Carcinoma, Non-Small-Cell Lung/drug therapy , Clinical Trials, Phase II as Topic , Clinical Trials, Phase III as Topic , Female , Humans , Kinetics , Longitudinal Studies , Lung Neoplasms/drug therapy , Male , Middle Aged , Randomized Controlled Trials as Topic , United States , United States Food and Drug Administration
7.
Comput Stat Data Anal ; 113: 125-135, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28966420

ABSTRACT

A model-based clustering method is proposed to address two research aims in Alzheimer's disease (AD): to evaluate the accuracy of imaging biomarkers in AD prognosis, and to integrate biomarker information and standard clinical test results into diagnoses. One challenge in such biomarker studies is that it is often desirable or necessary to conduct the evaluation without relying on clinical diagnoses or other standard references. This is because (1) biomarkers may provide prognostic information long before any standard reference can be acquired, and (2) these references are often based on, or give an unfair advantage to, standard tests. They can therefore mask the prognostic value of a useful biomarker, especially when the biomarker is much more accurate than the standard tests. In addition, the biomarkers and existing tests may be of mixed type and have vastly different distributions. A model-based clustering method within the finite mixture modeling framework is introduced. The model allows the inclusion of manifest variables of mixed type, with possible differential covariates, to evaluate the prognostic value of biomarkers in addition to standard tests without relying on potentially inaccurate reference diagnoses. Maximum likelihood parameter estimation is carried out via the EM algorithm. Accuracy measures and ROC curves of the biomarkers are then derived. Finally, the method is illustrated with a real example in AD.
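The EM step mentioned above alternates two steps. First, it computes posterior class-membership probabilities (E-step). Then it re-estimates the parameters by weighted maximum likelihood (M-step). The paper's model handles mixed-type variables and covariates; the sketch below is only the simplest case, a two-component univariate Gaussian mixture, to show the two steps. The initialization rule, the standard-deviation floor, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(xs: Sequence[float], max_iter: int = 500, tol: float = 1e-9):
    """EM for a two-component univariate Gaussian mixture.

    Returns (weight of component 1, means, sds, posterior P(component 1 | x) per observation).
    """
    lo, hi = min(xs), max(xs)
    weight1 = 0.5
    mus = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]  # crude, spread-out start
    sds = [(hi - lo) / 4.0] * 2
    prev_ll = -math.inf
    resp: List[float] = []
    for _ in range(max_iter):
        # E-step: posterior class-membership probabilities and the log-likelihood
        resp, ll = [], 0.0
        for x in xs:
            w0 = (1.0 - weight1) * normal_pdf(x, mus[0], sds[0])
            w1 = weight1 * normal_pdf(x, mus[1], sds[1])
            ll += math.log(w0 + w1)
            resp.append(w1 / (w0 + w1))
        # M-step: weighted updates of mixing weight, means and standard deviations
        n1 = sum(resp)
        n0 = len(xs) - n1
        weight1 = n1 / len(xs)
        mus = [sum((1.0 - r) * x for r, x in zip(resp, xs)) / n0,
               sum(r * x for r, x in zip(resp, xs)) / n1]
        sds = [max(1e-6, math.sqrt(sum((1.0 - r) * (x - mus[0]) ** 2
                                       for r, x in zip(resp, xs)) / n0)),
               max(1e-6, math.sqrt(sum(r * (x - mus[1]) ** 2
                                       for r, x in zip(resp, xs)) / n1))]
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weight1, mus, sds, resp
```

The per-subject posterior probabilities are the reference-free quantity from which accuracy measures of a marker can later be derived.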

8.
BMC Bioinformatics ; 18(1): 91, 2017 Feb 06.
Article in English | MEDLINE | ID: mdl-28166718

ABSTRACT

BACKGROUND: Next-generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although negative binomial (NB) regression has been generally accepted for the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads, using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate a data-adaptive method in order to compare false positive rates. RESULTS: When the sample size is small or the expression levels of a gene are highly dispersed, NB regression shows inflated Type-I error rates, whereas classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. A large sample size and low dispersion generally bring the Type-I error rates of all methods close to the nominal alpha levels of 0.05 and 0.01. Type-I error rates are controlled after applying the data-adaptive method. The NB, BL, and FL regressions gain power with a large sample size, a large log2 fold-change, and low dispersion. FL regression has power comparable to NB regression. CONCLUSIONS: We conclude that implementing the data-adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations arising from inaccurately estimated dispersion parameters in the negative binomial framework.
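Firth's logistic regression maximizes a penalized likelihood. Its score equation adds a bias-correction term based on the hat-matrix diagonal h_i:

U*(b) = sum_i x_i (y_i - pi_i + h_i (1/2 - pi_i)) = 0

A minimal sketch for one predictor plus intercept follows, fitted by Fisher scoring on this modified score. Here the outcome is disease status and the predictor would be a gene's read count, as in the design described above. The step-size cap, the convergence tolerance, and the function names are assumptions; established implementations (for example the R package logistf) add profile-likelihood inference and step halving.

```python
import math
from typing import Sequence, Tuple


def _expit(eta: float) -> float:
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def firth_logistic(x: Sequence[float], y: Sequence[int], max_iter: int = 100,
                   tol: float = 1e-10, max_step: float = 5.0) -> Tuple[float, float]:
    """Firth-penalized logistic regression of y on (intercept, x) by Fisher scoring."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        pi = [_expit(b0 + b1 * xi) for xi in x]
        w = [p * (1.0 - p) for p in pi]
        # Fisher information X^T W X for the design columns (1, x)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        inv00, inv01, inv11 = i11 / det, -i01 / det, i00 / det
        # hat diagonal h_i = w_i * (1, x_i) I^{-1} (1, x_i)^T
        h = [wi * (inv00 + 2.0 * inv01 * xi + inv11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified score
        r = [yi - p + hi * (0.5 - p) for yi, p, hi in zip(y, pi, h)]
        u0, u1 = sum(r), sum(ri * xi for ri, xi in zip(r, x))
        # scoring step I^{-1} U*, capped so early iterations cannot overshoot badly
        d0, d1 = inv00 * u0 + inv01 * u1, inv01 * u0 + inv11 * u1
        largest = max(abs(d0), abs(d1))
        if largest > max_step:
            d0, d1 = d0 * max_step / largest, d1 * max_step / largest
        b0, b1 = b0 + d0, b1 + d1
        if largest < tol:
            break
    return b0, b1
```

Unlike ordinary maximum likelihood, this estimate stays finite even when cases and controls are perfectly separated by expression. That property underlies the robustness the abstract reports.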


Subject(s)
Computational Biology/methods , Huntington Disease/genetics , Sequence Analysis, RNA/methods , Bayes Theorem , Case-Control Studies , High-Throughput Nucleotide Sequencing , Humans , Huntington Disease/diagnosis , Logistic Models , Models, Theoretical , Reproducibility of Results , Sample Size
9.
Educ Psychol Meas ; 77(5): 766-791, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29795930

ABSTRACT

Researchers continue to be interested in efficient, accurate methods for estimating the coefficients of covariates in mixture modeling. Including covariates related to the latent classes may not only improve the ability of the mixture model to differentiate clearly between subjects but also make the interpretation of latent group membership more meaningful. Very few studies have compared the performance of different approaches to estimating covariate effects in mixture modeling, and fewer still have considered more complicated models, such as growth mixture models, in which the latent class variable is more difficult to identify. A Monte Carlo simulation was conducted to investigate the performance of four estimation approaches: (1) the conventional three-step approach, (2) the one-step maximum likelihood (ML) approach, (3) the pseudo-class (PC) approach, and (4) the three-step ML approach. Performance was assessed in terms of the ability of each approach to recover covariate effects in the logistic regression class-membership model within a growth mixture modeling framework. Results showed that when class separation was large, the one-step ML and three-step ML approaches produced much less biased covariate effect estimates than either the conventional three-step approach or the PC approach. When class separation was poor, estimation of the relation between the dichotomous covariate and the latent class variable was severely affected when the new three-step ML approach was used.
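Two of the compared approaches differ in how subjects are assigned to classes before the covariate model is fitted. The conventional three-step approach uses modal assignment, placing each subject in the class with the largest posterior probability. The pseudo-class approach draws several class labels from each subject's posterior probabilities and pools the analyses across draws. A minimal sketch of just these assignment steps is below. The number of draws and the function names are illustrative, and the downstream logistic regression and pooling are not shown.

```python
import random
from typing import List, Sequence


def modal_assignment(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Conventional three-step: assign each subject to its most probable class."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in posteriors]


def pseudo_class_draws(posteriors: Sequence[Sequence[float]], n_draws: int = 20,
                       seed: int = 0) -> List[List[int]]:
    """Pseudo-class approach: draw class labels from each subject's posterior probabilities.

    Each of the n_draws label sets would be analyzed separately and the covariate-effect
    estimates pooled, as in multiple imputation.
    """
    rng = random.Random(seed)
    classes = range(len(posteriors[0]))
    return [[rng.choices(classes, weights=p)[0] for p in posteriors] for _ in range(n_draws)]
```

Both rules treat the assigned labels as if they were observed. This ignores classification error and is a source of the bias reported for these two approaches.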

10.
Pharm Res ; 32(10): 3228-37, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25994981

ABSTRACT

PURPOSE: Clinical trial simulations (CTS) are a valuable tool for decision-making during drug development. However, to obtain realistic simulation scenarios, the patients included in the CTS must be representative of the target population. This is particularly important when covariate effects exist that may affect the outcome of a trial. The objective of our investigation was to evaluate and compare CTS results obtained by re-sampling from a population pool and by using multivariate distributions to simulate patient covariates. METHODS: COPD was selected as the paradigm disease for the purposes of our analysis, FEV1 was used as the response measure, and the effects of a hypothetical intervention were evaluated in different populations in order to assess the predictive performance of the two methods. RESULTS: Our results show that the multivariate distribution method produces realistic covariate correlations, comparable to those in the real population. Moreover, it allows simulation of patient characteristics beyond the limits of the inclusion and exclusion criteria in historical protocols. CONCLUSION: Both methods, discrete resampling and multivariate distributions, generate realistic pools of virtual patients. However, the use of a multivariate distribution enables more flexible simulation scenarios, since it is not necessarily bound to the covariate combinations in the available clinical data sets.
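The two covariate-generation strategies can be contrasted in a few lines. Discrete resampling copies real covariate vectors. A multivariate normal distribution, sampled here through a Cholesky factor of an assumed covariance matrix, preserves correlations while producing new combinations. The covariates (age and baseline FEV1), their means and covariance, and the function names are hypothetical. A real application would also truncate or transform variables to respect physiological bounds.

```python
import math
import random
from typing import List, Sequence


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_covariates(means: Sequence[float], cov: Sequence[Sequence[float]],
                        n: int, seed: int = 0) -> List[List[float]]:
    """Multivariate-normal virtual patients: x = mean + L z with z ~ N(0, I)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    k = len(means)
    patients = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        patients.append([means[i] + sum(low[i][j] * z[j] for j in range(i + 1))
                         for i in range(k)])
    return patients


def resample_covariates(pool: Sequence[Sequence[float]], n: int,
                        seed: int = 0) -> List[List[float]]:
    """Discrete resampling: every virtual patient is a copy of a real covariate vector."""
    rng = random.Random(seed)
    return [list(rng.choice(pool)) for _ in range(n)]
```

For example, `simulate_covariates([65.0, 1.5], [[64.0, -1.6], [-1.6, 0.25]], 1000)` would draw ages and FEV1 values with an assumed correlation of -0.4. Resampling can never leave the observed pool.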


Subject(s)
Computer Simulation , Adult , Aged , Aged, 80 and over , Clinical Trials as Topic , Decision Making , Female , Humans , Male , Middle Aged , Pulmonary Disease, Chronic Obstructive/drug therapy
11.
Article in English | WPRIM (Western Pacific) | ID: wpr-28184

ABSTRACT

One of the important purposes in population pharmacokinetic studies is to investigate the relationships between parameters and covariates to describe parameter variability. The purpose of this study is to evaluate the model's ability to correctly detect the parameter-covariate relationship that can be observed in phase I clinical trials. Data were simulated from a two-compartment model with zero-order absorption and first-order elimination, which was built from valsartan's concentration data collected from a previously conducted study. With creatinine clearance (CLCR) being used as a covariate to be tested, 3 different significance levels of 0.001

Subject(s)
Absorption , Clinical Trials, Phase I as Topic , Creatinine , Dataset , Healthy Volunteers , Hope
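The simulation model in this record combines a two-compartment structure with zero-order absorption and first-order elimination, with creatinine clearance (CLCR) as the covariate under test. A minimal sketch of that structure, integrated with fixed-step RK4, is below. The abstract does not give the valsartan parameter values or the form of the CLCR relationship, so the power model, all parameter values, and the function names are hypothetical.

```python
from typing import List, Tuple


def clearance_with_clcr(theta_cl: float, clcr: float, theta_clcr: float = 0.75,
                        clcr_ref: float = 100.0) -> float:
    """Hypothetical power-model covariate effect: CL = theta_cl * (CLCR / ref) ** theta_clcr."""
    return theta_cl * (clcr / clcr_ref) ** theta_clcr


def simulate_two_compartment(dose: float, dur: float, cl: float, v1: float, q: float,
                             v2: float, t_end: float, dt: float = 0.01):
    """Zero-order input over `dur` into the central compartment, first-order elimination.

    Fixed-step RK4. Returns ([(time, central concentration)], (A1, A2, amount eliminated)).
    """
    k10, k12, k21 = cl / v1, q / v1, q / v2

    def deriv(a1: float, a2: float, rate: float) -> Tuple[float, float, float]:
        return (rate - (k10 + k12) * a1 + k21 * a2,  # central amount
                k12 * a1 - k21 * a2,                  # peripheral amount
                k10 * a1)                             # cumulative amount eliminated

    state = (0.0, 0.0, 0.0)
    profile: List[Tuple[float, float]] = [(0.0, 0.0)]
    for i in range(int(round(t_end / dt))):
        # input rate held constant within a step: on if the step midpoint lies in [0, dur)
        rate = dose / dur if (i + 0.5) * dt < dur else 0.0
        a1, a2, _ = state
        k1 = deriv(a1, a2, rate)
        k2 = deriv(a1 + 0.5 * dt * k1[0], a2 + 0.5 * dt * k1[1], rate)
        k3 = deriv(a1 + 0.5 * dt * k2[0], a2 + 0.5 * dt * k2[1], rate)
        k4 = deriv(a1 + dt * k3[0], a2 + dt * k3[1], rate)
        state = tuple(s + dt / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
                      for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4))
        profile.append(((i + 1) * dt, state[0] / v1))
    return profile, state
```

Tracking the eliminated amount gives a simple mass-balance check: A1 + A2 + eliminated should equal the dose once input has finished. Simulating subjects across a range of CLCR values produces data on which a covariate test could then be run.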
12.
Environmetrics ; 25(2): 84-96, 2014 Mar.
Article in English | MEDLINE | ID: mdl-25221430

ABSTRACT

With the growing popularity of spatial mixture models in cluster analysis, model selection criteria have become an established tool in the search for parsimony. However, the label-switching problem is often inherent in Bayesian implementations of mixture models, and a variety of relabeling algorithms have been proposed. We use a space-time mixture of Poisson regression models with homogeneous covariate effects to illustrate that the best model selected by model selection criteria does not always agree with the model chosen by the optimal relabeling algorithm. The results are illustrated with real and simulated datasets. The objective is to make the reader aware that, if the purpose of statistical modeling is to identify clusters, applying a relabeling algorithm to the model with the best fit may not generate the optimal relabeling.
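Label switching means that MCMC draws of a K-component mixture may list the same components in different orders. A simple, generic relabeling rule permutes each draw's components to best match a reference configuration. The sketch below is that rule, not the optimal relabeling algorithm discussed in the abstract. It compares component Poisson rates only, and the function name is illustrative.

```python
from itertools import permutations
from typing import List, Sequence


def relabel_draws(draws: Sequence[Sequence[float]],
                  reference: Sequence[float]) -> List[List[float]]:
    """Permute each draw's component labels to be closest (squared distance) to `reference`.

    draws[s][k] is, e.g., the Poisson rate of component k in MCMC draw s.
    """
    k = len(reference)
    relabeled = []
    for draw in draws:
        best = min(permutations(range(k)),
                   key=lambda perm: sum((draw[perm[j]] - reference[j]) ** 2 for j in range(k)))
        relabeled.append([draw[best[j]] for j in range(k)])
    return relabeled
```

In practice the same permutation would be applied to every component-specific parameter in a draw, such as weights and regression coefficients. The exhaustive search over k! permutations is only feasible for a small number of components.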

13.
Stat Med ; 33(9): 1460-76, 2014 Apr 30.
Article in English | MEDLINE | ID: mdl-24488864

ABSTRACT

The application of model-based meta-analysis in drug development has recently gained prominence, particularly for characterizing dose-response relationships and quantifying the treatment effect sizes of competitor drugs. The models are typically nonlinear and involve covariates to explain the heterogeneity in summary-level literature data, also called aggregate data (AD). Inferring individual patient-level relationships from these nonlinear meta-analysis models leads to aggregation bias. Individual patient-level data (IPD) are required to characterize patient-level relationships, but this information is too often limited. Because combined analyses of AD and IPD take advantage of the information the two share, the models developed for AD must be derived from IPD models. For linear models, the solution has a closed form, whereas for nonlinear models no closed-form solution exists. Here, we propose a linearization method based on a second-order Taylor series approximation for fitting models to AD alone or to combined AD and IPD. The application of this method is illustrated by an analysis of a continuous landmark endpoint, namely the change from baseline in HbA1c at week 12, from 18 clinical trials evaluating the effects of DPP-4 inhibitors on hyperglycemia in diabetic patients. The performance of this method is demonstrated by a simulation study in which the effects of varying the degree of nonlinearity and the heterogeneity in covariates (assessed by the ratio of between-trial to within-trial variability) were studied. A dose-response relationship using an Emax model with linear and nonlinear effects of covariates on the emax parameter was used to simulate data. The simulation results showed that when an IPD model is simply used to model AD, the bias in the emax parameter estimate increased noticeably with an increasing degree of nonlinearity in the model with respect to covariates. 
When using an appropriately derived AD model, the linearization method adequately corrected for bias. It was also noted that the bias in the model parameter estimates decreased as the ratio of between-trial to within-trial variability in covariate distribution increased. Taken together, the proposed linearization approach allows addressing the issue of aggregation bias in the particular case of nonlinear models of aggregate data.
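Aggregation bias and its second-order correction can be seen in a small worked example. Assume a hypothetical nonlinear covariate effect on emax, g(c) = emax0 * exp(theta * c), with the covariate c distributed within a trial with mean mu and variance sigma^2. The trial-level mean is E[g(c)] ≈ g(mu) + 0.5 * g''(mu) * sigma^2, and it differs from the naive plug-in g(mu) used when an IPD model is applied directly to AD. For normally distributed c the exact mean is known, which makes the comparison checkable. This illustrates only the Taylor step, not the authors' full fitting procedure. The exponential covariate model and the function names are assumptions.

```python
import math


def aggregate_emax_naive(emax0: float, theta: float, mu_c: float) -> float:
    """Plug the trial-mean covariate into the individual-level model: g(mu)."""
    return emax0 * math.exp(theta * mu_c)


def aggregate_emax_taylor2(emax0: float, theta: float, mu_c: float, var_c: float) -> float:
    """Second-order Taylor approximation of E[g(c)]: g(mu) + 0.5 * g''(mu) * var."""
    g = emax0 * math.exp(theta * mu_c)
    g_second = theta ** 2 * g  # second derivative of emax0 * exp(theta * c) at mu
    return g + 0.5 * g_second * var_c


def aggregate_emax_exact_normal(emax0: float, theta: float, mu_c: float, var_c: float) -> float:
    """Exact E[g(c)] when c ~ Normal(mu, var) (log-normal mean), used only for checking."""
    return emax0 * math.exp(theta * mu_c + 0.5 * theta ** 2 * var_c)


def trial_mean_response(dose: float, ed50: float, mean_emax: float) -> float:
    """Trial-level Emax response; linear in emax, so the averaged emax can be plugged in."""
    return mean_emax * dose / (ed50 + dose)
```

With emax0 = 1, theta = 0.5, mu = 0 and sigma^2 = 1, the naive value is 1.0, the Taylor value is 1.125 and the exact value is exp(0.125), about 1.133. The gap from the naive value grows with theta, mirroring the bias trend in the simulation study.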


Subject(s)
Data Interpretation, Statistical , Drug Discovery/statistics & numerical data , Linear Models , Meta-Analysis as Topic , Dose-Response Relationship, Drug , Humans
14.
J Biom Biostat ; 3(2)2012 Mar 23.
Article in English | MEDLINE | ID: mdl-24273689

ABSTRACT

The receiver operating characteristic (ROC) curve has been a popular statistical tool for characterizing the discriminating power of a classifier, such as a biomarker or an imaging modality for disease screening or diagnosis. It has been recognized that the accuracy of a given procedure may depend on underlying factors such as subjects' demographic characteristics or disease risk factors. Nonparametric and parametric methods tend to be either inefficient or cumbersome when the main focus is evaluating the effects of multiple covariates. Here we propose a semi-parametric linear regression framework to model covariate effects. It allows the sensitivity at a given specificity to vary with the covariates and provides a way to model the area under the ROC curve indirectly. The estimation procedure and asymptotic theory are presented. Extensive simulation studies have been conducted to investigate the validity of the proposed method. We illustrate the new method on a diagnostic test dataset.
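The quantity being modeled here is the sensitivity at a fixed specificity as a function of covariates. Its simplest nonparametric estimate works within each covariate stratum: take the specificity-quantile of control scores as the threshold, then count cases above it. This stratified approach becomes inefficient as the number of covariates grows, which motivates the regression framework. The sketch below shows only that baseline, not the proposed semi-parametric estimator. It assumes higher scores indicate disease, and the function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def sensitivity_at_specificity(cases: Sequence[float], controls: Sequence[float],
                               specificity: float) -> float:
    """Empirical sensitivity at the threshold giving the requested specificity in controls."""
    ordered = sorted(controls)
    # smallest control score such that at least `specificity` of controls score <= it
    idx = max(0, min(len(ordered) - 1, math.ceil(specificity * len(ordered) - 1e-9) - 1))
    threshold = ordered[idx]
    return sum(1 for s in cases if s > threshold) / len(cases)


def sensitivity_by_covariate(records: Sequence[Tuple[float, bool, Hashable]],
                             specificity: float) -> Dict[Hashable, float]:
    """records: (score, diseased, covariate level). Returns sensitivity per covariate level."""
    groups = defaultdict(lambda: ([], []))
    for score, diseased, level in records:
        groups[level][0 if diseased else 1].append(score)
    return {level: sensitivity_at_specificity(cases, controls, specificity)
            for level, (cases, controls) in groups.items()}
```

Each stratum needs enough cases and controls of its own. With several covariates the strata become sparse, which is the inefficiency the abstract points to.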
