Results 1 - 20 of 45
1.
J Am Stat Assoc ; 119(546): 1155-1167, 2024.
Article in English | MEDLINE | ID: mdl-39006311

ABSTRACT

Spatial process models are widely used for modeling point-referenced variables arising from diverse scientific domains. Analyzing the resulting random surface provides deeper insights into the nature of latent dependence within the studied response. We develop Bayesian modeling and inference for rapid changes on the response surface to assess directional curvature along a given trajectory. Such trajectories or curves of rapid change, often referred to as wombling boundaries, occur in geographic space in the form of rivers in a flood plain, roads, mountains or plateaus or other topographic features leading to high gradients on the response surface. We demonstrate fully model based Bayesian inference on directional curvature processes to analyze differential behavior in responses along wombling boundaries. We illustrate our methodology with a number of simulated experiments followed by multiple applications featuring the Boston Housing data; Meuse river data; and temperature data from the Northeastern United States.

2.
Biom J ; 65(8): e2100302, 2023 12.
Article in English | MEDLINE | ID: mdl-37853834

ABSTRACT

Human immunodeficiency virus (HIV) dynamics have been the focus of epidemiological and biostatistical research over the past decades as a way to understand the progression of acquired immunodeficiency syndrome (AIDS) in the population. Although there are several approaches for modeling HIV dynamics, one of the most popular is based on Gaussian mixed-effects models because of their simplicity of implementation and interpretation. However, in some situations, Gaussian mixed-effects models cannot (a) capture the serial correlation present in longitudinal data, (b) deal with missing observations properly, or (c) accommodate the skewness and heavy tails frequently present in patients' profiles. In those cases, mixed-effects state-space models (MESSM) become a powerful tool for modeling correlated observations, including HIV dynamics, because they model both the unobserved states and the observations flexibly and simply. Our proposal therefore considers an MESSM in which the observation error follows a skew-t distribution. This approach is more flexible and can accommodate data sets exhibiting skewness and heavy tails. Under the Bayesian paradigm, an efficient Markov chain Monte Carlo algorithm is implemented. To evaluate the properties of the proposed models, we carried out several simulation studies, including scenarios with missing values in the generated data sets. Finally, we illustrate our approach with an application to the AIDS Clinical Trial Group Study 315 (ACTG-315) clinical trial data set.


Subject(s)
Acquired Immunodeficiency Syndrome , HIV Infections , Humans , Acquired Immunodeficiency Syndrome/epidemiology , HIV Infections/epidemiology , Bayes Theorem , Models, Statistical , Viral Load , HIV , Longitudinal Studies
3.
Spat Stat ; 49: 100542, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34660186

ABSTRACT

Spatio-temporal Poisson models are commonly used for disease mapping. However, after the spatial and temporal variation is accounted for, the data do not necessarily have equal mean and variance, which suggests either over- or under-dispersion. In this paper, we propose the spatio-temporal Conway-Maxwell-Poisson model. The advantage of the Conway-Maxwell-Poisson distribution is that a single dispersion parameter lets it handle both under- and over-dispersion, which makes it more flexible than the Poisson distribution. We consider data from the pandemic caused by the SARS-CoV-2 virus, which emerged in 2019 (COVID-19) and has threatened people all over the world; understanding the spatio-temporal pattern of the disease is of great importance. We apply a spatio-temporal Conway-Maxwell-Poisson model to COVID-19 death data and find that it performs better than the commonly used spatio-temporal Poisson model.
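As a small illustration of the dispersion property described above, the sketch below evaluates the standard Conway-Maxwell-Poisson probability mass function, P(Y = y) = lam^y / (y!)^nu / Z(lam, nu), where nu = 1 recovers the Poisson, nu < 1 gives over-dispersion and nu > 1 gives under-dispersion. It covers only the count distribution, not the spatio-temporal model; the function names, the truncation of the normalizing constant Z, and the chosen parameter values are assumptions made for the example.

```python
import math
from typing import List


def cmp_pmf_table(lam: float, nu: float, support: int = 200, max_terms: int = 1000) -> List[float]:
    """Return P(Y = y) for y = 0..support-1 under a Conway-Maxwell-Poisson(lam, nu) law.

    Z(lam, nu) = sum_j lam**j / (j!)**nu has no closed form, so it is truncated after
    max_terms terms (an assumption of this sketch) and evaluated in log space.
    """
    if support > max_terms:
        raise ValueError("support must not exceed max_terms")
    log_lam = math.log(lam)
    # log of each unnormalized term: j*log(lam) - nu*log(j!)
    log_terms = [j * log_lam - nu * math.lgamma(j + 1) for j in range(max_terms)]
    peak = max(log_terms)  # log-sum-exp trick for numerical stability
    log_z = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return [math.exp(log_terms[y] - log_z) for y in range(support)]


def dispersion_ratio(lam: float, nu: float) -> float:
    """Variance-to-mean ratio of the (truncated) CMP distribution."""
    probs = cmp_pmf_table(lam, nu)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return var / mean


for nu in (0.5, 1.0, 2.0):
    # ratio > 1 for nu < 1, about 1 for nu = 1, < 1 for nu > 1
    print(nu, dispersion_ratio(2.0, nu))
```

In a spatio-temporal disease-mapping model, lam would typically be linked to covariates and spatial and temporal random effects, with nu estimated from the data.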

4.
An Acad Bras Cienc ; 93(suppl 3): e20190826, 2021.
Article in English | MEDLINE | ID: mdl-34877968

ABSTRACT

The gamma distribution has been used extensively in many areas of application. In this paper, considering a Bayesian analysis, we provide necessary and sufficient conditions to check whether or not improper priors lead to proper posterior distributions. Further, we discuss sufficient conditions to verify whether the resulting posterior moments are finite. An interesting aspect of our findings is that one can check whether the posterior is proper or improper, and whether its posterior moments are finite, by looking directly at the behavior of the proposed improper prior. To illustrate our methodology, these results are applied to different objective priors.


Subject(s)
Bayes Theorem , Gamma Rays
5.
Stat Med ; 40(5): 1073-1100, 2021 02 28.
Article in English | MEDLINE | ID: mdl-33341974

ABSTRACT

The two-part model and the Tweedie model are two essential methods for analyzing positive continuous responses with excess zeros (zero-augmented responses). Compared with other continuous zero-augmented models, the zero-augmented gamma (ZAG) model performs well on data with a large mass at zero. In this article, we compare Bayesian models for continuous data with excess zeros, namely the ZAG and Tweedie models. In both models the mean is modeled on the logarithmic scale, and in the zero-augmented model the probability of a zero is modeled on the logit scale. Because previous researchers employed different priors for the Tweedie model in Bayesian settings, we conduct a sensitivity analysis to select the optimal priors for the Tweedie model. Furthermore, we present a simulation study to evaluate the performance of the two models and apply them to a dataset on daily fish intake and blood mercury levels from the National Health and Nutrition Examination Survey. According to the Watanabe-Akaike information criterion and the leave-one-out cross-validation criterion, the Tweedie model provides higher predictive accuracy for positive continuous, zero-augmented data.


Subject(s)
Models, Statistical , Research Design , Animals , Bayes Theorem , Computer Simulation , Humans , Nutrition Surveys
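To make the ZAG structure in entry 5 concrete, here is a minimal sketch of its log-likelihood: each response is zero with probability p_i (logit link) and otherwise gamma-distributed with mean mu_i (log link) and a common shape. The Tweedie side of the comparison is not shown. The function names, the single shared shape parameter and the linear predictors are assumptions for illustration, not the article's code.

```python
import math
from typing import Sequence, Tuple


def zag_params(x_row: Sequence[float], beta: Sequence[float], gamma_coef: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical links: logit(p_zero) = x . gamma_coef and log(mu) = x . beta."""
    eta_p = sum(a * b for a, b in zip(x_row, gamma_coef))
    eta_mu = sum(a * b for a, b in zip(x_row, beta))
    return 1.0 / (1.0 + math.exp(-eta_p)), math.exp(eta_mu)


def zag_loglik(y: Sequence[float], p_zero: Sequence[float], mu: Sequence[float], shape: float) -> float:
    """Log-likelihood of a zero-augmented gamma model.

    y_i = 0 with probability p_zero[i]; otherwise y_i ~ Gamma with mean mu[i]
    and the given shape, i.e. scale = mu[i] / shape.
    """
    total = 0.0
    for yi, pi, mi in zip(y, p_zero, mu):
        if yi == 0:
            total += math.log(pi)  # point mass at zero
        else:
            scale = mi / shape
            log_gamma_density = (
                (shape - 1) * math.log(yi) - yi / scale
                - shape * math.log(scale) - math.lgamma(shape)
            )
            total += math.log1p(-pi) + log_gamma_density  # log(1 - p) + gamma log-density
    return total


p, m = zag_params([1.0, 0.5], beta=[0.2, -0.1], gamma_coef=[-1.0, 0.3])
print(zag_loglik([0.0, 1.7], [p, p], [m, m], shape=2.0))
```

In a Bayesian fit, this log-likelihood would be combined with priors on beta, gamma_coef and the shape and explored with MCMC.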
6.
J Appl Stat ; 48(3): 410-433, 2021.
Article in English | MEDLINE | ID: mdl-35706537

ABSTRACT

Spatial modeling of consumer response data has recently attracted increased interest in the marketing literature. In this paper, we extend the (spatial) multi-scale model to a dynamic multi-scale spatiotemporal modeling approach that incorporates both spatial and temporal dimensions. Our empirical application with a US company's catalog purchase data for the period 1997-2001 reveals a nested geographic market structure that spans geopolitical boundaries such as state borders. This structure identifies spatial clusters of consumers who exhibit similar spatiotemporal behavior, thus pointing to the importance of emergent geographic structure, emergent nested structure and dynamic patterns in multi-resolution methods. The multi-scale model also performs better in estimation and prediction than several spatial and spatiotemporal models, and it uses a scalable and computationally efficient Markov chain Monte Carlo method that makes it suitable for analyzing large spatiotemporal consumer purchase datasets.

7.
Anal Chem ; 93(2): 1059-1067, 2021 01 19.
Article in English | MEDLINE | ID: mdl-33289381

ABSTRACT

The inability to distinguish aggressive from indolent prostate cancer is a longstanding clinical problem. Prostate specific antigen (PSA) tests and digital rectal exams cannot differentiate these forms. Because only ∼10% of diagnosed prostate cancer cases are aggressive, existing practice often results in overtreatment, including unnecessary surgeries that degrade patients' quality of life. Here, we describe a fast microfluidic immunoarray optimized to measure 8 proteins simultaneously in 5 µL of blood serum for prostate cancer diagnostics. Using polymeric horseradish peroxidase (poly-HRP, 400 HRPs) labels to provide large signal amplification and limits of detection in the sub-fg mL⁻¹ range, we devised a protocol to optimize fast, accurate assays of 100-fold diluted serum samples. Analysis of 130 prostate cancer patient serum samples revealed that some members of the protein panel can distinguish aggressive from indolent cancers. Logistic regression was used to identify a subset of the panel, combining the biomarker proteins ETS-related gene protein (ERG), insulin-like growth factor-1 (IGF-1), pigment epithelial-derived factor (PEDF), and serum monocyte differentiation antigen (CD-14), to predict whether a given patient should be referred for biopsy; this subset gave much better predictive accuracy than PSA alone. This represents the first prostate cancer blood test that can predict which patients will have a high biopsy Gleason score, a standard pathology score used to grade tumors.


Subject(s)
Biomarkers, Tumor/blood , Immunoassay , Microfluidic Analytical Techniques , Neoplasm Proteins/blood , Prostatic Neoplasms/diagnosis , Humans , Male , Prostatic Neoplasms/blood
8.
J Appl Stat ; 47(2): 306-322, 2020.
Article in English | MEDLINE | ID: mdl-35706514

ABSTRACT

In this paper, we introduce a new approach to generate flexible parametric families of distributions. These models arise in competing and complementary risks scenarios, in which the lifetime associated with a particular risk is not observable; rather, we observe only the minimum/maximum lifetime among all risks. The latent variables follow a zero-truncated Poisson distribution. For the proposed family of distributions, the extra shape parameter has an important physical interpretation in the competing and complementary risks scenarios. The mathematical properties and inferential procedures are discussed. The proposed approach is applied to some existing distributions and fully illustrated with an important data set.
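A common way such families arise, sketched here under the assumption that the zero-truncated Poisson variable is the number of latent risks N (rate lam) and that the latent lifetimes are iid with baseline survival s0(t) and CDF f0(t): summing over N gives P(min > t) = (exp(lam * s0) - 1) / (exp(lam) - 1) and P(max <= t) = (exp(lam * f0) - 1) / (exp(lam) - 1). The helper names are hypothetical, and the brute-force series is included only to check the closed form; this is not the article's code.

```python
import math


def ztp_min_survival(s0: float, lam: float) -> float:
    """P(min(Z_1..Z_N) > t) for N ~ zero-truncated Poisson(lam), given s0 = P(Z > t)."""
    return math.expm1(lam * s0) / math.expm1(lam)


def ztp_max_cdf(f0: float, lam: float) -> float:
    """P(max(Z_1..Z_N) <= t) for N ~ zero-truncated Poisson(lam), given f0 = P(Z <= t)."""
    return math.expm1(lam * f0) / math.expm1(lam)


def ztp_min_survival_series(s0: float, lam: float, n_terms: int = 200) -> float:
    """Direct check: sum over n >= 1 of P(N = n) * s0**n."""
    norm = -math.expm1(-lam)  # 1 - exp(-lam), the zero-truncation constant
    total = 0.0
    for n in range(1, n_terms):
        p_n = math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1)) / norm
        total += p_n * s0 ** n
    return total


# Example with an assumed exponential baseline, s0(t) = exp(-t), at t = 0.5
s0 = math.exp(-0.5)
print(ztp_min_survival(s0, 2.0), ztp_min_survival_series(s0, 2.0))
```

Here lam plays the role of the extra shape parameter: the larger it is, the more latent risks are expected to compete.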

9.
Stat Methods Med Res ; 29(7): 2015-2033, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31625453

ABSTRACT

Response variables in medical sciences are often bounded, e.g. proportions, rates or fractions of incidence of some disease. In this work, we are interested in studying whether population characteristics, e.g. sex and race, can explain the incidence rate of colorectal cancer. To accommodate such responses, we propose a new class of regression models for bounded responses based on a new distribution on the open unit interval that includes an additional parameter for greater flexibility. The proposal takes the compound power normal distribution as a base distribution, combines it with a quantile transformation of another family of distributions with the same support, and then studies some properties of the new family. In addition, the new family is extended to regression models as an alternative to existing regression models for unit-interval responses. We also present inferential procedures based on Bayesian methodology; specifically, a Metropolis-Hastings algorithm is used to obtain Bayesian estimates of the parameters. An application to real data illustrates the use of the new family.


Subject(s)
Colorectal Neoplasms , Bayes Theorem , Colorectal Neoplasms/epidemiology , Humans , Incidence , Normal Distribution
10.
Spat Spatiotemporal Epidemiol ; 29: 149-159, 2019 06.
Article in English | MEDLINE | ID: mdl-31128624

ABSTRACT

This paper proposes a Bayesian hierarchical cure rate survival model for spatially clustered time to event data. We consider a mixture cure rate model with covariates and a flexible (semi)parametric baseline survival distribution for uncured individuals. The spatial correlation structure is introduced in the form of frailties which follow a Multivariate Conditionally Autoregressive distribution on a pre-specified map. We obtain the usual posterior estimates, smoothed by regional level maps of spatial frailties and cure rates. A simulation study demonstrates that the parameters of the models with spatially correlated frailties have smaller relative biases and MSE than the ones obtained using simple frailty models. We apply our methodology to Hodgkin lymphoma cancer survival times for patients diagnosed in the state of Connecticut.


Subject(s)
Hodgkin Disease/epidemiology , Bayes Theorem , Connecticut/epidemiology , Disease-Free Survival , Female , Hodgkin Disease/mortality , Humans , Male , Models, Statistical , Spatial Analysis , Survival Analysis
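The mixture cure structure in entry 10 can be summarized by S_pop(t | x) = pi(x) + (1 - pi(x)) S_u(t), where pi(x) is the cure fraction and S_u is the survival of uncured individuals. The sketch below assumes a logistic link for the cure fraction and a Weibull baseline for the uncured, and it omits the spatial frailties; these choices and the function name are illustrative assumptions, not the paper's (semi)parametric specification.

```python
import math
from typing import Sequence


def population_survival(t: float, x: Sequence[float], beta: Sequence[float],
                        weibull_shape: float, weibull_scale: float) -> float:
    """Mixture cure model: S_pop(t | x) = pi(x) + (1 - pi(x)) * S_u(t).

    Assumed here: logit(pi) = x . beta and S_u(t) = exp(-(t / scale) ** shape).
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    cure_fraction = 1.0 / (1.0 + math.exp(-eta))
    s_uncured = math.exp(-((t / weibull_scale) ** weibull_shape))
    return cure_fraction + (1.0 - cure_fraction) * s_uncured


# S_pop starts at 1 and levels off at the cure fraction instead of dropping to 0
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, population_survival(t, x=[1.0, 0.4], beta=[-0.5, 1.0],
                                 weibull_shape=1.5, weibull_scale=2.0))
```

The plateau at pi(x) is what distinguishes a cure rate model from a standard survival model, whose survival curve tends to zero.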
11.
Stat Biosci ; 10(2): 439-459, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30344778

ABSTRACT

We developed a Bayes factor based approach for the design of non-inferiority clinical trials with a focus on controlling type I error and power. Historical data are incorporated in the Bayesian design via the power prior discussed in Ibrahim and Chen (2000). The properties of the proposed method are examined in detail. An efficient simulation-based computational algorithm is developed to calculate the Bayes factor, type I error and power. The proposed methodology is applied to the design of a non-inferiority medical device clinical trial.

12.
Entropy (Basel) ; 20(3)2018 Mar 07.
Article in English | MEDLINE | ID: mdl-33265267

ABSTRACT

In this paper, we present a Weibull link (skewed) model for categorical response data arising from binomial as well as multinomial models. We show that, for such types of categorical data, the most commonly used models (logit, probit and complementary log-log) can be obtained as limiting cases. We further compare the proposed model with some other asymmetric models. Both Bayesian and frequentist estimation procedures for binomial and multinomial responses are presented in detail. Two datasets are analyzed to show the efficiency of the proposed model.

13.
J Multivar Anal ; 157: 14-28, 2017 May.
Article in English | MEDLINE | ID: mdl-28989203

ABSTRACT

Many modern statistical problems can be cast in the framework of multivariate regression, where the main task is to make statistical inference for a possibly sparse and low-rank coefficient matrix. The low-rank structure in the coefficient matrix is intrinsically multivariate and, when combined with sparsity, can further enhance dimension reduction, enable variable selection, and facilitate model interpretation. Using a Bayesian approach, we develop a unified sparse and low-rank multivariate regression method to both estimate the coefficient matrix and obtain its credible region for making inference. The newly developed sparse and low-rank prior for the coefficient matrix enables rank reduction, predictor selection and response selection simultaneously. We utilize the marginal likelihood to determine the regularization hyperparameter, so our method maximizes its posterior probability given the data. On the theoretical side, posterior consistency is established to characterize the asymptotic behavior of the proposed method. The efficacy of the proposed approach is demonstrated via simulation studies and a real application on yeast cell cycle data.

14.
J Comput Graph Stat ; 26(4): 814-825, 2017.
Article in English | MEDLINE | ID: mdl-30337797

ABSTRACT

In multivariate regression models, a sparse singular value decomposition of the regression component matrix is appealing for reducing dimensionality and facilitating interpretation. However, the recovery of such a decomposition remains very challenging, largely due to the simultaneous presence of orthogonality constraints and co-sparsity regularization. By delving into the underlying statistical data generation mechanism, we reformulate the problem as a supervised co-sparse factor analysis, and develop an efficient computational procedure, named sequential factor extraction via co-sparse unit-rank estimation (SeCURE), that completely bypasses the orthogonality requirements. At each step, the problem reduces to a sparse multivariate regression with a unit-rank constraint. Conveniently, each sequentially extracted sparse and unit-rank coefficient matrix automatically leads to co-sparsity in its pair of singular vectors. Each latent factor is thus a sparse linear combination of the predictors and may influence only a subset of responses. The proposed algorithm is guaranteed to converge, and it ensures efficient computation even with incomplete data and/or when enforcing exact orthogonality is desired. Our estimators enjoy the oracle properties asymptotically; a non-asymptotic error bound further reveals some interesting finite-sample behaviors of the estimators. The efficacy of SeCURE is demonstrated by simulation studies and two applications in genetics.

15.
J Am Stat Assoc ; 112(520): 1733-1743, 2017.
Article in English | MEDLINE | ID: mdl-37013199

ABSTRACT

We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton-Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ² and F random variables. We illustrate our methods on an important application of detecting tumor heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients.

16.
Stat Methodol ; 32: 107-121, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27695391

ABSTRACT

Latent class analysis is used to group categorical data into classes via a probability model. Model selection criteria then judge how well the model fits the data. When addressing incomplete data, the current methodology restricts the imputation to a single, pre-specified number of classes. We seek to develop an entropy-based model selection criterion that does not restrict the imputation to one number of clusters. Simulations show the new criterion performing well against the current standards of AIC and BIC, while a family studies application demonstrates how the criterion provides more detailed and useful results than AIC and BIC.

17.
Biom J ; 58(5): 1178-97, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27225466

ABSTRACT

Our present work proposes a new survival model in a Bayesian context to analyze right-censored survival data for populations with a surviving fraction, assuming that the log failure time follows a generalized extreme value distribution. Many applications require more flexible modeling of covariate information than a simple linear or parametric form for all covariate effects. It is also necessary to include spatial variation in the model, since it is sometimes left unexplained by the covariates considered in the analysis. Therefore, the nonlinear covariate effects and the spatial effects are incorporated into the systematic component of our model. Gaussian processes (GPs) provide a natural framework for modeling potentially nonlinear relationships and have recently become a powerful tool for nonlinear regression. Our proposed model adopts a semiparametric Bayesian approach by imposing a GP prior on the nonlinear effects of continuous covariates. Considering data availability and computational complexity, a conditionally autoregressive distribution is placed on the region-specific frailties to handle spatial correlation. The flexibility and gains of our proposed model are illustrated through analyses of simulated data examples as well as a dataset from a colon cancer clinical trial in the state of Iowa.


Subject(s)
Data Interpretation, Statistical , Models, Biological , Neoplasms/mortality , Bayes Theorem , Computer Simulation , Humans , Iowa/epidemiology , Neoplasms/epidemiology , Normal Distribution
18.
Biostatistics ; 17(3): 468-83, 2016 07.
Article in English | MEDLINE | ID: mdl-26861909

ABSTRACT

In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an [Formula: see text] intercross mice study and an alcohol dependence study.


Subject(s)
Algorithms , Data Interpretation, Statistical , Models, Statistical , Regression Analysis , Supervised Machine Learning , Alcoholism/genetics , Animals , Body Weight/genetics , Humans , Mice
19.
Biometrics ; 72(3): 707-19, 2016 09.
Article in English | MEDLINE | ID: mdl-26686333

ABSTRACT

In many scientific fields, it is common practice to collect a sequence of 0-1 binary responses from a subject across time, space, or a collection of covariates. Researchers are interested in finding out how the expected binary outcome is related to covariates, and aim to better predict future 0-1 outcomes. Gaussian processes have been widely used to model nonlinear systems, in particular to model the latent structure in a binary regression model, allowing a nonlinear functional relationship between covariates and the expectation of binary outcomes. A critical issue in modeling binary response data is the appropriate choice of link function. Commonly adopted link functions such as probit or logit links have fixed skewness and lack the flexibility to let the data determine the degree of skewness. To address this limitation, we propose a flexible binary regression model that combines a generalized extreme value link function with a Gaussian process prior on the latent structure. Bayesian computation is employed in model estimation. Posterior consistency of the resulting posterior distribution is demonstrated. The flexibility and gains of the proposed model are illustrated through detailed simulation studies and two real data examples. Empirical results show that the proposed model outperforms a set of alternative models that have either only a Gaussian process prior on the latent regression function or only a Dirichlet prior on the link function.


Subject(s)
Models, Statistical , Regression Analysis , Statistics, Nonparametric , Animals , Anthracosis/diagnosis , Anthracosis/etiology , Coal Mining/statistics & numerical data , Computer Simulation/statistics & numerical data , Deep Brain Stimulation/statistics & numerical data , Fatigue/therapy , Haplorhini , Humans , Normal Distribution , Predictive Value of Tests
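To show how a generalized extreme value link frees the skewness that probit and logit fix, the sketch below uses one common parameterization of the GEV link, P(Y = 1) = 1 - exp(-(1 - xi * eta)_+^(-1/xi)), whose limit as xi goes to 0 is the complementary log-log link. This parameterization is an assumption here and may differ from the paper's; the Gaussian process prior on the latent function eta is not shown, and the function name is hypothetical.

```python
import math


def gev_link_prob(eta: float, xi: float) -> float:
    """P(Y = 1) under an assumed GEV link with shape xi and latent value eta.

    Uses 1 - exp(-(1 - xi*eta)_+ ** (-1/xi)); xi -> 0 gives cloglog: 1 - exp(-exp(eta)).
    """
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-math.exp(eta))
    z = 1.0 - xi * eta
    if z <= 0.0:
        # Outside the GEV support: the probability has already reached 1 (xi > 0) or 0 (xi < 0)
        return 1.0 if xi > 0 else 0.0
    return 1.0 - math.exp(-(z ** (-1.0 / xi)))


# The same latent values map to differently skewed response curves as xi changes
for xi in (-0.5, 0.0, 0.5):
    print(xi, [round(gev_link_prob(eta, xi), 3) for eta in (-2.0, -1.0, 0.0, 1.0)])
```

Because xi is estimated along with the latent function, the data can decide how asymmetric the response curve should be.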
20.
Stat Methods Med Res ; 25(1): 167-87, 2016 Feb.
Article in English | MEDLINE | ID: mdl-22514030

ABSTRACT

We propose a hierarchical Bayesian methodology to model spatially or spatio-temporally clustered survival data with the possibility of cure. A flexible continuous transformation class of survival curves indexed by a single parameter is used. This transformation class contains two well-known existing models as special cases: the proportional hazards and the proportional odds models. The survival curve is modeled as a function of a baseline cumulative distribution function, cure rates, and spatio-temporal frailties. The cure rates are modeled through a covariate link specification, and the spatial frailties are specified using a conditionally autoregressive model with time-varying parameters, resulting in a spatio-temporal formulation. The likelihood function is formulated assuming that the single parameter controlling the transformation is unknown, and the full conditional distributions are derived. A model with a non-parametric baseline cumulative distribution function is implemented, and a Markov chain Monte Carlo algorithm is specified to obtain the usual posterior estimates, smoothed by regional level maps of spatio-temporal frailties and cure rates. Finally, we apply our methodology to melanoma cancer survival times for patients diagnosed in the state of New Jersey between 2000 and 2007, with follow-up until 2007.


Subject(s)
Models, Statistical , Survival Analysis , Algorithms , Bayes Theorem , Biostatistics , Humans , Likelihood Functions , Markov Chains , Melanoma/mortality , Monte Carlo Method , Odds Ratio , Proportional Hazards Models
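For intuition about a single-parameter transformation class that nests both models mentioned in entry 20, the sketch below uses one standard family, S(t | x) = (1 + gamma * H0(t) * exp(x . beta))^(-1/gamma), which gives proportional hazards as gamma goes to 0 and proportional odds at gamma = 1. This is an illustrative family chosen for the example and is not necessarily the exact class used in the paper; cure rates and frailties are omitted, and the function name is hypothetical.

```python
import math


def transformed_survival(cum_hazard: float, lin_pred: float, gamma: float) -> float:
    """S = (1 + gamma * H * exp(lin_pred)) ** (-1 / gamma), for gamma >= 0.

    gamma -> 0: proportional hazards, S = exp(-H * exp(lin_pred)).
    gamma = 1:  proportional odds,    S = 1 / (1 + H * exp(lin_pred)).
    """
    if gamma < 0:
        raise ValueError("gamma must be non-negative in this sketch")
    h = cum_hazard * math.exp(lin_pred)
    if gamma < 1e-12:
        return math.exp(-h)
    # log1p keeps the computation stable when gamma * h is small
    return math.exp(-math.log1p(gamma * h) / gamma)


# The same baseline cumulative hazard under the PH limit, an intermediate gamma, and PO
for g in (0.0, 0.5, 1.0):
    print(g, transformed_survival(cum_hazard=0.7, lin_pred=0.2, gamma=g))
```

Treating gamma as unknown, as the abstract describes, lets the data choose where between the two special cases the survival curves lie.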