Results 1 - 13 of 13
1.
J Appl Stat ; 50(9): 1992-2013, 2023.
Article in English | MEDLINE | ID: mdl-37378270

ABSTRACT

Selecting the number of change points in segmented line regression is an important problem in trend analysis, and various approaches have been proposed in the literature. We first study the empirical properties of several model selection procedures and propose a new method based on two Schwarz type criteria: the classical Bayes Information Criterion (BIC) and a variant with a harsher penalty than BIC (BIC3). The proposed rule is designed to use the former when effect sizes are small and the latter when effect sizes are large, and it employs the partial R² to determine the weight between BIC and BIC3. The proposed method is computationally much more efficient than the permutation test procedure that has been the default method of the Joinpoint software developed for cancer trend analysis, and its satisfactory performance is observed in our simulation study. Simulations indicate that the proposed method keeps the probability of correct selection at least as large as that of BIC3, whose performance is comparable to that of the permutation test procedure, and improves on BIC3 when BIC3 performs worse than BIC. The proposed method is applied to U.S. prostate cancer incidence and mortality rates.

2.
Stat Med ; 41(16): 3102-3130, 2022 07 20.
Article in English | MEDLINE | ID: mdl-35522060

ABSTRACT

Since the release of Version 1.0 in 1998, the Joinpoint software developed for cancer trend analysis by a team at the US National Cancer Institute has received considerable attention in the trend analysis community and has become one of the most widely used software packages for trend analysis. The paper published in Statistics in Medicine in 2000 (a previous study) describes the permutation test procedure to select the number of joinpoints, and Joinpoint Version 1.0 implemented the permutation procedure as the default model selection method and employed parametric methods for the asymptotic inference of the model parameters. Since then, various updates and extensions have been made to the Joinpoint software. In this paper, we review basic features of Joinpoint, summarize important updates since its first release in 1998, and provide more information on two major enhancements that overcome prior limitations in both the accuracy and computational efficiency of previously used methods. The enhancements include: (i) data driven model selection methods, which are generally more accurate under a broad range of data settings and more computationally efficient than the permutation test, and (ii) the use of the empirical quantile method for construction of confidence intervals for the slope parameters and the location of the joinpoints, which generally provides more accurate coverage than the prior parametric methods. We show the impact of these changes on cancer trend analyses published by the US National Cancer Institute.


Subject(s)
Neoplasms , Data Collection , Humans , Regression Analysis , Research Design , Software
3.
J Off Stat ; 36(1): 49-62, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32713989

ABSTRACT

Analysis of trends in health data collected over time can be affected by instantaneous changes in coding that cause sudden increases/decreases, or "jumps," in data. Despite these sudden changes, the underlying continuous trends can present valuable information related to the changing risk profile of the population, the introduction of screening, new diagnostic technologies, or other causes. The joinpoint model is a well-established methodology for modeling trends over time using connected linear segments, usually on a logarithmic scale. Joinpoint models that ignore data jumps due to coding changes may produce biased estimates of trends. In this article, we introduce methods to incorporate a sudden discontinuous jump in an otherwise continuous joinpoint model. The size of the jump is either estimated directly (the Joinpoint-Jump model) or estimated using supplementary data (the Joinpoint-Comparability Ratio model). Examples using ICD-9/ICD-10 cause of death coding changes, and coding changes in the staging of cancer illustrate the use of these models.
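
As a rough illustration of the model described above, the following sketch evaluates a continuous joinpoint mean function on the log scale and adds one discontinuous jump at a coding-change time. The function name, parameter names, and all numeric values are hypothetical and are not taken from the article; estimation of the jump size is not shown.

```python
import numpy as np

def joinpoint_jump_mean(x, intercept, slope, joinpoints, slope_changes,
                        jump_at=None, jump_size=0.0):
    """Log-scale mean of a continuous joinpoint model with an optional jump.

    All names are illustrative assumptions, not the article's notation.
    """
    x = np.asarray(x, dtype=float)
    mean = intercept + slope * x
    # Each joinpoint changes the slope but keeps the segments connected.
    for tau, delta in zip(joinpoints, slope_changes):
        mean = mean + delta * np.maximum(x - tau, 0.0)
    # A coding change shifts the level from jump_at onward; slopes are unchanged.
    if jump_at is not None:
        mean = mean + jump_size * (x >= jump_at)
    return mean

# Made-up example: 21 time points, one joinpoint at x = 8, one jump at x = 9.
x = np.arange(0, 21)
log_rate = joinpoint_jump_mean(
    x, intercept=3.0, slope=0.02,
    joinpoints=[8.0], slope_changes=[-0.05],
    jump_at=9.0, jump_size=0.10,
)
```

Setting jump_at to None gives the ordinary continuous joinpoint curve, so the two fits can be compared on the same grid.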

4.
Stat Med ; 36(19): 3059-3074, 2017 Aug 30.
Article in English | MEDLINE | ID: mdl-28585245

ABSTRACT

This paper considers an improved confidence interval for the average annual percent change in trend analysis, which is based on a weighted average of the regression slopes in the segmented line regression model with unknown change points. The performance of the improved confidence interval proposed by Muggeo is examined for various distribution settings, and two new methods are proposed for further improvement. The first method is practically equivalent to the one proposed by Muggeo, but its construction is simpler, and it is modified to use the t-distribution instead of the standard normal distribution. The second method is based on the empirical distribution of the residuals and the resampling using a uniform random sample, and its satisfactory performance is indicated by a simulation study. Copyright © 2017 John Wiley & Sons, Ltd.
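
To make the weighted-average construction concrete, here is a minimal sketch of the AAPC point estimate, assuming the usual log-linear setting in which each segment's slope is weighted by the length of the interval it covers and the averaged slope is transformed back to a percent change. The function name and the example slopes are hypothetical, and the confidence-interval methods studied in the paper are not reproduced here.

```python
import math

def aapc(slopes, segment_lengths):
    """Average annual percent change from log-scale segment slopes.

    slopes: one regression slope per segment (log scale).
    segment_lengths: number of years covered by each segment (the weights).
    """
    if len(slopes) != len(segment_lengths):
        raise ValueError("need exactly one length per slope")
    total = sum(segment_lengths)
    avg_slope = sum(b * w for b, w in zip(slopes, segment_lengths)) / total
    return 100.0 * (math.exp(avg_slope) - 1.0)

# Made-up segments: slope 0.02 for 6 years, then slope -0.03 for 4 years.
example = aapc([0.02, -0.03], [6, 4])
```

In this made-up example the weighted slopes cancel, so the AAPC is essentially zero even though neither segment is flat.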


Subject(s)
Epidemiologic Methods , Regression Analysis , Biometry/methods , Computer Simulation , Confidence Intervals , Humans , Mortality/trends , Neoplasms/epidemiology
5.
J Stat Plan Inference ; 170: 106-116, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26858507

ABSTRACT

The Schwarz criterion or Bayes Information Criterion (BIC) is often used to select a model dimension, and some variations of the BIC have been proposed in the context of change-point problems. In this paper, we consider a segmented line regression model with an unknown number of change-points and study asymptotic properties of Schwarz type criteria in selecting the number of change-points. Noticing the overestimating tendency of the traditional BIC observed in some empirical studies and being motivated by the asymptotic behavior of the modified BIC proposed by Zhang and Siegmund (2007), we consider a variation of the Schwarz type criterion that applies a harsher penalty equivalent to the model with one additional unknown parameter per segment. For the segmented line regression model without the continuity constraint, we prove the consistency of the number of change-points selected by the criterion with this type of modification and summarize the simulation results that support the consistency. Further simulations are conducted for the model with the continuity constraint, and we empirically observe that the asymptotic behavior of this modified version of BIC is comparable to that of the criterion proposed by Liu, Wu, and Zidek (1997).
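
The difference between the two penalties can be sketched as follows, assuming a Gaussian-error criterion of the form ln(SSE/n) + p ln(n)/n, where a model with k change-points (k + 1 segments) is counted as having p = 2(k + 1) parameters under BIC and one extra parameter per segment, p = 3(k + 1), under the harsher variant. This form, the function names, and the SSE values below are illustrative assumptions, not the paper's exact definitions or data.

```python
import math

def schwarz_criterion(sse, n, k, params_per_segment):
    """ln(SSE/n) plus a Schwarz-type penalty for k change-points."""
    n_params = params_per_segment * (k + 1)   # k + 1 segments
    return math.log(sse / n) + n_params * math.log(n) / n

def select_k(sse_by_k, n, params_per_segment):
    """Return the number of change-points with the smallest criterion value."""
    scores = {k: schwarz_criterion(sse, n, k, params_per_segment)
              for k, sse in sse_by_k.items()}
    return min(scores, key=scores.get)

# Made-up residual sums of squares for fits with 0..3 change-points, n = 30.
sse_by_k = {0: 10.0, 1: 4.0, 2: 3.1, 3: 2.7}
k_bic = select_k(sse_by_k, n=30, params_per_segment=2)    # lighter penalty
k_bic3 = select_k(sse_by_k, n=30, params_per_segment=3)   # harsher penalty
```

With these made-up numbers the lighter penalty selects two change-points while the harsher one selects one, which mirrors the overestimating tendency described above.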

6.
Stat Med ; 33(23): 4087-103, 2014 Oct 15.
Article in English | MEDLINE | ID: mdl-24895073

ABSTRACT

In this paper, we propose methods to cluster groups of two-dimensional data whose mean functions are piecewise linear into several clusters with common characteristics such as the same slopes. To fit segmented line regression models with common features for each possible cluster, we use a restricted least squares method. In implementing the restricted least squares method, we estimate the maximum number of segments in each cluster by using both the permutation test method and the Bayes information criterion method and then propose to use the Bayes information criterion to determine the number of clusters. For a more effective implementation of the clustering algorithm, we propose a measure of the minimum distance worth detecting and illustrate its use in two examples. We summarize simulation results to study properties of the proposed methods and also prove the consistency of the cluster grouping estimated with a given number of clusters. The presentation and examples in this paper focus on the segmented line regression model with the ordered values of the independent variable, which has been the model of interest in cancer trend analysis, but the proposed method can be applied to a general model with design points either ordered or unordered.


Subject(s)
Epidemiologic Research Design , Prostatic Neoplasms/mortality , Thyroid Neoplasms/epidemiology , Adolescent , Adult , Age Distribution , Aged , Aged, 80 and over , Bayes Theorem , Child , Child, Preschool , Cluster Analysis , Computer Simulation , Female , Humans , Incidence , Infant , Infant, Newborn , Least-Squares Analysis , Linear Models , Male , Middle Aged , SEER Program/statistics & numerical data , United States/epidemiology , Young Adult
7.
Cancer ; 118(4): 1091-9, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22228565

ABSTRACT

BACKGROUND: A study was undertaken to evaluate the temporal projection methods that are applied by the American Cancer Society to make 4-year-ahead projections. METHODS: Cancer mortality data recorded in each year from 1969 through 2007 for the United States overall and for each state were obtained from the National Center for Health Statistics. Based on the mortality data through 2000, 2001, 2002, and 2003, projections were made 4 years ahead to estimate the expected number of cancer deaths in 2004, 2005, 2006, and 2007, respectively, in the United States and in each state, using 5 projection methods. These predictive estimates were compared to the observed number of deaths that occurred for all cancers combined and 47 cancer sites at the national level, and 21 cancer sites at the state level. RESULTS: Among the models that were compared, the joinpoint regression model with modified Bayesian information criterion selection produced estimates that were closest to the actual number of deaths. Overall, results show that the 4-year-ahead projection has larger error than the 3-year-ahead projection of death counts when the same method is used. However, the 4-year-ahead projection from the new method performed better than the 3-year-ahead projection from the current state-space method. CONCLUSIONS: The Joinpoint method with modified Bayesian information criterion selection has the smallest error of all the models considered for 4-year-ahead projection of cancer deaths to the current year for the United States overall and for each state. This method will be used by the American Cancer Society to project the number of cancer deaths starting in 2012.


Subject(s)
Forecasting/methods , Neoplasms/epidemiology , Neoplasms/mortality , American Cancer Society , Bayes Theorem , Humans , Models, Statistical , Retrospective Studies , United States/epidemiology
8.
Cancer ; 118(4): 1100-9, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22228583

ABSTRACT

BACKGROUND: The current study was undertaken to evaluate the spatiotemporal projection models applied by the American Cancer Society to predict the number of new cancer cases. METHODS: Adaptations of a model that has been used since 2007 were evaluated. Modeling is conducted in 3 steps. In step I, ecologic predictors of spatiotemporal variation are used to estimate age-specific incidence counts for every county in the country, providing an estimate even in those areas that are missing data for specific years. Step II adjusts the step I estimates for reporting delays. In step III, the delay-adjusted predictions are projected 4 years ahead to the current calendar year. Adaptations of the original model include updating covariates and evaluating alternative projection methods. Residual analysis and evaluation of 5 temporal projection methods were conducted. RESULTS: The differences between the spatiotemporal model-estimated case counts and the observed case counts for 2007 were < 1%. After delays in reporting of cases were considered, the difference was 2.5% for women and 3.3% for men. Residual analysis indicated no significant pattern that suggested the need for additional covariates. The vector autoregressive model was identified as the best temporal projection method. CONCLUSIONS: The current spatiotemporal prediction model is adequate to provide reasonable estimates of case counts. To project the estimated case counts ahead 4 years, the vector autoregressive model is recommended as the temporal projection method that produces estimates closest to the observed case counts.


Subject(s)
Forecasting/methods , Neoplasms/epidemiology , American Cancer Society , Female , Humans , Incidence , Male , Models, Statistical , Retrospective Studies , Sex Characteristics , United States/epidemiology
9.
J Stat Plan Inference ; 140(7): 1834-1843, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20514142

ABSTRACT

Sequential designs can be used to save computation time in implementing Monte Carlo hypothesis tests. The motivation is to stop resampling if the early resamples provide enough information on the significance of the p-value of the original Monte Carlo test. In this paper, we consider a sequential design called the B-value design proposed by Lan and Wittes and construct the sequential design bounding the resampling risk, the probability that the accept/reject decision is different from the decision from complete enumeration. For the B-value design, whose exact implementation can be done by using the algorithm proposed in Fay, Kim and Hachey, we first compare the expected resample size for different designs with comparable resampling risk. We show that the B-value design has considerable savings in expected resample size compared to a fixed resample or simple curtailed design, and comparable expected resample size to the iterative push out design of Fay and Follmann. The B-value design is more practical than the iterative push out design in that it is tractable even for small values of resampling risk, which was a challenge with the iterative push out design. We also propose an approximate B-value design that can be constructed without specially developed software and that provides analytic insights on the choice of parameter values in constructing the exact B-value design.
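
The idea of stopping early can be illustrated with the simple curtailed design mentioned above as a comparator, which is not the B-value design itself. The sketch below assumes a one-sided Monte Carlo test with p-value (exceedances + 1) / (resamples + 1) and stops as soon as the accept/reject decision of the full run can no longer change. The function name, its arguments, and the default values are hypothetical.

```python
import random

def curtailed_mc_test(observed, draw_null_stat, max_resamples=999, alpha=0.05):
    """Monte Carlo test that stops once the accept/reject decision is fixed.

    Returns (reject, resamples_used). The decision equals that of a full run
    of max_resamples with p = (exceed + 1) / (max_resamples + 1).
    """
    # Largest exceedance count that still gives p <= alpha after a full run.
    max_exceed = int(alpha * (max_resamples + 1) + 1e-9) - 1
    exceed = 0
    for used in range(1, max_resamples + 1):
        if draw_null_stat() >= observed:
            exceed += 1
        if exceed > max_exceed:
            return False, used        # p-value already too large to reject
        if exceed + (max_resamples - used) <= max_exceed:
            return True, used         # rejection can no longer be undone
    return exceed <= max_exceed, max_resamples

# Usage with a seeded standard-normal null distribution (illustrative only).
rng = random.Random(0)
reject, used = curtailed_mc_test(0.0, lambda: rng.gauss(0.0, 1.0))
```

Because the decision always matches the full run, this design has no resampling risk relative to the fixed design; its savings are largest when the p-value is clearly large and modest when the p-value is small.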

10.
Stat Sin ; 19(2): 597-609, 2009 May 01.
Article in English | MEDLINE | ID: mdl-19738935

ABSTRACT

Segmented line regression has been used in many applications, and the problem of estimating the number of change-points in segmented line regression has been discussed in Kim et al. (2000). This paper studies asymptotic properties of the number of change-points selected by the permutation procedure of Kim et al. (2000). This procedure is based on a sequential application of likelihood ratio type tests, and controls the over-fitting probability by its design. In this paper we show that, under some conditions, the number of change-points selected by the permutation procedure is consistent. Via simulations, the permutation procedure is compared with information-based criteria such as the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), and Generalized Cross Validation (GCV).

11.
Biom J ; 50(3): 431-45, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18481362

ABSTRACT

This article proposes a new test to detect interactions in replicated two-way ANOVA models, more powerful than the classical F-test and more general than the test of Terbeck and Davies (1998, Annals of Statistics 26, 1279-1305) developed for the case with unconditionally identifiable interaction pattern. We use the parameterization without the conventional restrictions on the interaction terms and base our test on the maximum of the standardized disturbance estimates. We show that our test is unbiased and consistent, and discuss how to estimate the p-value of the test. In a 3 x 3 case, which is our main focus in this article, the exact p-value can be computed by using four-dimensional integrations. For a general I x J case, which requires an (I - 1) x (J - 1) dimensional integration for a numerical evaluation of the exact p-value, we propose to use an improved Bonferroni inequality to estimate an upper bound of the p-value, and simulations indicate a reasonable accuracy of the upper bound. Via simulations, we show that our test is more powerful than the classical F-test and also that it can deal with both situations: unconditionally identifiable and non-unconditionally identifiable cases. An application to genetic data is presented in which the new test is significant, while the classical F-test failed to detect interactions.


Subject(s)
Analysis of Variance , Data Interpretation, Statistical , Computer Simulation , Epistasis, Genetic , Humans , Quantitative Trait Loci , Zea mays/genetics
12.
J Comput Graph Stat ; 16(4): 946-967, 2007.
Article in English | MEDLINE | ID: mdl-18633453

ABSTRACT

When designing programs or software for the implementation of Monte Carlo (MC) hypothesis tests, we can save computation time by using sequential stopping boundaries. Such boundaries imply stopping resampling after relatively few replications if the early replications indicate a very large or very small p-value. We study a truncated sequential probability ratio test (SPRT) boundary and provide a tractable algorithm to implement it. We review two properties desired of any MC p-value, the validity of the p-value and a small resampling risk, where resampling risk is the probability that the accept/reject decision will be different than the decision from complete enumeration. We show how the algorithm can be used to calculate a valid p-value and confidence intervals for any truncated SPRT boundary. We show that a class of SPRT boundaries is minimax with respect to resampling risk and recommend a truncated version of boundaries in that class by comparing their resampling risk (RR) to the RR of fixed boundaries with the same maximum resample size. We study the lack of validity of some simple estimators of p-values and offer a new simple valid p-value for the recommended truncated SPRT boundary. We explore the use of these methods in a practical example and provide the MChtest R package to perform the methods.

13.
Biometrics ; 60(4): 1005-14, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15606421

ABSTRACT

Segmented line regression models, which are composed of continuous linear phases, have been applied to describe changes in rate trend patterns. In this article, we propose a procedure to compare two segmented line regression functions, specifically to test (i) whether the two segmented line regression functions are identical or (ii) whether the two mean functions are parallel allowing different intercepts. A general form of the test statistic is described and then the permutation procedure is proposed to estimate the p-value of the test. The permutation test is compared to an approximate F-test in terms of the p-value estimation and the performance of the permutation test is studied via simulations. The tests are applied to compare female lung cancer mortality rates between two registry areas and also to compare female breast cancer mortality rates between two states.


Subject(s)
Regression Analysis , Biometry , Female , Humans , Linear Models , Lung Neoplasms/mortality , Models, Statistical , Registries , United States/epidemiology