Results 1 - 20 of 44
1.
Test (Madr) ; 33(2): 589-608, 2024.
Article in English | MEDLINE | ID: mdl-38868722

ABSTRACT

Generalized linear models (GLMs) are very widely used, but formal goodness-of-fit (GOF) tests for the overall fit of the model seem to be in wide use only for certain classes of GLMs. We develop and apply a new goodness-of-fit test, similar to the well-known and commonly used Hosmer-Lemeshow (HL) test, that can be used with a wide variety of GLMs. The test statistic is a variant of the HL statistic, but we rigorously derive an asymptotically correct sampling distribution using methods of Stute and Zhu (Scand J Stat 29(3):535-545, 2002) and demonstrate its consistency. We compare the performance of our new test with other GOF tests for GLMs, including a naive direct application of the HL test to the Poisson problem. Our test provides competitive or comparable power in various simulation settings and we identify a situation where a naive version of the test fails to hold its size. Our generalized HL test is straightforward to implement and interpret and an R package is publicly available. Supplementary Information: The online version contains supplementary material available at 10.1007/s11749-023-00912-8.
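For orientation, the classical HL statistic that this abstract generalizes can be sketched as follows. This is a minimal illustration of the textbook form for binary outcomes, not the authors' generalized test or their R package. The function name `hosmer_lemeshow`, the equal-size grouping rule, and the `groups - 2` degrees of freedom are assumptions taken from the usual logistic-regression convention.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees of freedom) for the classical HL test.

    y: observed 0/1 outcomes; p: fitted probabilities from a binary model.
    Hypothetical helper for illustration only.
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    # Sort by fitted probability only (stable sort keeps the input order for ties).
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        # Cut the sorted data into near-equal-size groups.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(pi for pi, _ in chunk)  # expected number of events
        observed = sum(yi for _, yi in chunk)  # observed number of events
        mean_p = expected / n_g
        # Skip degenerate groups whose mean fitted probability is 0 or 1.
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, groups - 2
```

The statistic is usually referred to a chi-square distribution with the returned degrees of freedom. As the abstract notes, applying this form naively outside logistic regression may not hold its size.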

2.
J Appl Stat ; 51(7): 1399-1411, 2024.
Article in English | MEDLINE | ID: mdl-38835824

ABSTRACT

The Hosmer-Lemeshow (HL) test is a commonly used global goodness-of-fit (GOF) test that assesses the quality of the overall fit of a logistic regression model. In this paper, we give results from simulations showing that the type I error rate (and hence power) of the HL test decreases as model complexity grows, provided that the sample size remains fixed and binary replicates (multiple Bernoulli trials) are present in the data. We demonstrate that a generalized version of the HL test (GHL) presented in previous work can offer some protection against this power loss. These results are also supported by application of both the HL and GHL tests to a real-life data set. We conclude with a brief discussion explaining the behavior of the HL test, along with some guidance on how to choose between the two tests. In particular, we suggest using the GHL test when there are binary replicates or clusters in the covariate space, provided that the sample size is sufficiently large.

3.
Stat Methods Med Res ; : 9622802241254220, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38780488

ABSTRACT

Modified Poisson regression, which estimates the regression parameters in the log-binomial regression model using the Poisson quasi-likelihood estimating equation and robust variance, is a useful tool for estimating the adjusted risk and prevalence ratio in binary outcome analysis. Although several goodness-of-fit tests have been developed for other binary regressions, few goodness-of-fit tests are available for modified Poisson regression. In this study, we proposed several goodness-of-fit tests for modified Poisson regression, including the modified Hosmer-Lemeshow test with empirical variance, Tsiatis test, normalized Pearson chi-square tests with binomial variance and Poisson variance, and normalized residual sum of squares test. The original Hosmer-Lemeshow test and normalized Pearson chi-square test with binomial variance are inappropriate for the modified Poisson regression, which can produce a fitted value exceeding 1 owing to the unconstrained parameter space. A simulation study revealed that the normalized residual sum of squares test performed well regarding the type I error probability and the power for a wrong link function. We applied the proposed goodness-of-fit tests to the analysis of cross-sectional data of patients with cancer. We recommend the normalized residual sum of squares test as a goodness-of-fit test in the modified Poisson regression.

4.
MethodsX ; 12: 102536, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38274699

ABSTRACT

One approach to Geographically Weighted Regression (GWR) modeling is Geographically Weighted Nonparametric Regression (GWNR), which has more parameters than the GWR model. Models with more parameters usually fit the data better, which is an advantage, while models with fewer parameters have the advantage of being easier to use and interpret. However, a model with more parameters should be used only if it is proven to be significantly superior. Therefore, the purpose of this study was to develop a goodness-of-fit hypothesis test for the GWNR model. The goodness-of-fit test was performed on real data. We found that the GWNR model was more suitable than the mixed nonparametric regression model. Some highlights of the proposed method are:
• A new GWR model that handles an unknown regression function by using a mixed estimator (truncated spline and Fourier series) in nonparametric regression
• A goodness-of-fit test for GWNR that compares the fit of the mixed nonparametric regression model with that of GWNR
• Application of the goodness-of-fit test to poverty data from Sulawesi Island and infant mortality data from East Java.

5.
J Korean Stat Soc ; 52(2): 382-394, 2023.
Article in English | MEDLINE | ID: mdl-36713637

ABSTRACT

We develop a new goodness-of-fit test for the uniform distribution based on a conditional moment characterization. We study the asymptotic properties of the proposed test statistic. We also present a goodness-of-fit test for the uniform distribution that incorporates right-censored observations and study its properties. A Monte Carlo simulation study is carried out to evaluate the finite-sample performance of the proposed tests. We illustrate the test procedures using real data sets.

6.
Lifetime Data Anal ; 29(4): 854-887, 2023 10.
Article in English | MEDLINE | ID: mdl-36670299

ABSTRACT

The Kaplan-Meier estimator is ubiquitously used to estimate survival probabilities for time-to-event data. It is nonparametric, and thus does not require specification of a survival distribution, but it does assume that the risk set at any time t consists of independent observations. This assumption does not hold for data from paired organ systems such as occur in ophthalmology (eyes) or otolaryngology (ears), or for other types of clustered data. In this article, we estimate marginal survival probabilities in the setting of clustered data, and provide confidence limits for these estimates with intra-cluster correlation accounted for by an interval-censored version of the Clayton-Oakes model. We develop a goodness-of-fit test for general bivariate interval-censored data and apply it to the proposed interval-censored version of the Clayton-Oakes model. We also propose a likelihood ratio test for the comparison of survival distributions between two groups in the setting of clustered data under the assumption of a constant between-group hazard ratio. This methodology can be used both for balanced and unbalanced cluster sizes, and also when the cluster size is informative. We compare our test to the ordinary log rank test and the Lin-Wei (LW) test based on the marginal Cox proportional hazards model with robust standard errors obtained from the sandwich estimator. Simulation results indicate that the ordinary log rank test inflates the type I error, while the proposed unconditional likelihood ratio test has appropriate type I error and higher power than the LW test. The method is demonstrated in real examples from the Sorbinil Retinopathy Trial and the Age-Related Macular Degeneration Study. Raw data from these two trials are provided.
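As background for the abstract above, the ordinary Kaplan-Meier product-limit estimator for right-censored data can be sketched as follows. This is a minimal sketch under the independence assumption that the abstract says fails for clustered data; it is not the authors' interval-censored Clayton-Oakes method. The function name `kaplan_meier` and the input layout are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times: follow-up times; events: 1 if the event was observed,
    0 if the observation was right-censored. Hypothetical helper.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose time equals t (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Censored subjects leave the risk set without changing the curve, which is why the estimate only steps down at observed event times.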


Subjects
Diabetic Retinopathy, Humans, Proportional Hazards Models, Survival Analysis, Computer Simulation, Likelihood Functions
7.
Biom J ; 65(2): e2200073, 2023 02.
Article in English | MEDLINE | ID: mdl-36166681

ABSTRACT

Common count distributions, such as the Poisson (binomial) distribution for unbounded (bounded) counts considered here, can be characterized by appropriate Stein identities. These identities, in turn, might be utilized to define a corresponding goodness-of-fit (GoF) test, the test statistic of which involves the computation of weighted means for a user-selected weight function f. Here, the choice of f should be made with respect to the relevant alternative scenario, as it will have great impact on the GoF-test's performance. We derive the asymptotics of both the Poisson and binomial Stein-type GoF-statistic for general count distributions (we also briefly consider the negative-binomial case), such that the asymptotic power is easily computed for arbitrary alternatives. This allows for an efficient implementation of optimal Stein tests, that is, tests that are most powerful within a given class $\mathcal{F}$ of weight functions. The performance and application of the optimal Stein-type GoF-tests are investigated by simulations and several medical data examples.
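For readers unfamiliar with Stein identities, the standard characterizations for the two distributions named above are given here in their textbook form. The notation ($\mu$, $n$, $\pi$) is assumed for illustration, and the exact statistic built from these identities in the paper may differ.

$$
X \sim \mathrm{Poi}(\mu) \iff \mathbb{E}\big[X\, f(X)\big] = \mu\, \mathbb{E}\big[f(X+1)\big] \quad \text{for all bounded } f,
$$

$$
X \sim \mathrm{Bin}(n, \pi) \iff (1-\pi)\, \mathbb{E}\big[X\, f(X)\big] = \pi\, \mathbb{E}\big[(n-X)\, f(X+1)\big] \quad \text{for all bounded } f.
$$

A Stein-type statistic compares sample versions of the two sides for a chosen weight function $f$; a marked imbalance points to a departure from the null distribution.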


Subjects
Statistical Models, Binomial Distribution
8.
J Appl Stat ; 49(5): 1277-1304, 2022.
Article in English | MEDLINE | ID: mdl-35707514

ABSTRACT

In this paper, 91 different tests for exponentiality are reviewed. Some of the tests are universally consistent while others are consistent against certain special classes of life distributions. The power performances of 40 of these tests for exponentiality are compared through extensive Monte Carlo simulations. The comparisons are conducted at the 5 percent level of significance for sample sizes of 10, 25, 50 and 100 and for different groups of distributions according to the shape of their hazard functions. Also, the techniques are applied to two real-world datasets and a measure of power is employed for the comparison of the tests. The results show that some tests which are very good under one group of alternative distributions are not so under another group. Also, some tests maintained relatively high power over all the groups of alternative distributions studied while some others maintained poor power performances over all the groups of alternative distributions. Again, the results obtained from the real-world datasets agree completely with those of the simulation studies.

9.
J Public Health (Oxf) ; 44(2): e221-e226, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35325235

ABSTRACT

BACKGROUND: Previous studies have used Benford's distribution to assess the accuracy of COVID-19 data. Data inaccuracies provide false information to the media, undermine global response and hinder the preventive measures taken by authorities. METHODS: Daily new cases and deaths from all the countries of the European Union were analyzed and the conformance to Benford's distribution was estimated. Two statistical tests and two measures of deviation were calculated to determine whether the reported statistics comply with the expected distribution. Four country-level developmental indexes were included: the GDP per capita, health expenditures, the Universal Health Coverage (UHC) Index and the full vaccination rate. Regression analysis was implemented to examine whether the deviation from Benford's distribution is affected by the aforementioned indexes. RESULTS: The findings indicate that Bulgaria, Croatia, Lithuania and Romania were in line with Benford's distribution. Regarding daily cases, Denmark, Ireland and Greece showed the greatest deviation from Benford's distribution. Furthermore, it was found that the vaccination rate is positively associated with deviation from Benford's distribution. CONCLUSIONS: The findings suggest that, overall, official data provided by authorities do not conform to Benford's law, yet this approach acts as a preliminary tool for data verification. More extensive studies should be made with a more thorough investigation of the countries that showed the greatest deviation.
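A first-digit conformance check of the kind described above might look like the following sketch. The abstract does not name its two tests or two deviation measures, so the Pearson chi-square statistic used here is an assumption, as are the function names and the restriction to integer counts.

```python
import math

def benford_probabilities():
    """Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def benford_chi_square(counts_data):
    """Return (chi-square statistic, first-digit counts) for integer data.

    counts_data: iterable of integers such as daily case counts.
    Zeros are skipped because they have no leading significant digit.
    Hypothetical helper for illustration only.
    """
    digit_counts = [0] * 9
    for value in counts_data:
        value = abs(int(value))
        if value == 0:
            continue
        first_digit = int(str(value)[0])
        digit_counts[first_digit - 1] += 1
    n = sum(digit_counts)
    if n == 0:
        raise ValueError("no non-zero values to analyse")
    stat = sum(
        (observed - n * prob) ** 2 / (n * prob)
        for observed, prob in zip(digit_counts, benford_probabilities())
    )
    return stat, digit_counts
```

Under the null hypothesis the statistic is commonly compared with a chi-square distribution with 8 degrees of freedom. With long series, even tiny deviations become significant, which is one reason deviation measures are often reported alongside a test.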


Subjects
COVID-19, COVID-19/epidemiology, COVID-19/prevention & control, European Union, Greece, Health Expenditures, Humans, Ireland
10.
Psychometrika ; 87(3): 1130-1145, 2022 09.
Article in English | MEDLINE | ID: mdl-35092575

ABSTRACT

In practice, it is common that a best-fitting structural equation model (SEM) is selected from a set of candidate SEMs and inference is conducted conditional on the selected model. Such post-selection inference ignores the model selection uncertainty and yields overly optimistic inference. Using the largest candidate model avoids model selection uncertainty but introduces a large variation. Jin and Ankargren (Psychometrika 84:84-104, 2019) proposed to use frequentist model averaging in SEM with continuous data as a compromise between model selection and the full model. They assumed that the true values of the parameters depend on [Formula: see text] with n being the sample size, which is known as a local asymptotic framework. This paper shows that their results are not directly applicable to SEM with ordinal data. To address this issue, we prove consistency and asymptotic normality of the polychoric correlation estimators under the local asymptotic framework. Then, we propose a new frequentist model averaging estimator and a valid confidence interval that are suitable for ordinal data. Goodness-of-fit test statistics for the model averaging estimator are also derived.


Subjects
Theoretical Models, Psychometrics/methods, Sample Size
11.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34849577

ABSTRACT

Gene set-based signal detection analyses are used to detect an association between a trait and a set of genes by accumulating signals across the genes in the gene set. Since signal detection is concerned with identifying whether any of the genes in the gene set are non-null, a goodness-of-fit (GOF) test can be used to compare whether the observed distribution of gene-level tests within the gene set agrees with the theoretical null distribution. Here, we present a flexible gene set-based signal detection framework based on tail-focused GOF statistics. We show that the power of the various statistics in this framework depends critically on two parameters: the proportion of genes within the gene set that are non-null and the degree of separation between the null and alternative distributions of the gene-level tests. We give guidance on which statistic to choose for a given situation and implement the methods in a fast and user-friendly R package, wHC (https://github.com/mqzhanglab/wHC). Finally, we apply these methods to a whole exome sequencing study of amyotrophic lateral sclerosis.


Subjects
Amyotrophic Lateral Sclerosis, Amyotrophic Lateral Sclerosis/genetics, Genetic Testing, Humans, Phenotype, Exome Sequencing
12.
Entropy (Basel) ; 24(10)2022 Oct 11.
Article in English | MEDLINE | ID: mdl-37420464

ABSTRACT

This paper introduces and studies a new generalization of cumulative past extropy called weighted cumulative past extropy (WCPJ) for continuous random variables. We explore the following: if the WCPJs of the last order statistic are equal for two distributions, then these two distributions will be equal. We examine some properties of the WCPJ, and a number of inequalities involving bounds for WCPJ are obtained. Studies related to reliability theory are discussed. Finally, the empirical version of the WCPJ is considered, and a test statistic is proposed. The critical cutoff points of the test statistic are computed numerically. Then, the power of this test is compared to a number of alternative approaches. In some situations, its power is superior to the rest, and in some other settings, it is somewhat weaker than the others. The simulation study shows that the use of this test statistic can be satisfactory with due attention to its simple form and the rich information content behind it.

13.
Am J Hum Genet ; 108(7): 1251-1269, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34214446

ABSTRACT

With the increasing availability of large-scale GWAS summary data on various complex traits and diseases, there has been tremendous interest in applying Mendelian randomization (MR) to investigate causal relationships between pairs of traits using SNPs as instrumental variables (IVs) based on observational data. In spite of the potential significance of such applications, the validity of their causal conclusions critically depends on some strong modeling assumptions required by MR, which may be violated due to widespread (horizontal) pleiotropy. Although many MR methods have been proposed recently to relax the assumptions by mainly dealing with uncorrelated pleiotropy, only a few can handle correlated pleiotropy, in which some SNPs/IVs may be associated with hidden confounders, such as some heritable factors shared by both traits. Here we propose a simple and effective approach based on constrained maximum likelihood and model averaging, called cML-MA, applicable to GWAS summary data. To deal with more challenging situations with many invalid IVs with only weak pleiotropic effects, we modify and improve it with data perturbation. Extensive simulations demonstrated that the proposed methods could control the type I error rate better while achieving higher power than other competitors. Applications to 48 risk factor-disease pairs based on large-scale GWAS summary data of 3 cardio-metabolic diseases (coronary artery disease, stroke, and type 2 diabetes), asthma, and 12 risk factors confirmed its superior performance.


Subjects
Algorithms, Genetic Pleiotropy, Likelihood Functions, Mendelian Randomization Analysis/methods, Asthma/etiology, Cardiovascular Diseases/etiology, Causality, Computer Simulation, Type 2 Diabetes Mellitus/etiology, Humans, Statistical Models, Risk Factors
14.
Psychometrika ; 86(2): 564-594, 2021 06.
Article in English | MEDLINE | ID: mdl-34097200

ABSTRACT

The model-implied instrumental variable (MIIV) estimator is an equation-by-equation estimator of structural equation models that is more robust to structural misspecifications than full information estimators. Previous studies have concentrated on endogenous variables that are all continuous (MIIV-2SLS) or all ordinal. We develop a unified MIIV approach that applies to a mixture of binary, ordinal, censored, or continuous endogenous observed variables. We include estimates of factor loadings, regression coefficients, variances, and covariances along with their asymptotic standard errors. In addition, we create new goodness-of-fit tests of the model and overidentification tests of single equations. Our simulation study shows that the proposed MIIV approach is more robust to structural misspecifications than diagonally weighted least squares (DWLS) and that both the goodness-of-fit model tests and the overidentification equation tests can detect structural misspecifications. We also find that the bias in asymptotic standard errors for the MIIV estimators of factor loadings and regression coefficients is often lower than for the DWLS ones, though the differences are small in large samples. Our analysis shows that scaling indicators with low reliability can adversely affect the MIIV estimators. Also, using a small subset of MIIVs reduces small-sample bias of coefficient estimates, but can lower the power of overidentification tests of equations.


Subjects
Statistical Models, Latent Class Analysis, Least-Squares Analysis, Psychometrics, Reproducibility of Results
15.
Sensors (Basel) ; 22(1)2021 Dec 23.
Article in English | MEDLINE | ID: mdl-35009626

ABSTRACT

Accurate regional classification of highways is a critical prerequisite to implement a tailored safety assessment. However, there has been inadequate research on objective classification considering traffic flow characteristics for highway safety assessment purposes. We propose an objective and easily applicable classification method that considers the administrative divisions of South Korea. We evaluated the feasibility of this method through various theoretical analysis techniques using the data collected from 536 permanent traffic volume counting stations for the national highways in South Korea in 2019. The ratio of the annual average hourly traffic volume to the annual average daily traffic was used as the explanatory variable. The corresponding results of factor and cluster analyses with this ratio showed a 61% concordance with the urban, suburban, and rural areas classified by the administrative divisions. The results of two-sample goodness-of-fit tests also confirmed that the difference in the three distributions of hourly volume ratios was statistically significant. The results of this study can help enhance highway safety and facilitate the development and application of more appropriate highway safety assessment tools, such as Road Assessment Programs or crash prediction models, for specific regions using the proposed method.


Subjects
Traffic Accidents, Automobile Driving, Cluster Analysis, Republic of Korea, Safety
16.
Sichuan Mental Health ; (6): 39-43, 2021.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-987565

ABSTRACT

The purpose of this article was to introduce the χ² distribution and related contents, including the χ² distribution and the non-central χ² distribution. It focused on showing the definitions of the two χ² distributions, and the graphs and main properties of their probability density functions. Among them, the two most important properties were: first, the limiting distribution of the χ² distribution was the normal distribution; second, $(n-1)s^2/\sigma^2$ followed the χ² distribution with n-1 degrees of freedom. In addition, it also explained the relationship between the χ² distribution and the normal distribution, and the relationship between the χ² test statistic and the Z test statistic. Finally, it illustrated the computational approaches for the χ² distribution based on two SAS functions in SAS software.
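The second property can be checked informally by simulation. The sketch below is a Monte Carlo illustration in Python, not the SAS approach the article describes. The function name and the choices of n, sigma, and replication count are arbitrary assumptions. It relies on the known facts that a χ² variable with k degrees of freedom has mean k and variance 2k.

```python
import random
import statistics

def scaled_variance_draws(n, sigma, reps, seed=0):
    """Simulate (n - 1) * s^2 / sigma^2 for normal samples of size n.

    Hypothetical helper: for normal data, each draw should behave like
    a chi-square variable with n - 1 degrees of freedom.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
        draws.append((n - 1) * s2 / sigma ** 2)
    return draws

draws = scaled_variance_draws(n=5, sigma=2.0, reps=20000)
print(statistics.fmean(draws))      # should be close to n - 1 = 4
print(statistics.variance(draws))   # should be close to 2 * (n - 1) = 8
```

Matching the first two moments is only a sanity check, not a proof that the distribution is χ².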

17.
Sichuan Mental Health ; (6): 417-423, 2021.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-987481

ABSTRACT

The purpose of this article was to introduce the goodness-of-fit test and its SAS implementation. The main contents included the following four aspects: ① Pearson's goodness-of-fit test; ② the deviance or likelihood ratio goodness-of-fit test; ③ the Hosmer-Lemeshow goodness-of-fit test; ④ goodness-of-fit tests for sparse data. The fourth aspect covered six specific test approaches, namely the "information matrix test", "information matrix diagonal test", "Osius-Rojek test", "unweighted residual sum of squares test", "Spiegelhalter test" and "Stukel test". The paper implemented the four types of goodness-of-fit tests mentioned above with the help of the SAS software through an example, explained the output results, and drew statistical and professional conclusions.

18.
Entropy (Basel) ; 22(6)2020 Jun 02.
Article in English | MEDLINE | ID: mdl-33286390

ABSTRACT

Among all the methods of extracting randomness, quantum random number generators are promising for their genuine randomness. However, existing quantum random number generator schemes aim at generating sequences with a uniform distribution, which may not meet the requirements of specific applications such as a continuous-variable quantum key distribution system. In this paper, we demonstrate a practical quantum random number generation scheme that directly generates Gaussian distributed random sequences based on measuring vacuum shot noise. In particular, the impact of the sampling device in the practical system is analyzed. Furthermore, a related post-processing method, which maintains the fine distribution and autocorrelation properties of raw data, is exploited to extend the precision of the generated Gaussian distributed random numbers to over 20 bits, so that the sequences can be used by downstream systems that require high-precision numbers. Finally, the results of normality and randomness tests show that the generated sequences follow a Gaussian distribution and pass the randomness tests well.

19.
BMC Med Res Methodol ; 20(1): 175, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32611379

ABSTRACT

BACKGROUND: Examining residuals is a crucial step in statistical analysis to identify the discrepancies between models and data, and assess the overall model goodness-of-fit. In diagnosing normal linear regression models, both Pearson and deviance residuals are often used, which are equivalently and approximately standard normally distributed when the model fits the data adequately. However, when the response variable is discrete, these residuals are distributed far from normality and have nearly parallel curves according to the distinct discrete response values, imposing great challenges for visual inspection. METHODS: Randomized quantile residuals (RQRs) were proposed in the literature by Dunn and Smyth (1996) to circumvent the problems in traditional residuals. However, this approach has not gained popularity partly due to the lack of investigation of its performance for count regression including zero-inflated models through simulation studies. Therefore, we assessed the normality of the RQRs and compared their performance with traditional residuals for diagnosing count regression models through a series of simulation studies. A real data analysis in a health care utilization study for modeling the number of repeated emergency department visits was also presented. RESULTS: Our results of the simulation studies demonstrated that RQRs have low type I error and great statistical power in comparison to other residuals for detecting many forms of model misspecification for count regression models (non-linearity in covariate effect, over-dispersion, and zero inflation). Our real data analysis also showed that RQRs are effective in detecting misspecified distributional assumptions for count regression models. CONCLUSIONS: Our results for evaluating RQRs in comparison with traditional residuals provide further evidence of their advantages for diagnosing count regression models.
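The randomized quantile residual of Dunn and Smyth can be sketched for a Poisson model as follows. This is a minimal illustration assuming the fitted means are already available. The function names are hypothetical, and the zero-inflated and over-dispersed models studied in the paper would need their own distribution functions.

```python
import math
import random
from statistics import NormalDist

def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu), by direct summation (small counts)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)

def randomized_quantile_residual(y, mu, rng):
    """RQR for an observed count y with fitted Poisson mean mu.

    Draw u uniformly between F(y - 1) and F(y), then map it through the
    standard normal quantile function. Hypothetical helper.
    """
    lower = poisson_cdf(y - 1, mu)
    upper = poisson_cdf(y, mu)
    u = rng.uniform(lower, upper)
    # Keep u strictly inside (0, 1) so the normal quantile stays finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(u)

rng = random.Random(42)
residuals = [randomized_quantile_residual(y, 2.5, rng) for y in (0, 1, 2, 3, 7)]
```

If the model is correctly specified, the residuals are approximately standard normal, so a normal Q-Q plot or a normality test can serve as the diagnostic. The randomization means a different seed gives slightly different residuals.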


Subjects
Statistical Models, Research Design, Computer Simulation, Humans, Linear Models
20.
Article in English | MEDLINE | ID: mdl-32153310

ABSTRACT

Field studies in ecology often make use of data collected in a hierarchical fashion, and may combine studies that vary in sampling design. For example, studies of tree recruitment after disturbance may use counts of individual seedlings from plots that vary in spatial arrangement and sampling density. To account for the multi-level design and the fact that more than a few plots usually yield no individuals, a mixed effects zero inflated Poisson model is often adopted. Although it is a convenient modeling strategy, various aspects of the model could be misspecified. A comprehensive test procedure, based on the cumulative sum of the residuals, is proposed. The test is proven to be consistent, and its convergence properties are established as well. The application of the proposed test is illustrated by a real data example and simulation studies.
