Results 1 - 20 of 27
1.
Stat Med ; 42(20): 3732-3744, 2023 09 10.
Article in English | MEDLINE | ID: mdl-37312237

ABSTRACT

In clinical and epidemiological research, doubly truncated data often appear. This is the case, for instance, when the data registry is formed by interval sampling. Double truncation generally induces a sampling bias on the target variable, so proper corrections of ordinary estimation and inference procedures must be used. Unfortunately, the nonparametric maximum likelihood estimator of a doubly truncated distribution has several drawbacks, such as potential nonexistence and nonuniqueness issues, or a large estimation variance. Interestingly, no correction for double truncation is needed when the sampling bias is ignorable, which may occur with interval sampling and other sampling designs. In such a case, the ordinary empirical distribution function is a consistent and fully efficient estimator that generally brings remarkable variance improvements compared to the nonparametric maximum likelihood estimator. Thus, identification of such situations is critical for simple and efficient estimation of the target distribution. In this article, we introduce for the first time formal testing procedures for the null hypothesis of ignorable sampling bias with doubly truncated data. The asymptotic properties of the proposed test statistic are investigated. A bootstrap algorithm to approximate the null distribution of the test in practice is introduced. The finite sample performance of the method is studied in simulated scenarios. Finally, applications to data on the age of onset of childhood cancer and of Parkinson's disease are given. Variance improvements in estimation are discussed and illustrated.
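The bootstrap calibration mentioned in the abstract can be sketched generically; the code below is a plain nonparametric bootstrap p-value under illustrative assumptions (simulated data, statistic chosen arbitrarily), not the authors' exact resampling-under-the-null algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_null_pvalue(x, statistic, n_boot=999):
    """Approximate the distribution of `statistic` by resampling x with
    replacement and return a bootstrap p-value for the observed value.
    Generic sketch only; the paper's resampling scheme differs."""
    t_obs = statistic(x)
    t_boot = np.array([
        statistic(rng.choice(x, size=len(x), replace=True))
        for _ in range(n_boot)
    ])
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)

x = rng.exponential(size=50)  # illustrative sample
pv = bootstrap_null_pvalue(x, np.mean)
```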


Subject(s)
Algorithms , Research Design , Humans , Child , Selection Bias , Likelihood Functions , Computer Simulation , Bias
2.
Comput Methods Programs Biomed ; 217: 106694, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35278813

ABSTRACT

BACKGROUND AND OBJECTIVE: Nowadays the "low sample size, large dimension" scenario is often encountered in genetics and in the omic sciences, where microarray data are typically formed by a large number of possibly dependent small samples. Standard methods to solve the k-sample problem in such a setting are of limited applicability due to lack of theoretical validation for large k, lengthy computational times, missing software solutions, or inability to deal with statistical dependence among the samples. This paper presents the R package Equalden.HD to overcome these limitations. METHODS: The package implements several tests for the null hypothesis that a large number of samples follow a common density. These methods are particularly well suited to the "low sample size, large dimension" setting. The implemented procedures allow for dependent samples. For each method, Equalden.HD reports, among other things, the standardized value of the test statistic and the corresponding p-value. The package also includes two high-dimensional genetic data sets, Hedenfalk and Rat, which are used in this paper for illustration purposes. RESULTS: The usage of Equalden.HD has been illustrated through the analysis of the Hedenfalk and Rat genetic data. Statistical dependence among the samples was found for both genetic data sets. The application of an appropriate k-sample test within Equalden.HD rejected the null hypothesis of inter-sample homogeneity. The methods were also used to test for within-group homogeneity in cluster analysis, which is usually performed when the k samples are found to be significantly different. Equalden.HD helped to identify the individuals responsible for the lack of homogeneity of the samples. The limitations of the standard Kruskal-Wallis test for the identification of homogeneous clusters have been highlighted.
CONCLUSIONS: The methods implemented by Equalden.HD are the only omnibus nonparametric k-sample tests that have been validated as k grows. Furthermore, the package provides suitable corrections for possibly dependent samples, which is another distinctive feature. Thus, the package opens new doors for the statistical analysis of omic data. Limitations of standard methods (e.g. Anderson-Darling and Kruskal-Wallis) and of existing software solutions in the large-k setting have been emphasized.
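Equalden.HD itself is an R package; as a point of reference, the classical Kruskal-Wallis k-sample test whose large-k limitations the paper highlights can be run in a few lines (Python/SciPy sketch on simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Three small samples drawn from the same density, so the k-sample
# null hypothesis of a common distribution holds (simulated data).
samples = [rng.normal(size=8) for _ in range(3)]

# Classical rank-based k-sample test; its validation assumes fixed k
# and independent samples, which is the limitation discussed above.
stat, p = stats.kruskal(*samples)
```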


Subject(s)
Software , Animals , Cluster Analysis , Rats , Sample Size
3.
BMJ Evid Based Med ; 26(3): 121-126, 2021 Jun.
Article in English | MEDLINE | ID: mdl-31988195

ABSTRACT

When analysing and presenting results of randomised clinical trials, trialists rarely report if or how underlying statistical assumptions were validated. To avoid data-driven biased trial results, it should be common practice to prospectively describe the assessments of underlying assumptions. In the existing literature, there is no consensus on how trialists should assess and report underlying assumptions for the analyses of randomised clinical trials. With this study, we developed suggestions on how to test and validate underlying assumptions behind logistic regression, linear regression, and Cox regression when analysing results of randomised clinical trials. Two investigators compiled an initial draft based on a review of the literature. Experienced statisticians and trialists from eight different research centres and trial units then participated in an anonymised consensus process, in which we reached agreement on the suggestions presented in this paper. This paper provides detailed suggestions on 1) which underlying statistical assumptions behind logistic regression, multiple linear regression and Cox regression should be assessed; 2) how these underlying assumptions may be assessed; and 3) what to do if these assumptions are violated. We believe that the validity of randomised clinical trial results will increase if our recommendations for assessing and dealing with violations of the underlying statistical assumptions are followed.
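For multiple linear regression, one of the assumption checks of the kind discussed above can be sketched as a residual diagnostic; the data, model, and choice of the Shapiro-Wilk test below are illustrative, not taken from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated trial-like data: continuous outcome, one covariate, and a
# randomised two-arm treatment indicator (purely illustrative).
x = rng.uniform(0.0, 10.0, size=100)
arm = rng.integers(0, 2, size=100)
y = 1.5 + 0.8 * x + 2.0 * arm + rng.normal(scale=1.0, size=100)

# Ordinary least squares fit, then extract residuals.
X = np.column_stack([np.ones_like(x), x, arm])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# One common check: Shapiro-Wilk test for normality of the residuals.
w_stat, p_normality = stats.shapiro(resid)
```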


Subject(s)
Research Design , Humans , Randomized Controlled Trials as Topic
4.
Biom J ; 62(3): 852-867, 2020 05.
Article in English | MEDLINE | ID: mdl-31919875

ABSTRACT

Registry data typically report incident cases within a certain calendar time interval. Such interval sampling induces double truncation on the incidence times, which may result in an observational bias. In this paper, we introduce nonparametric estimation for the cumulative incidences of competing risks when the incidence time is doubly truncated. Two different estimators are proposed, depending on whether or not the truncation limits are independent of the competing events. The asymptotic properties of the estimators are established, and their finite sample performance is investigated through simulations. For illustration purposes, the estimators are applied to childhood cancer registry data, where the target population is defined, somewhat unusually, conditionally on future cancer development. In our application, the cumulative incidences therefore inform on the age distribution of the different types of cancer.


Subject(s)
Biometry/methods , Statistics, Nonparametric , Adult , Age Distribution , Aged , Female , Humans , Incidence , Male , Middle Aged , Neoplasms/epidemiology , Risk , Sample Size
5.
BMJ Evid Based Med ; 24(5): 185-189, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30948454

ABSTRACT

In order to ensure the validity of results of randomised clinical trials, and under some circumstances to optimise statistical power, most statistical methods require validation of underlying statistical assumptions. The present paper describes how trialists in major medical journals report tests of underlying statistical assumptions when analysing results of randomised clinical trials. We also consider possible solutions for improving current practice through adequate reporting of tests of underlying statistical assumptions. We conclude that there is a need to reach consensus on which underlying assumptions should be assessed, how they should be assessed, and what should be done if they are violated.


Subject(s)
Data Interpretation, Statistical , Randomized Controlled Trials as Topic/methods , Humans , Reproducibility of Results , Statistics as Topic
7.
Biom J ; 61(2): 424-441, 2019 03.
Article in English | MEDLINE | ID: mdl-30589104

ABSTRACT

Next-generation sequencing (NGS) experiments are often performed in biomedical research nowadays, leading to methodological challenges related to the high-dimensional and complex nature of the recorded data. In this work we review some of the issues that arise in disorder detection from NGS experiments, that is, when the focus is the detection of deletion and duplication disorders for homozygosity and heterozygosity in DNA sequencing. A statistical model to cope with guanine/cytosine bias and phasing and prephasing phenomena at base level is proposed, and a goodness-of-fit procedure for disorder detection is derived. The method combines the proper evaluation of local p-values (one for each DNA base) with suitable corrections for multiple comparisons and the discrete nature of the p-values. A global test for the detection of disorders in the whole DNA region is proposed too. The performance of the introduced procedures is investigated through simulations. A real data illustration is provided.
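The "suitable corrections for multiple comparisons" step can be illustrated with the standard Benjamini-Hochberg procedure; the paper's additional adjustments for the discrete nature of the p-values are not reproduced in this sketch.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of rejected hypotheses under the Benjamini-Hochberg
    step-up procedure (standard FDR correction; the discreteness
    refinements described in the paper are omitted)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Step-up thresholds alpha * i / m for the sorted p-values.
    thresh = alpha * (np.arange(1, m + 1) / m)
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```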


Subject(s)
Biostatistics/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Heterozygote , Homozygote , Models, Statistical , Monte Carlo Method
8.
Biometrics ; 74(4): 1203-1212, 2018 12.
Article in English | MEDLINE | ID: mdl-29603718

ABSTRACT

Nonparametric estimation of the transition probability matrix of a progressive multi-state model is considered under cross-sectional sampling. Two different estimators adapted to possibly right-censored and left-truncated data are proposed. The estimators require full retrospective information before the truncation time, which, when exploited, increases efficiency. They are obtained as differences between two survival functions constructed for sub-samples of subjects occupying specific states at a certain time point. Both estimators correct the oversampling of relatively large survival times by using the left-truncation times associated with the cross-sectional observation. Asymptotic results are established, and finite sample performance is investigated through simulations. One of the proposed estimators performs better when there is no censoring, while the second one is strongly recommended with censored data. The new estimators are applied to data on patients in intensive care units (ICUs).
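The estimators above are built as differences of survival functions for sub-samples; the basic Kaplan-Meier building block (without the left-truncation and cross-sectional adjustments the paper introduces) can be sketched as:

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival curve evaluated at the distinct event
    times (events: 1 = observed, 0 = censored). Plain estimator only;
    no truncation correction."""
    order = np.argsort(times)
    t, d = np.asarray(times)[order], np.asarray(events)[order]
    n = len(t)
    s, event_times, surv = 1.0, [], []
    i = 0
    while i < n:
        ti = t[i]
        at_risk = n - i          # subjects still under observation
        deaths = 0
        while i < n and t[i] == ti:
            deaths += d[i]
            i += 1
        if deaths:
            s *= 1 - deaths / at_risk
            event_times.append(ti)
            surv.append(s)
    return np.array(event_times), np.array(surv)

times, surv = kaplan_meier([1, 2, 3, 4], [1, 1, 1, 1])
```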


Subject(s)
Biometry/methods , Statistics as Topic/methods , Acute Disease/mortality , Acute Disease/therapy , Computer Simulation , Cross-Sectional Studies , Humans , Intensive Care Units , Time Factors
9.
Biometrics ; 74(2): 481-487, 2018 06.
Article in English | MEDLINE | ID: mdl-28886206

ABSTRACT

Doubly truncated data arise when event times are observed only if they fall within subject-specific, possibly random, intervals. While non-parametric methods for survivor function estimation using doubly truncated data have been intensively studied, only a few methods for fitting regression models have been suggested, and only for a limited number of covariates. In this article, we present a method to fit the Cox regression model to doubly truncated data with multiple discrete and continuous covariates, and describe how to implement it using existing software. The approach is used to study the association between candidate single nucleotide polymorphisms and age of onset of Parkinson's disease.


Subject(s)
Biometry/methods , Parkinson Disease/genetics , Proportional Hazards Models , Age of Onset , Humans , Polymorphism, Single Nucleotide , Probability , Regression Analysis , Software
10.
ACS Appl Mater Interfaces ; 9(31): 26372-26382, 2017 Aug 09.
Article in English | MEDLINE | ID: mdl-28721722

ABSTRACT

Novel plasmonic thin films based on electrostatic layer-by-layer (LbL) deposition of citrate-stabilized Au nanoparticles (NPs) and ammonium pillar[5]arene (AP[5]A) have been developed. The supramolecular-induced LbL assembly of the plasmonic nanoparticles yields the formation of controlled hot spots with uniform interparticle distances. At the same time, this strategy allows modulating the density and dimensions of the Au aggregates, and therefore the optical response, on the thin film with the number of AuNP-AP[5]A deposition cycles. Characterization of the AuNP-AP[5]A hybrid platforms as a function of the deposition cycles was performed by means of visible-NIR absorption spectroscopy, and scanning electron and atomic force microscopies, showing larger aggregates as the number of cycles increases. Additionally, the surface-enhanced Raman scattering (SERS) efficiency of the resulting AuNP-AP[5]A thin films has been investigated for three different laser excitations (633, 785, and 830 nm) using pyrene as Raman probe. The best performance was shown by the AuNP-AP[5]A film obtained with two deposition cycles ((AuNP-AP[5]A)2) when excited with the 785 nm laser line. The optical response and SERS efficiency of the thin films were also simulated using the M3 solver and employing computer-aided design models built from SEM images of the different films. The use of host molecules as building blocks to fabricate (AuNP-AP[5]A)2 films has enabled the ultradetection, in liquid and gas phase, of low molecular weight polyaromatic hydrocarbons (PAHs) with no affinity for gold but with affinity for the hydrophobic AP[5]A cavity. Besides, these plasmonic platforms allowed achieving quantitative detection within certain concentration regimes. Finally, the multiplex sensing capabilities of the (AuNP-AP[5]A)2 films were evaluated through their ability to detect three different PAHs in liquid and gas phase.

11.
Stat Med ; 36(12): 1964-1976, 2017 05 30.
Article in English | MEDLINE | ID: mdl-28238225

ABSTRACT

In this work, we present direct regression analysis for the transition probabilities in the possibly non-Markov progressive illness-death model. The method is based on binomial regression, where the response is the indicator of occupancy of the given state along time. Randomly weighted score equations that are able to remove the bias due to censoring are introduced. By solving these equations, one can estimate the possibly time-varying regression coefficients, which have an immediate interpretation as covariate effects on the transition probabilities. The performance of the proposed estimator is investigated through simulations. We apply the method to data from the Registry of Systemic Lupus Erythematosus (RELESSER), a multicenter registry created by the Spanish Society of Rheumatology. Specifically, we investigate the effect of age at lupus diagnosis, sex, and ethnicity on the probability of damage and death along time.


Subject(s)
Disease Progression , Models, Statistical , Mortality , Regression Analysis , Age Factors , Bias , Female , Humans , Lupus Erythematosus, Systemic/mortality , Lupus Erythematosus, Systemic/pathology , Male , Middle Aged , Probability , Registries , Risk Assessment , Sex Factors , Survival Analysis
12.
Stat Methods Med Res ; 26(5): 2356-2375, 2017 Oct.
Article in English | MEDLINE | ID: mdl-26265767

ABSTRACT

The sequential goodness-of-fit (SGoF) multiple testing method has recently been proposed as an alternative to the familywise error rate- and the false discovery rate-controlling procedures in high-dimensional problems. For discrete data, the SGoF method may be very conservative. In this paper, we introduce an alternative SGoF-type procedure that takes into account the discreteness of the test statistics. Like the original SGoF, our new method provides weak control of the false discovery rate/familywise error rate but attains false discovery rate levels closer to the desired nominal level, and thus it is more powerful. We study the performance of this method in a simulation study and illustrate its application to a real pharmacovigilance data set.
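The core SGoF idea — comparing the number of p-values below a threshold gamma with its binomial expectation under the complete null — can be sketched as follows; this is a simplified version, and the discreteness refinements proposed in the paper are omitted.

```python
import numpy as np
from scipy import stats

def sgof_excess(pvals, gamma=0.05):
    """Simplified SGoF-type count: test whether the number of p-values
    at or below gamma exceeds its Binomial(n, gamma) expectation under
    the complete null, and return (observed count, excess declared
    significant). Sketch only, not the published algorithm in full."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    observed = int(np.sum(p <= gamma))
    expected = int(np.ceil(n * gamma))
    # One-sided binomial test for an excess of small p-values.
    p_binom = stats.binom.sf(observed - 1, n, gamma)
    excess = observed - expected if p_binom <= gamma else 0
    return observed, max(excess, 0)

# Ten clearly small p-values among twenty: a strong excess below 0.05.
obs, exc = sgof_excess([0.001] * 10 + [0.5] * 10)
```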


Subject(s)
Data Interpretation, Statistical , Humans , Models, Statistical , Monte Carlo Method , Statistics as Topic
13.
Rheumatology (Oxford) ; 55(7): 1243-50, 2016 07.
Article in English | MEDLINE | ID: mdl-27018057

ABSTRACT

OBJECTIVES: To identify patterns (clusters) of damage manifestations within a large cohort of SLE patients and to evaluate the potential association of these clusters with a higher risk of mortality. METHODS: This is a multicentre, descriptive, cross-sectional study of a cohort of 3656 SLE patients from the Spanish Society of Rheumatology Lupus Registry. Organ damage was ascertained using the Systemic Lupus International Collaborating Clinics Damage Index. Using cluster analysis, groups of patients with similar patterns of damage manifestations were identified. The clusters were then compared overall, as well as within the subgroup of patients in each cluster with disease duration shorter than 5 years. RESULTS: Three damage clusters were identified. Cluster 1 (80.6% of patients) contained a lower proportion of individuals with damage (23.2% vs 100% in clusters 2 and 3, P < 0.001). Cluster 2 (11.4% of patients) was characterized by musculoskeletal damage in all patients. Cluster 3 (8.0% of patients) was the only group with cardiovascular damage, and this was present in all patients. The overall mortality rate of patients in clusters 2 and 3 was higher than that in cluster 1 (P < 0.001 for both comparisons), and the same held in patients with disease duration shorter than 5 years. CONCLUSION: In a large cohort of SLE patients, cardiovascular and musculoskeletal damage manifestations were the two dominant forms of damage for sorting patients into clinically meaningful clusters. Both in early and late stages of the disease, there was a significant association of these clusters with an increased risk of mortality. Physicians should pay special attention to the early prevention of damage in these two systems.


Subject(s)
Cardiovascular Diseases/mortality , Lupus Erythematosus, Systemic/complications , Lupus Erythematosus, Systemic/mortality , Musculoskeletal Diseases/mortality , Severity of Illness Index , Adult , Cardiovascular Diseases/etiology , Cluster Analysis , Cross-Sectional Studies , Female , Humans , Lupus Erythematosus, Systemic/pathology , Male , Middle Aged , Musculoskeletal Diseases/etiology , Registries , Spain , Time Factors
14.
Biometrics ; 71(2): 364-75, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25735883

ABSTRACT

Multi-state models are often used for modeling complex event history data. In these models, the estimation of the transition probabilities is of particular interest, since they allow for long-term predictions of the process. These quantities have traditionally been estimated by the Aalen-Johansen estimator, which is consistent if the process is Markov. Several non-Markov estimators have been proposed in the recent literature, and their superiority with respect to the Aalen-Johansen estimator has been proved in situations in which the Markov condition is strongly violated. However, the existing estimators have the drawback of requiring that the support of the censoring distribution contain the support of the lifetime distribution, which is often not the case. In this article, we propose two new methods for estimating the transition probabilities in the progressive illness-death model. Some asymptotic results are derived. The proposed estimators are consistent regardless of the Markov condition and of the aforementioned assumption about the censoring support. We explore the finite sample behavior of the estimators through simulations. The main conclusion of this piece of research is that the proposed estimators are much more efficient than the existing non-Markov estimators in most cases. An application to a clinical trial on colon cancer is included. Extensions to progressive processes beyond the three-state illness-death model are discussed.


Subject(s)
Statistics, Nonparametric , Survival Analysis , Algorithms , Biometry , Colonic Neoplasms/mortality , Colonic Neoplasms/surgery , Computer Simulation , Humans , Kaplan-Meier Estimate , Markov Chains , Models, Statistical , Probability , Stochastic Processes
15.
Biom J ; 57(1): 108-22, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25323102

ABSTRACT

In the field of multiple comparison procedures, adjusted p-values are an important tool to evaluate the significance of a test statistic while taking the multiplicity into account. In this paper, we introduce adjusted p-values for the recently proposed Sequential Goodness-of-Fit (SGoF) multiple test procedure by letting the level of the test vary on the unit interval. This extends previous research on the SGoF method, which is a method of high interest when one aims to increase the statistical power in a multiple testing scenario. The adjusted p-value is the smallest level at which the SGoF procedure would still reject the given null hypothesis, while controlling for the multiplicity of tests. The main properties of the adjusted p-values are investigated. In particular, we show that they are a subset of the original p-values, being equal to 1 for p-values above a certain threshold. These are very useful properties from a numerical viewpoint, since they allow for a simplified method to compute the adjusted p-values. We introduce a modification of the SGoF method, termed majorant version, which rejects the null hypotheses with adjusted p-values below the level. This modification rejects more null hypotheses as the level increases, something which is not in general the case for the original SGoF. Adjusted p-values for the conservative version of the SGoF procedure, which estimates the variance without assuming that all the null hypotheses are true, are also included. The situation with ties among the p-values is discussed too. Several real data applications are investigated to illustrate the practical usage of adjusted p-values, ranging from a small to a large number of tests.


Subject(s)
Biometry/methods , Animals , Child , Environmental Exposure/adverse effects , Gene Expression Profiling , Humans , Lead/adverse effects , Myocardial Infarction/therapy , Mytilus edulis/genetics , Neuropsychology , Normal Distribution
16.
Biom J ; 55(1): 52-67, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23225621

ABSTRACT

In this paper, we introduce a new estimator of a percentile residual life function with censored data under a monotonicity constraint. Specifically, it is assumed that the percentile residual life is a decreasing function. This assumption is useful when estimating the percentile residual life of units which degenerate with age. We establish a law of the iterated logarithm for the proposed estimator, and its √n-equivalence to the unrestricted estimator. The asymptotic normal distribution of the estimator and its strong approximation to a Gaussian process are also established. We investigate the finite sample performance of the monotone estimator in an extensive simulation study. Finally, data from a clinical trial in primary biliary cirrhosis of the liver are analyzed with the proposed methods. One of the conclusions of our work is that the restricted estimator may be much more efficient than the unrestricted one.


Subject(s)
Biometry/methods , Survival Analysis , Clinical Trials as Topic , Humans , Liver Cirrhosis, Biliary/epidemiology , Normal Distribution , Stochastic Processes
17.
Stat Med ; 31(30): 4416-27, 2012 Dec 30.
Article in English | MEDLINE | ID: mdl-22975898

ABSTRACT

Multistate models are useful tools for modeling disease progression when survival is the main outcome, but several intermediate events of interest are observed during the follow-up time. The illness-death model is a special multistate model with important applications in the biomedical literature. It provides a suitable representation of the individual's history when a unique intermediate event can be experienced before the main event of interest. Nonparametric estimation of transition probabilities in this and other multistate models is usually performed through the Aalen-Johansen estimator under a Markov assumption. The Markov assumption claims that given the present state, the future evolution of the illness is independent of the states previously visited and the transition times among them. However, this assumption fails in some applications, leading to inconsistent estimates. In this paper, we provide a new approach for testing Markovianity in the illness-death model. The new method is based on measuring the future-past association along time. This results in a detailed inspection of the process, which often reveals a non-Markovian behavior with different trends in the association measure. A test of significance for zero future-past association at each time point is introduced, and a significance trace is proposed accordingly. Besides, we propose a global test for Markovianity based on a supremum-type test statistic. The finite sample performance of the test is investigated through simulations. We illustrate the new method through the analysis of two biomedical data sets.


Subject(s)
Biometry/methods , Disease Progression , Markov Chains , Monte Carlo Method , Survival Analysis , Bone Marrow Transplantation , Computer Simulation , Humans , Leukemia, Myeloid, Acute/mortality , Leukemia, Myeloid, Acute/therapy , Models, Biological , Multicenter Studies as Topic/statistics & numerical data , Statistics, Nonparametric , Treatment Outcome
18.
Stat Appl Genet Mol Biol ; 11(3): Article 14, 2012.
Article in English | MEDLINE | ID: mdl-22611594

ABSTRACT

In this paper, a correction of the SGoF multiple-testing method for dependent tests is introduced. The correction is based on the beta-binomial model, and therefore the new method is called Beta-Binomial SGoF (or BB-SGoF). The main properties of the new method are established, and its practical implementation is discussed. BB-SGoF is illustrated through the analysis of two different real data sets on gene/protein expression levels. The performance of the method is investigated through simulations too. One of the main conclusions of the paper is that the SGoF strategy may retain much of its power even in the presence of possible dependences among the tests.


Subject(s)
Gene Expression Profiling/methods , Models, Statistical , Algorithms , Animals , Computer Simulation , Female , Humans , Male
19.
Biom J ; 54(2): 163-80, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22522376

ABSTRACT

The three-state progressive model is a special multi-state model with important applications in Survival Analysis. It provides a suitable representation of the individual's history when an intermediate event (with a possible influence on the survival prognosis) is experienced before the main event of interest. Estimation of transition probabilities in this and other multi-state models is usually performed through the Aalen-Johansen estimator. However, Aalen-Johansen may be biased when the underlying process is not Markov. In this paper, we provide a new approach for testing Markovianity in the three-state progressive model. The new method is based on measuring the future-past association along time. This results in a deep inspection of the process that often reveals a non-Markovian behaviour with different trends in the association measure. A test of significance for zero future-past association at each time point is introduced, and a significance trace is proposed accordingly. The finite sample performance of the test is investigated through simulations. We illustrate the new method through real data analysis.


Subject(s)
Disease Progression , Markov Chains , Models, Statistical , Humans , Survival Analysis , Time Factors
20.
PLoS One ; 6(9): e24700, 2011.
Article in English | MEDLINE | ID: mdl-21931819

ABSTRACT

We developed a new multiple hypothesis testing adjustment called SGoF+, implemented as a sequential goodness-of-fit metatest. It is a modification of a previous algorithm, SGoF, that takes advantage of the distribution of p-values in order to fix the rejection region. The new method uses a discriminant rule based on the maximum distance between the uniform distribution of p-values and the observed one to set the null for a binomial test. This new approach shows a better power/pFDR ratio than SGoF. In fact, SGoF+ automatically sets the threshold leading to the maximum power and the minimum false non-discovery rate within the SGoF family of algorithms. Additionally, we suggest combining the information provided by SGoF+ with the estimate of the FDR that has been committed when rejecting a given set of nulls. We study different positive false discovery rate (pFDR) estimation methods to combine q-value estimates jointly with the information provided by the SGoF+ method. Simulations suggest that the combination of the SGoF+ metatest with the q-value information is an interesting strategy to deal with multiple testing issues. These techniques are provided in the latest version of the SGoF+ software, freely available at http://webs.uvigo.es/acraaj/SGoF.htm.


Subject(s)
Algorithms , Software , Computational Biology