1.
Biom J ; 66(5): e202300197, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38953619

ABSTRACT

In biomedical research, the simultaneous inference of multiple binary endpoints may be of interest. In such cases, an appropriate multiplicity adjustment is required that controls the family-wise error rate, which represents the probability of making incorrect test decisions. In this paper, we investigate two approaches that perform single-step p-value adjustments that also take into account the possible correlation between endpoints. A rather novel and flexible approach known as multiple marginal models is considered, which is based on stacking of the parameter estimates of the marginal models and deriving their joint asymptotic distribution. We also investigate a nonparametric vector-based resampling approach, and we compare both approaches with the Bonferroni method by examining the family-wise error rate and power for different parameter settings, including low proportions and small sample sizes. The results show that the resampling-based approach consistently outperforms the other methods in terms of power, while still controlling the family-wise error rate. The multiple marginal models approach, on the other hand, shows a more conservative behavior. However, it offers more versatility in application, allowing for more complex models or straightforward computation of simultaneous confidence intervals. The practical application of the methods is demonstrated using a toxicological dataset from the National Toxicology Program.
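A vector-based resampling adjustment of the kind described can be sketched as follows. This is an illustrative single-step max-T permutation scheme, not the authors' implementation; the function names and the choice of a pooled two-proportion z-statistic are assumptions. Permuting whole subject rows preserves the correlation between endpoints, which is what gives the method its power advantage over Bonferroni.

```python
import numpy as np

def bonferroni(pvals):
    # single-step Bonferroni: multiply each p-value by the number of endpoints, cap at 1
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)

def two_prop_z(a, b):
    # standardized difference in proportions for each binary endpoint (column)
    na, nb = a.shape[0], b.shape[0]
    pa, pb = a.mean(axis=0), b.mean(axis=0)
    pool = (a.sum(axis=0) + b.sum(axis=0)) / (na + nb)
    se = np.sqrt(pool * (1 - pool) * (1 / na + 1 / nb))
    se = np.where(se == 0, np.inf, se)  # degenerate endpoints contribute z = 0
    return (pa - pb) / se

def maxt_adjusted_pvals(x, y, n_perm=2000, seed=0):
    # vector-based permutation of whole subject rows keeps the correlation
    # between endpoints intact; adjusted p_j = P(max_k |Z*_k| >= |Z_j|)
    rng = np.random.default_rng(seed)
    nx = x.shape[0]
    pooled = np.vstack([x, y])
    z_obs = np.abs(two_prop_z(x, y))
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        max_null[i] = np.abs(two_prop_z(pooled[idx[:nx]], pooled[idx[nx:]])).max()
    return np.array([(max_null >= z).mean() for z in z_obs])
```

Because the null distribution is that of the maximum statistic over endpoints, a single threshold controls the family-wise error rate without the conservativeness of treating endpoints as independent.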


Subject(s)
Biomedical Research , Biometry , Models, Statistical , Biometry/methods , Biomedical Research/methods , Sample Size , Endpoint Determination , Humans
3.
Article in English | MEDLINE | ID: mdl-38200715

ABSTRACT

Out of the 166 articles published in Journal of Industrial Microbiology and Biotechnology (JIMB) in 2019-2020 (not including special issues or review articles), 51 of them used a statistical test to compare two or more means. The most popular test was the (Standard) t-test, which often was used to compare several pairs of means. Other statistical procedures used included Fisher's least significant difference (LSD), Tukey's honest significant difference (HSD), and Welch's t-test; and to a lesser extent Bonferroni, Duncan's Multiple Range, Student-Newman-Keuls, and Kruskal-Wallis tests. This manuscript examines the performance of some of these tests with simulated experimental data, typical of those reported by JIMB authors. The results show that many of the most common procedures used by JIMB authors result in statistical conclusions that are prone to have large false positive (Type I) errors. These error-prone procedures included the multiple t-test, multiple Welch's t-test, and Fisher's LSD. These multiple comparisons procedures were compared with alternatives (Fisher-Hayter, Tukey's HSD, Bonferroni, and Dunnett's t-test) that were able to better control Type I errors. NON-TECHNICAL SUMMARY: The aim of this work was to review and recommend statistical procedures for Journal of Industrial Microbiology and Biotechnology authors who often compare the effect of several treatments on microorganisms and their functions.
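The inflation of Type I errors from unadjusted multiple t-tests can be checked with a small simulation along the lines of the study described. This sketch is mine, not the article's code; the group sizes and simulation counts are arbitrary. All groups share the same true mean, so any significant pairwise test is a false positive.

```python
import numpy as np
from scipy import stats

def fwer_sim(n_groups=6, n_per_group=10, alpha=0.05, n_sim=2000, seed=0):
    # estimate the family-wise error rate of unadjusted pairwise t-tests
    # versus the same tests with a Bonferroni correction, under a global null
    rng = np.random.default_rng(seed)
    pairs = [(i, j) for i in range(n_groups) for j in range(i + 1, n_groups)]
    m = len(pairs)
    any_unadj = any_bonf = 0
    for _ in range(n_sim):
        data = rng.normal(size=(n_groups, n_per_group))  # all true means equal
        pvals = [stats.ttest_ind(data[i], data[j]).pvalue for i, j in pairs]
        any_unadj += min(pvals) < alpha          # at least one false positive
        any_bonf += min(pvals) < alpha / m       # Bonferroni-adjusted criterion
    return any_unadj / n_sim, any_bonf / n_sim
```

With six groups (15 pairwise comparisons), the unadjusted family-wise error rate typically lands well above 0.3, while the Bonferroni-corrected rate stays near or below the nominal 0.05, consistent with the article's findings.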


Subject(s)
Industrial Microbiology , Periodicals as Topic
4.
Biom J ; 66(1): e2300077, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37857533

ABSTRACT

P-values that are derived from continuously distributed test statistics are typically uniformly distributed on (0,1) under least favorable parameter configurations (LFCs) in the null hypothesis. Conservativeness of a p-value P (meaning that P is under the null hypothesis stochastically larger than uniform on (0,1)) can occur if the test statistic from which P is derived is discrete, or if the true parameter value under the null is not an LFC. To deal with both of these sources of conservativeness, we present two approaches utilizing randomized p-values. We illustrate their effectiveness for testing a composite null hypothesis under a binomial model. We also give an example of how the proposed p-values can be used to test a composite null in group testing designs. We find that the proposed randomized p-values are less conservative compared to nonrandomized p-values under the null hypothesis, but that they are stochastically not smaller under the alternative. The problem of establishing the validity of randomized p-values has received attention in previous literature. We show that our proposed randomized p-values are valid under various discrete statistical models, which are such that the distribution of the corresponding test statistic belongs to an exponential family. The behavior of the power function for the tests based on the proposed randomized p-values as a function of the sample size is also investigated. Simulations and a real data example are used to compare the different considered p-values.
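For the binomial case, the core construction of a randomized p-value is compact enough to sketch. This is the textbook form, not necessarily the exact variant proposed in the article: for testing H0: p <= p0 against H1: p > p0, the randomized p-value P(X > x) + U * P(X = x) with U ~ Uniform(0, 1) is exactly uniform on (0, 1) at the least favorable configuration p = p0, removing the conservativeness caused by the discreteness of X.

```python
import numpy as np
from scipy import stats

def randomized_pvalue(x, n, p0, rng):
    # randomized p-value for X ~ Binomial(n, p), H0: p <= p0 vs H1: p > p0;
    # uniform on (0, 1) when p = p0, unlike the conservative
    # nonrandomized p-value P(X >= x)
    u = rng.uniform()
    tail = stats.binom.sf(x, n, p0)    # P(X > x)
    point = stats.binom.pmf(x, n, p0)  # P(X = x)
    return tail + u * point
```

Simulating X at p = p0 and histogramming the resulting p-values is a quick way to verify the uniformity claim empirically.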


Subject(s)
Models, Statistical , Sample Size
5.
J Appl Stat ; 50(15): 3142-3156, 2023.
Article in English | MEDLINE | ID: mdl-37969545

ABSTRACT

Although a number of tests of bivariate exchangeability, i.e., bivariate symmetry for bivariate distributions, are available, the literature lacks tests of whether a multivariate distribution with more than two dimensions is exchangeable. In this paper, multivariate permutation tests of exchangeability of multivariate distributions are proposed, based on the nonparametric combination methodology, i.e., on combining nonparametric bivariate exchangeability tests. Numerical experiments on real as well as simulated multivariate data with more than two dimensions are presented. The multivariate permutation test turns out to be typically more powerful than a bivariate exchangeability test performed only over a single pair of variables, and also more suitable than tests based on the Benjamini-Yekutieli or Bonferroni approaches.
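A minimal bivariate building block of such a procedure can be sketched as a permutation test. This is a simplified illustration under my own choice of test statistic, not the article's method: under exchangeability, (X, Y) and (Y, X) have the same law, so randomly swapping the two coordinates within each observation (equivalently, flipping the sign of X - Y) leaves the statistic's distribution unchanged.

```python
import numpy as np

def bivariate_exchangeability_test(x, y, n_perm=2000, seed=0):
    # permutation test of bivariate exchangeability; statistic |mean(X - Y)|
    # is a simple choice sensitive to marginal asymmetry (richer statistics
    # exist and would be combined nonparametrically across variable pairs)
    rng = np.random.default_rng(seed)
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    t_obs = abs(d.mean())
    count = 0
    for _ in range(n_perm):
        signs = rng.choice([-1, 1], size=d.size)   # coordinate swap = sign flip
        count += abs((signs * d).mean()) >= t_obs
    return (count + 1) / (n_perm + 1)              # permutation p-value
```

In the nonparametric combination approach described, one such p-value per variable pair would then be combined (e.g., via a Fisher combining function) using the same set of permutations.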

6.
Brain Behav ; 13(12): e3327, 2023 12.
Article in English | MEDLINE | ID: mdl-37961043

ABSTRACT

OBJECTIVE: Cortical gray matter (GM) atrophy plays a central role in multiple sclerosis (MS) pathology. However, it is not commonly assessed in clinical routine partly because a number of methodological problems hamper the development of a robust biomarker to quantify GM atrophy. In previous work, we have demonstrated the clinical utility of the "mosaic approach" (MAP) to assess individual GM atrophy in the motor neuron disease spectrum and frontotemporal dementia. In this study, we investigated the clinical utility of MAP in MS, comparing this novel biomarker to existing methods for computing GM atrophy in single patients. We contrasted the strategies based on correlations with established biomarkers reflecting MS disease burden. METHODS: We analyzed T1-weighted MPRAGE magnetic resonance imaging data from 465 relapsing-remitting MS patients and 89 healthy controls. We inspected how variations of existing strategies to estimate individual GM atrophy ("standard approaches") as well as variations of MAP (i.e., different parcellation schemes) impact downstream analysis results, both on a group and an individual level. We interpreted individual cortical disease burden as a single metric reflecting the fraction of significantly atrophic data points with respect to the control group. In addition, we evaluated the correlations to lesion volume (LV) and Expanded Disability Status Scale (EDSS). RESULTS: We found that the MAP method yielded the highest correlations with both LV and EDSS compared with all other strategies. Although the parcellation resolution played a minor role in terms of absolute correlations with clinical variables, higher resolutions provided more clearly defined statistical brain maps which may facilitate clinical interpretability. CONCLUSION: This study provides evidence that MAP yields high potential for a clinically relevant biomarker in MS, outperforming existing methods to compute cortical disease burden in single patients. Of note, MAP outputs brain maps illustrating individual cortical disease burden which can be directly interpreted in daily clinical routine.


Subject(s)
Multiple Sclerosis, Relapsing-Remitting , Multiple Sclerosis , Neurodegenerative Diseases , Humans , Multiple Sclerosis/diagnostic imaging , Multiple Sclerosis/pathology , Multiple Sclerosis, Relapsing-Remitting/pathology , Magnetic Resonance Imaging/methods , Gray Matter/diagnostic imaging , Gray Matter/pathology , Atrophy/pathology , Biomarkers , Brain/diagnostic imaging , Brain/pathology
7.
Eur Urol Focus ; 9(5): 701-704, 2023 09.
Article in English | MEDLINE | ID: mdl-37925328

ABSTRACT

Network meta-analysis (NMA) expands upon traditional meta-analysis by integrating three or more interventions. This allows comparing interventions using evidence from trials that have compared pairs of interventions directly, and indirect evidence through common comparators. We provide an overview of NMA concepts and considerations when interpreting results from a systematic review with a NMA and applying them to clinical practice. PATIENT SUMMARY: Network meta-analysis is a statistical tool that allows researchers to compare multiple treatments for a medical condition at once, even when treatments have not been compared to each other in research studies. This mini-review explains how to read a network meta-analysis and apply its results in patient care.
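The key arithmetic behind indirect evidence in a network meta-analysis is simple enough to show. This is the standard Bucher-style adjusted indirect comparison, offered here as an illustration of the concept rather than as the full NMA machinery: given effect estimates for A vs C and B vs C from independent trials, an indirect A vs B estimate is their difference, with the variances adding.

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    # indirect estimate of A vs B through common comparator C:
    #   d_AB = d_AC - d_BC,  Var(d_AB) = Var(d_AC) + Var(d_BC)
    # since the two direct comparisons come from independent trials
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac**2 + se_bc**2)
    ci = (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)  # approximate 95% CI
    return d_ab, se_ab, ci
```

Note that the indirect standard error is always larger than either direct one, which is why indirect evidence alone is weaker than head-to-head trials; a full NMA pools direct and indirect information across the whole network.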


Subject(s)
Network Meta-Analysis , Systematic Reviews as Topic , Meta-Analysis as Topic
8.
Netw Neurosci ; 7(2): 389-410, 2023.
Article in English | MEDLINE | ID: mdl-37397879

ABSTRACT

We describe how the recently introduced method of significant subgraph mining can be employed as a useful tool in neural network comparison. It is applicable whenever the goal is to compare two sets of unweighted graphs and to determine differences in the processes that generate them. We provide an extension of the method to dependent graph generating processes as they occur, for example, in within-subject experimental designs. Furthermore, we present an extensive investigation of the error-statistical properties of the method in simulation using Erdős-Rényi models and in empirical data in order to derive practical recommendations for the application of subgraph mining in neuroscience. In particular, we perform an empirical power analysis for transfer entropy networks inferred from resting-state MEG data comparing autism spectrum patients with neurotypical controls. Finally, we provide a Python implementation as part of the openly available IDTxl toolbox.

9.
Drug Chem Toxicol ; : 1-12, 2023 Jul 25.
Article in English | MEDLINE | ID: mdl-37491899

ABSTRACT

Ciprofloxacin (CFX) and ofloxacin (OFX) are commonly found as residual contaminants in aquatic environments, posing potential risks to various species. To ensure the safety of aquatic wildlife, it is essential to determine the toxicity of these antibiotics and establish appropriate concentration limits. Additionally, in (eco)toxicological studies, addressing the issue of multiple hypothesis testing through p-value adjustments is crucial for robust decision-making. In this study, we assessed the no observed adverse effect concentration (NOAEC) of CFX and OFX on Moina macrocopa across a concentration range of 0-400 µg L⁻¹. Furthermore, we investigated multiple p-value adjustments to determine the NOAECs. Our analysis yielded consistent results across seven different p-value adjustments, indicating NOAECs of 100 µg CFX L⁻¹ for age at first reproduction and 200 µg CFX L⁻¹ for fertility. For OFX treatment, a NOAEC of 400 µg L⁻¹ was observed for both biomarkers. However, further investigation is required to establish the NOAEC of OFX at higher concentrations with greater certainty. Our findings demonstrate that CFX exhibits higher toxicity compared to OFX, consistent with previous research. Moreover, this study highlights the differential performance of p-value adjustment methods in terms of maintaining statistical power while controlling the multiplicity problem, and their practical applicability. The study emphasizes the low NOAECs for these antibiotics in the zooplanktonic group, highlighting their significant risks to ecological and environmental safety. Additionally, our investigation of p-value adjustment approaches contributes to a deeper understanding of their performance characteristics, enabling (eco)toxicologists to select appropriate methods based on their specific needs and priorities.

10.
BMC Med Res Methodol ; 23(1): 153, 2023 06 29.
Article in English | MEDLINE | ID: mdl-37386403

ABSTRACT

BACKGROUND: The rule of thumb that there is little gain in statistical power from obtaining more than 4 controls per case is based on type I error α = 0.05. However, association studies that evaluate thousands or millions of associations use smaller α and may have access to plentiful controls. We investigate power gains, and reductions in p-values, when increasing well beyond 4 controls per case, for small α. METHODS: We calculate the power, the median expected p-value, and the minimum detectable odds ratio (OR), as a function of the number of controls/case, as α decreases. RESULTS: As α decreases, at each ratio of controls per case, the increase in power is larger than for α = 0.05. For α between 10⁻⁶ and 10⁻⁹ (typical for thousands or millions of associations), increasing from 4 controls per case to 10-50 controls per case increases power. For example, a study with power = 0.2 (α = 5 × 10⁻⁸) at 1 control/case has power = 0.65 with 4 controls/case, power = 0.78 with 10 controls/case, and power = 0.84 with 50 controls/case. For situations where obtaining more than 4 controls per case provides small increases in power beyond 0.9 (at small α), the expected p-value can decrease by orders of magnitude below α. Increasing from 1 to 4 controls/case reduces the minimum detectable OR toward the null by 20.9%, and increasing from 4 to 50 controls/case reduces it by an additional 9.7%, a result that holds regardless of α and hence also applies to "regular" α = 0.05 epidemiology. CONCLUSIONS: At small α, compared with 4 controls/case, recruiting 10 or more controls/case can increase power, reduce the expected p-value by 1-2 orders of magnitude, and meaningfully reduce the minimum detectable OR. These benefits of increasing the controls/case ratio increase as the number of cases increases, although the amount of benefit depends on exposure frequencies and the true OR. Provided that controls are comparable to cases, our findings suggest greater sharing of comparable controls in large-scale association studies.
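The power calculation underlying these comparisons can be sketched with a normal approximation. This is a generic illustration, not the authors' exact formulas: exposure probability in cases is derived from the control prevalence and the target OR, and the Woolf variance of the log OR shrinks as the control arm grows.

```python
import math
from scipy import stats

def power_case_control(n_cases, r, p0, odds_ratio, alpha):
    # approximate power to detect a given exposure odds ratio with n_cases
    # cases and r controls per case, via the normal approximation to the
    # log odds ratio (Woolf variance); p0 = exposure prevalence in controls
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # exposure prob. in cases
    n_controls = r * n_cases
    var = 1 / (n_cases * p1 * (1 - p1)) + 1 / (n_controls * p0 * (1 - p0))
    z_crit = stats.norm.isf(alpha / 2)                 # two-sided critical value
    return stats.norm.sf(z_crit - abs(math.log(odds_ratio)) / math.sqrt(var))
```

Evaluating this across r = 1, 4, 10, 50 at small α reproduces the qualitative pattern in the abstract: the control-arm variance term keeps shrinking, so power keeps rising beyond 4 controls per case.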


Subject(s)
Control Groups , Odds Ratio , Research Design , Humans
11.
Front Artif Intell ; 6: 1123285, 2023.
Article in English | MEDLINE | ID: mdl-37077235

ABSTRACT

COVID-19 is an unprecedented global pandemic with a serious negative impact on virtually every part of the world. Although much progress has been made in preventing and treating the disease, much remains to be learned about how best to treat the disease while considering patient and disease characteristics. This paper reports a case study of combinatorial treatment selection for COVID-19 based on real-world data from a large hospital in Southern China. In this observational study, 417 confirmed COVID-19 patients were treated with various combinations of drugs and followed for four weeks after discharge (or until death). Treatment failure is defined as death during hospitalization or recurrence of COVID-19 within four weeks of discharge. Using a virtual multiple matching method to adjust for confounding, we estimate and compare the failure rates of different combinatorial treatments, both in the whole study population and in subpopulations defined by baseline characteristics. Our analysis reveals that treatment effects are substantial and heterogeneous, and that the optimal combinatorial treatment may depend on baseline age, systolic blood pressure, and C-reactive protein level. Using these three variables to stratify the study population leads to a stratified treatment strategy that involves several different combinations of drugs (for patients in different strata). Our findings are exploratory and require further validation.

12.
Contemp Clin Trials ; 129: 107185, 2023 06.
Article in English | MEDLINE | ID: mdl-37059263

ABSTRACT

BACKGROUND: In confirmatory clinical trials, it is critical to have appropriate control of multiplicity for multiple comparisons or endpoints. When multiplicity-related issues arise from different sources (e.g., multiple endpoints, multiple treatment arms, multiple interim data-cuts and other factors), it can become complicated to control the family-wise type I error rate (FWER). Therefore, it is crucial for statisticians to fully understand the multiplicity adjustment methods and the objectives of the analysis regarding study power, sample size and feasibility in order to identify the proper multiplicity adjustment strategy. METHODS: In the context of multiplicity adjustment of multiple dose levels and multiple endpoints in a confirmatory trial, we proposed a modified truncated Hochberg procedure in combination with a fixed-sequence hierarchical testing procedure to strongly control the FWER. In this paper, we provided a brief review of the mathematical framework of the regular Hochberg procedure, the truncated Hochberg procedure and the proposed modified truncated Hochberg procedure. An ongoing phase 3 confirmatory trial for pediatric functional constipation was used as a real case application to illustrate how the proposed modified truncated Hochberg procedure will be implemented. A simulation study was conducted to demonstrate that the study was adequately powered and the FWER was strongly controlled. CONCLUSION: This work is expected to facilitate the understanding and selection of adjustment methods for statisticians.
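The regular Hochberg procedure that the proposal builds on is short enough to sketch. This illustrates the standard step-up test only; the truncated and modified-truncated variants discussed in the article replace the α/k critical values with a convex combination of Hochberg and Bonferroni constants, which is not reproduced here.

```python
import numpy as np

def hochberg_reject(pvals, alpha=0.05):
    # regular Hochberg step-up procedure: examine p-values from largest to
    # smallest; the k-th largest is compared against alpha / k, and the first
    # one that passes triggers rejection of it and every smaller p-value
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)[::-1]          # indices of p-values, largest first
    reject = np.zeros(p.size, dtype=bool)
    for i, idx in enumerate(order):
        k = i + 1                        # this is the k-th largest p-value
        if p[idx] <= alpha / k:
            reject[order[i:]] = True     # step-up: reject this and all smaller
            break
    return reject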


Subject(s)
Research Design , Humans , Child , Data Interpretation, Statistical , Computer Simulation , Sample Size
13.
Brain Behav ; 13(4): e2865, 2023 04.
Article in English | MEDLINE | ID: mdl-36869597

ABSTRACT

INTRODUCTION: The false discovery rate (FDR) procedure does not incorporate the geometry of the random field and requires high statistical power at each voxel, a requirement not satisfied by the limited number of participants in imaging studies. Topological FDR, threshold free cluster enhancement (TFCE), and probabilistic TFCE improve statistical power by incorporating local geometry. However, topological FDR requires specifying a cluster defining threshold and TFCE requires specifying transformation weights. METHODS: The geometry-derived statistical significance (GDSS) procedure overcomes these limitations by combining voxelwise p-values for the test statistic with the probabilities computed from the local geometry for the random field, thereby providing substantially greater statistical power than the procedures currently used to control for multiple comparisons. We use synthetic data and real-world data to compare its performance against the performance of these other, previously developed procedures. RESULTS: GDSS provided substantially greater statistical power relative to the comparator procedures, and its power was less sensitive to the number of participants. GDSS was more conservative than TFCE: that is, it rejected null hypotheses at voxels with much higher effect sizes than TFCE. Our experiments also showed that the Cohen's D effect size decreases as the number of participants increases. Therefore, sample size calculations from small studies may underestimate the participants required in larger studies. Our findings also suggest effect size maps should be presented along with p-value maps for correct interpretation of findings. CONCLUSIONS: GDSS compared with the other procedures provides considerably greater statistical power for detecting true positives while limiting false positives, especially in small (<40 participants) imaging cohorts.


Subject(s)
Brain Mapping , Brain , Magnetic Resonance Imaging , Humans , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Probability
14.
Biometrics ; 79(2): 1114-1118, 2023 06.
Article in English | MEDLINE | ID: mdl-35355244

ABSTRACT

Hung et al. (2007) considered the problem of controlling the type I error rate for a primary and secondary endpoint in a clinical trial using a gatekeeping approach in which the secondary endpoint is tested only if the primary endpoint crosses its monitoring boundary. They considered a two-look trial and showed by simulation that the naive method of testing the secondary endpoint at full level α at the time the primary endpoint reaches statistical significance does not control the familywise error rate at level α. Tamhane et al. (2010) derived analytic expressions for familywise error rate and power and confirmed the inflated error rate of the naive approach. Nonetheless, many people mistakenly believe that the closure principle can be used to prove that the naive procedure controls the familywise error rate. The purpose of this note is to explain in greater detail why there is a problem with the naive approach and show that the degree of alpha inflation can be as high as that of unadjusted monitoring of a single endpoint.


Subject(s)
Models, Statistical , Research Design , Humans , Endpoint Determination/methods , Computer Simulation , Sample Size
15.
Ther Innov Regul Sci ; 57(2): 304-315, 2023 03.
Article in English | MEDLINE | ID: mdl-36280651

ABSTRACT

When simultaneous comparisons are performed, a procedure must be employed to control the overall significance level (the type I error rate). Hochberg's stepwise testing procedure is often used, and here the determination of the sample size needed to achieve a specified power for two pairwise comparisons, when observations follow a normal distribution, is addressed. Three different scenarios are considered: subsets defined by a baseline criterion, two treatments compared to a control, or one set of subjects nested within the other. The solutions for these three scenarios differ and are examined. The sample sizes for differences in success probabilities under binomial distributions are presented using asymptotic normality. The sample sizes and power using Hochberg's procedure are compared with the corresponding results using the Bonferroni approach.


Subject(s)
Research Design , Humans , Sample Size
16.
Rev. méd. (La Paz) ; 29(2): 80-85, 2023.
Article in Spanish | LILACS | ID: biblio-1530250

ABSTRACT

Systematic reviews and meta-analyses have become fundamental tools for evidence-based clinical practice. Meta-analysis was initially proposed as a technique to improve the precision and statistical power of research from individual studies with small sample sizes. One of its main drawbacks, however, is that it usually compares no more than two alternative interventions at a time. Network meta-analyses use novel analytical techniques that incorporate information from both direct and indirect comparisons across a network of studies, examining the effects of multiple treatments in a more comprehensive way. Despite its potential limitations, its application in clinical epidemiology can be useful in situations where several treatments have been compared against a common comparator. In addition, these techniques may be relevant when a clinical or research question involves multiple treatments, or when both direct and indirect information are available in the body of evidence.

17.
J Appl Stat ; 49(12): 3141-3163, 2022.
Article in English | MEDLINE | ID: mdl-36035608

ABSTRACT

The homogeneity tests of odds ratios are used in clinical trials and epidemiological investigations as a preliminary step of meta-analysis. In recent studies, the severity or mortality of COVID-19 in relation to demographic characteristics, comorbidities, and other conditions has been popularly discussed by interpreting odds ratios and using meta-analysis. According to the homogeneity test results, a common odds ratio summarizes all of the odds ratios in a series of studies. If the aim is not to find a common odds ratio, but to find which of the sub-characteristics/groups is different from the others or is under risk, then the implementation of a multiple comparison procedure is required. In this article, the focus is placed on the accuracy and reliability of the homogeneity of odds ratio tests for multiple comparisons when the odds ratios are heterogeneous at the omnibus level. Three recently proposed multiple comparison tests and four homogeneity of odds ratios tests with six adjustment methods to control the type-I error rate are considered. The reliability and accuracy of the methods are discussed in relation to COVID-19 severity data associated with diabetes on a country-by-country basis, and a simulation study to assess the powers and type-I error rates of the tests is conducted.
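A classic member of the family of homogeneity tests discussed is Woolf's test, sketched below as a point of reference; the article compares more recent alternatives, and this specific test and the continuity correction used are my choices, not necessarily among the four tests it evaluates.

```python
import numpy as np
from scipy import stats

def woolf_homogeneity(tables):
    # Woolf's chi-square test for homogeneity of odds ratios across K 2x2
    # tables; tables is an iterable of (a, b, c, d) cell counts. 0.5 is added
    # to every cell (Haldane-Anscombe correction) to stabilize the log ORs.
    logors, weights = [], []
    for a, b, c, d in tables:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        logors.append(np.log(a * d / (b * c)))
        weights.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))  # inverse variance
    logors, weights = np.array(logors), np.array(weights)
    pooled = np.sum(weights * logors) / weights.sum()          # common log OR
    chi2 = np.sum(weights * (logors - pooled) ** 2)
    return chi2, stats.chi2.sf(chi2, df=len(logors) - 1)
```

When the omnibus test rejects homogeneity, the multiple comparison procedures discussed in the abstract are what identify which strata's odds ratios deviate.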

18.
Pract Lab Med ; 31: e00298, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35880118

ABSTRACT

Objectives: Butyrylcholinesterase (BChE) is an important biomarker in serum, and aberrant BChE activity indicates onset and progression of human diseases. The duration of serum storage at -80 °C may introduce variability into and compromise the reproducibility of BChE activity measurements. Design and Methods: We collected serum samples from eight healthy volunteers and determined serum BChE activity in these samples using a sensitive fluorescence assay at various time points during a six-month storage period at -80 °C. Changes in average BChE activity over storage time were assessed by repeated measures analysis of variance (ANOVA). The Sidak multiple comparisons test was also used to perform post-hoc analysis. Results: Almost all determined BChE activity values lay within the normal physiological range of BChE activity. However, repeated measures ANOVA using mean BChE activity vs. storage time showed that BChE activity values from two time points were significantly different. Analysis by the Sidak multiple comparisons test showed no substantial change of BChE activity during the first 90 days of storage, but BChE activity noticeably decreased after 90 days. Conclusions: Serum samples stored at -80 °C for up to 90 days can be used to accurately determine BChE activity.
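The Sidak correction used for the post-hoc analysis is a one-line formula, sketched here for reference (the general correction, not the specifics of the article's analysis): for m comparisons, each test is run at level 1 - (1 - α)^(1/m), which controls the FWER exactly at α under independence and is slightly less conservative than Bonferroni's α/m.

```python
def sidak_alpha(alpha, m):
    # per-comparison significance level under the Sidak correction
    return 1 - (1 - alpha) ** (1 / m)

def sidak_adjust(pvals):
    # Sidak-adjusted p-values: p_adj = 1 - (1 - p)^m, capped at 1 implicitly
    m = len(pvals)
    return [1 - (1 - p) ** m for p in pvals]
```

For example, with α = 0.05 and 10 pairwise time-point comparisons, each test runs at roughly 0.0051 rather than Bonferroni's 0.005.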

19.
Stat Med ; 41(9): 1688-1708, 2022 04 30.
Article in English | MEDLINE | ID: mdl-35124836

ABSTRACT

Sequential, multiple assignment, randomized trials (SMARTs) compare sequences of treatment decision rules called dynamic treatment regimes (DTRs). In particular, the Adaptive Treatment for Alcohol and Cocaine Dependence (ENGAGE) SMART aimed to determine the best DTRs for patients with a substance use disorder. While many authors have focused on a single pairwise comparison, addressing the main goal involves comparisons of >2 DTRs. For complex comparisons, there is a paucity of methods for binary outcomes. We fill this gap by extending the multiple comparisons with the best (MCB) methodology to the Bayesian binary outcome setting. The set of best is constructed based on simultaneous credible intervals. A substantial challenge for power analysis is the correlation between outcome estimators for distinct DTRs embedded in SMARTs due to overlapping subjects. We address this using Robins' G-computation formula to take a weighted average of parameter draws obtained via simulation from the parameter posteriors. We use non-informative priors and work with the exact distribution of parameters avoiding unnecessary normality assumptions and specification of the correlation matrix of DTR outcome summary statistics. We conduct simulation studies for both the construction of a set of optimal DTRs using the Bayesian MCB procedure and the sample size calculation for two common SMART designs. We illustrate our method on the ENGAGE SMART. The R package SMARTbayesR for power calculations is freely available on the Comprehensive R Archive Network (CRAN) repository. An RShiny app is available at https://wilart.shinyapps.io/shinysmartbayesr/.


Subject(s)
Research Design , Bayes Theorem , Computer Simulation , Humans , Sample Size
20.
Biometrics ; 78(1): 238-247, 2022 03.
Article in English | MEDLINE | ID: mdl-33354761

ABSTRACT

When a ranking of institutions such as medical centers or universities is based on a numerical measure of performance provided with a standard error, confidence intervals (CIs) should be calculated to assess the uncertainty of these ranks. We present a novel method based on Tukey's honest significant difference test to construct simultaneous CIs for the true ranks. When all the true performances are equal, the probability of coverage of our method attains the nominal level. In case the true performance measures have no exact ties, our method is conservative. For this situation, we propose a rescaling method to the nominal level that results in shorter CIs while keeping control of the simultaneous coverage. We also show that a similar rescaling can be applied to correct a recently proposed Monte-Carlo based method, which is anticonservative. After rescaling, the two methods perform very similarly. However, the rescaling of the Monte-Carlo based method is computationally much more demanding and becomes infeasible when the number of institutions is larger than 30-50. We discuss another recently proposed method similar to ours based on simultaneous CIs for the true performance. We show that our method provides uniformly shorter CIs for the same confidence level. We illustrate the superiority of our new methods with a data analysis for travel time to work in the United States and on rankings of 64 hospitals in the Netherlands.
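The construction of simultaneous rank CIs can be sketched in simplified form. This is an illustration of the general idea only: institution i's rank can be no better than 1 plus the number of institutions significantly outperforming it, and no worse than n minus the number it significantly outperforms. A Bonferroni-corrected z critical value stands in here for the studentized-range (Tukey HSD) constant the paper actually uses, so these intervals are wider than the paper's.

```python
import numpy as np
from scipy import stats

def rank_confidence_intervals(est, se, alpha=0.05):
    # simultaneous CIs for the true ranks of n institutions (rank 1 = best,
    # i.e., highest performance), from pairwise comparisons of estimates:
    #   lower rank = 1 + #{j : est_j significantly above est_i}
    #   upper rank = n - #{j : est_j significantly below est_i}
    est, se = np.asarray(est, dtype=float), np.asarray(se, dtype=float)
    n = est.size
    crit = stats.norm.isf(alpha / (n * (n - 1)))  # Bonferroni over ordered pairs
    intervals = []
    for i in range(n):
        z = (est - est[i]) / np.sqrt(se**2 + se[i] ** 2)
        lower = 1 + int(np.sum(z > crit))      # certainly-better institutions
        upper = n - int(np.sum(z < -crit))     # certainly-worse institutions
        intervals.append((lower, upper))
    return intervals
```

With well-separated estimates the intervals collapse to single ranks; with large standard errors every interval spans (1, n), reflecting that the data cannot distinguish the institutions.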


Subject(s)
Hospitals , Research Design , Confidence Intervals , Monte Carlo Method , Probability , United States