Results 1 - 20 of 255
1.
Biometrics ; 80(3)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38994641

ABSTRACT

This article addresses the challenge of estimating receiver operating characteristic (ROC) curves and the areas under these curves (AUC) in the context of an imperfect gold standard, a common issue in diagnostic accuracy studies. We delve into the nonparametric identification and estimation of ROC curves and AUCs when the reference standard for disease status is prone to error. Our approach hinges on the known or estimable accuracy of this imperfect reference standard and the conditional independence assumption, under which we demonstrate the identifiability of ROC curves and propose a nonparametric estimation method. In cases where the accuracy of the imperfect reference standard remains unknown, we establish that while ROC curves are unidentifiable, the sign of the difference between two AUCs is identifiable. This insight leads us to develop a hypothesis-testing method for assessing the relative superiority of AUCs. Compared to the existing methods, the proposed methods are nonparametric, so they do not rely on parametric model assumptions. In addition, they are applicable to both the ROC/AUC analysis of continuous biomarkers and the AUC analysis of ordinal biomarkers. Our theoretical results and simulation studies validate the proposed methods, which we further illustrate through application in two real-world diagnostic studies.


Subject(s)
Area Under Curve , Computer Simulation , ROC Curve , Humans , Reference Standards , Statistics, Nonparametric , Biomarkers/analysis , Models, Statistical
2.
World J Methodol ; 14(2): 93026, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38983662

ABSTRACT

The simulated patient methodology (SPM) is considered the "gold standard" as covert participatory observation. SPM is attracting increasing interest for the investigation of community pharmacy practice; however, there is criticism that SPM can only show a small picture of everyday pharmacy practice and therefore has limited external validity. On the one hand, a certain design and application of the SPM goes hand in hand with an increase in external validity. Even if, on the other hand, this occurs at the expense of internal validity due to the trade-off situation, the justified criticism of the SPM for investigating community pharmacy practice can be countered.

3.
Int J Neurosci ; : 1-7, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38963350

ABSTRACT

OBJECTIVE: To analyze the diagnostic value of HR-VWI in intracranial arterial stenosis and occlusion and compare it with DSA. METHODS: A retrospective analysis of clinical data of 59 patients with intracranial arterial stenosis in our hospital was conducted to compare the diagnostic results of the two methods for different degrees of intracranial stenosis and various morphological plaques. RESULTS: The diagnosis of stenosis and occlusion by both methods showed no significant difference (p > 0.05). Comparison of plaque morphology detected by HR-VWI with pathological examination results showed no significant difference (p > 0.05); however, there was a significant difference between plaque morphology detected by DSA and pathological examination results (p < 0.05). Additionally, there was a significant difference between plaque morphology detected by HR-VWI and DSA (p < 0.05). CONCLUSION: HR-VWI technique is comparable to DSA technique in diagnosing intracranial arterial stenosis and occlusion, but it is superior to DSA in plaque morphology diagnosis.

4.
J Obstet Gynaecol India ; 74(3): 191-195, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38974747

ABSTRACT

Postpartum hemorrhage (PPH) remains a significant contributor to maternal morbidity and mortality worldwide. In India, PPH affects approximately 12% of women. The prevention and management of PPH are significant challenges in obstetrics, with accurate assessment of blood loss and timely intervention being critical. Active Management of the Third Stage of Labor is a gold standard strategy for prevention. Recent advancements in PPH management include the use of recombinant activated factor VIIa, which has shown promise in decreasing the need for invasive procedures and second-line therapies. Additionally, surgical and radiological interventions have been effective in cases of refractory PPH. Overall, ongoing research and advancements in PPH management continue to enhance the quality of care and outcomes for mothers experiencing this potentially life-threatening complication of childbirth. This editorial explores prevention and management of atonic PPH, encompassing medical and surgical strategies, to enhance understanding and optimize clinical care for mothers at risk of this obstetric emergency.

5.
Z Psychosom Med Psychother ; 70(2): 106-111, 2024 Jun.
Article in German | MEDLINE | ID: mdl-39012191

ABSTRACT

Recently Papola et al. (2023) published a network meta-analysis (NMA) on psychotherapy of generalized anxiety disorder (GAD) and concluded that cognitive-behavioral therapy (CBT) should be considered the first-line treatment for GAD. However, there are several concerns with regard to the procedures and the conclusions of this NMA and of NMA in general. We show that these concerns question the conclusions by Papola et al. Furthermore, we place concerns about this NMA in a broader context and question whether existing evidence is consistent with the notion that one form of psychotherapy can be regarded as the gold standard for mental disorders and for all patients and therapists.


Subject(s)
Anxiety Disorders , Cognitive Behavioral Therapy , Humans , Anxiety Disorders/therapy , Anxiety Disorders/psychology , Anxiety Disorders/diagnosis , Psychotherapy , Meta-Analysis as Topic
6.
medRxiv ; 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38699296

ABSTRACT

Accurate assessments of symptoms and diagnoses are essential for health research and clinical practice but face many challenges. The absence of a single error-free measure is currently addressed by assessment methods involving experts reviewing several sources of information to achieve a more accurate or best-estimate assessment. Three bodies of work spanning medicine, psychiatry, and psychology propose similar assessment methods: The Expert Panel, the Best-Estimate Diagnosis, and the Longitudinal Expert All Data (LEAD). However, the quality of such best-estimate assessments is typically very difficult to evaluate due to poor reporting of the assessment methods, and when the methods are reported, the reporting quality varies substantially. Here we tackle this gap by developing reporting guidelines for such studies, using a four-stage approach: 1) drafting reporting standards accompanied by rationales and empirical evidence, which were further developed with a patient organization for depression, 2) incorporating expert feedback through a two-round Delphi procedure, 3) refining the guideline based on an expert consensus meeting, and 4) testing the guideline by i) having two researchers test it and ii) using it to examine the extent to which previously published articles report the standards. The last step also demonstrates the need for the guideline: 18 to 58% (Mean = 33%) of the standards were not reported across fifteen randomly selected studies. The LEADING guideline comprises 20 reporting standards related to four groups: The Longitudinal design; the Appropriate data; the Evaluation - experts, materials, and procedures; and the Validity group. We hope that the LEADING guideline will be useful in assisting researchers in planning, reporting, and evaluating research aiming to achieve best-estimate assessments.

7.
Article in English | MEDLINE | ID: mdl-38531639

ABSTRACT

BACKGROUND: No data exist at the population level on what tests are used to aid in the diagnosis of autism spectrum disorder in community practice. OBJECTIVES: To describe autism spectrum disorder testing practices to inform autism spectrum disorder identification efforts. METHODS: Data are from the Autism and Developmental Disabilities Monitoring Network, a multi-site surveillance system reporting prevalence estimates and characteristics of 8-year-old children with autism spectrum disorder. Percentages of children with autism spectrum disorder who received any autism spectrum disorder test or a 'gold standard' test were calculated by site, sex, race, median household income, and intellectual ability status. Risk ratios were calculated to compare group differences. RESULTS: Of 5058 8-year-old children with autism spectrum disorder across 11 sites, 3236 (64.0%) had a record of any autism spectrum disorder test and 2136 (42.2%) had a 'gold standard' ADOS or ADI-R test. Overall, 115 children (2.3%) had both the ADOS and ADI-R in their records. Differences persisted across race, median household income, and intellectual ability status. Asian/Pacific Islander children had the highest percent receiving any ASD test (71.8%; other groups range: 57.4-66.0%) and White children had the highest percent receiving 'gold standard' tests (46.4%; other groups range: 35.6-43.2%). Children in low-income neighbourhoods had a lower percent of any test (62.5%) and 'gold standard' tests (39.4%) compared to medium (70.2% and 47.5%, respectively) and high (69.6% and 46.8%, respectively) income neighbourhoods. Children with intellectual disability had a lower percent of any ASD test (81.7%) and 'gold standard' tests (52.6%) compared to children without intellectual disability (84.0% and 57.6%, respectively). 
CONCLUSIONS: Autism spectrum disorder testing practices vary widely by site and differ by race and presence of co-occurring intellectual disability, suggesting opportunities to standardise and/or improve autism spectrum disorder identification practices.

8.
Am J Bot ; 111(3): e16300, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38469876

ABSTRACT

PREMISE: Many plastomes of autotrophic Piperales have been reported to date, describing a variety of differences. Most studies focused only on a few species or a single genus, and extensive, comparative analyses have not been done. Here, we reviewed publicly available plastome reconstructions for autotrophic Piperales, reanalyzed publicly available raw data, and provided new sequence data for all previously missing genera. Comparative plastome genomics of >100 autotrophic Piperales were performed. METHODS: We performed de novo assemblies to reconstruct the plastomes of newly generated sequence data. We used Sanger sequencing and read mapping to verify the assemblies and to bridge assembly gaps. Furthermore, we reconstructed the phylogenetic relationships as a foundation for comparative plastome genomics. RESULTS: We identified a plethora of assembly and annotation issues in published plastome data, which, if unattended, will lead to an artificial increase of diversity. We were able to detect patterns of missing and incorrect feature annotation and determined that the inverted repeat (IR) boundaries were the major source for erroneous assembly. Accounting for the aforementioned issues, we discovered relatively stable junctions of the IRs and the small single-copy region (SSC), whereas the majority of plastome variations among Piperales stems from fluctuations of the boundaries of the IR and the large single-copy (LSC) region. CONCLUSIONS: This study of all available plastomes of autotrophic Piperales, expanded by new data for previously missing genera, highlights the IR-LSC junctions as a potential marker for discrimination of various taxonomic levels. Our data indicates a pseudogene-like status for cemA and ycf15 in various Piperales. Based on a review of published data, we conclude that incorrect IR-SSC boundary identification is the major source for erroneous plastome assembly. 
We propose a gold standard for assembly and annotation of high-quality plastomes based on de novo assembly methods and appropriate references for gene annotation.


Subject(s)
Magnoliopsida , Phylogeny , Magnoliopsida/genetics , Genomics
9.
J Imaging Inform Med ; 37(2): 489-503, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38316666

ABSTRACT

Peer review plays a crucial role in accreditation and credentialing processes as it can identify outliers and foster a peer learning approach, facilitating error analysis and knowledge sharing. However, traditional peer review methods may fall short in effectively addressing the interpretive variability among reviewing and primary reading radiologists, hindering scalability and effectiveness. Reducing this variability is key to enhancing the reliability of results and instilling confidence in the review process. In this paper, we propose a novel statistical approach called "Bayesian Inter-Reviewer Agreement Rate" (BIRAR) that integrates radiologist variability. By doing so, BIRAR aims to enhance the accuracy and consistency of peer review assessments, providing physicians involved in quality improvement and peer learning programs with valuable and reliable insights. A computer simulation was designed to assign predefined interpretive error rates to hypothetical interpreting and peer-reviewing radiologists. The Monte Carlo simulation then sampled (100 samples per experiment) the data that would be generated by peer reviews. The performances of BIRAR and four other peer review methods for measuring interpretive error rates were then evaluated, including a method that uses a gold standard diagnosis. Application of the BIRAR method resulted in 93% and 79% higher relative accuracy and 43% and 66% lower relative variability, compared to "Single/Standard" and "Majority Panel" peer review methods, respectively. Accuracy was defined by the median difference of Monte Carlo simulations between measured and pre-defined "actual" interpretive error rates. Variability was defined by the 95% CI around the median difference of Monte Carlo simulations between measured and pre-defined "actual" interpretive error rates. 
BIRAR is a practical and scalable peer review method that produces more accurate and less variable assessments of interpretive quality by accounting for variability within the group's radiologists, implicitly applying a standard derived from the level of consensus within the group across various types of interpretive findings.

10.
J Chemother ; : 1-9, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38409748

ABSTRACT

Meticulous antimicrobial management is essential among critically ill patients with acute kidney injury, particularly if renal replacement therapy is needed. Many factors affect drug removal in patients undergoing continuous renal replacement therapy (CRRT). In this study, we aimed to compare current databases that are frequently used to adjust CRRT dosages of antimicrobial drugs with the gold standard. The dosage recommendations from various databases for antimicrobial drugs eliminated by CRRT were investigated. The book 'Renal Pharmacotherapy: Dosage Adjustment of Medications Eliminated by the Kidneys' was chosen as the gold standard. There were variations in the databases. Micromedex, UpToDate, and Sanford had similar rates to the gold standard of 45%, 35%, and 30%, respectively. The Micromedex database shows the most similar results to the gold standard source. In addition, a consensus was reached as a result of the expert panel meetings established to discuss the different antimicrobial dose recommendations of the databases.

11.
BMC Infect Dis ; 24(1): 163, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38321395

ABSTRACT

BACKGROUND: Diagnosis of tuberculous meningitis (TBM) is hampered by the lack of a gold standard. Current microbiological tests lack sensitivity and clinical diagnostic approaches are subjective. We therefore built a diagnostic model that can be used before microbiological test results are known. METHODS: We included 659 individuals aged [Formula: see text] years with suspected brain infections from a prospective observational study conducted in Vietnam. We fitted a logistic regression diagnostic model for TBM status, with unknown values estimated via a latent class model on three mycobacterial tests: Ziehl-Neelsen smear, Mycobacterial culture, and GeneXpert. We additionally re-evaluated mycobacterial test performance, estimated individual mycobacillary burden, and quantified the reduction in TBM risk after confirmatory tests were negative. We also fitted a simplified model and developed a scoring table for early screening. All models were compared and validated internally. RESULTS: Participants with HIV, miliary TB, long symptom duration, and high cerebrospinal fluid (CSF) lymphocyte count were more likely to have TBM. HIV and higher CSF protein were associated with higher mycobacillary burden. In the simplified model, HIV infection, clinical symptoms with long duration, and clinical or radiological evidence of extra-neural TB were associated with TBM. At the cutpoints based on Youden's Index, the sensitivity and specificity in diagnosing TBM for our full and simplified models were 86.0% and 79.0%, and 88.0% and 75.0% respectively. CONCLUSION: Our diagnostic model shows reliable performance and can be developed as a decision assistant for clinicians to detect patients at high risk of TBM. Diagnosis of tuberculous meningitis is hampered by the lack of a gold standard. We developed a diagnostic model using latent class analysis, combining confirmatory test results and risk factors.
Models were accurate, well-calibrated, and can support both clinical practice and research.
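
As a rough illustration of how a cutpoint based on Youden's Index (J = sensitivity + specificity - 1) can be chosen, the sketch below scans candidate thresholds and keeps the one with the largest J. It is a simplification, not the authors' method: it assumes the true status of each participant is known, whereas the study treats TBM status as latent. The function name `youden_cutpoint` and the scores and labels are hypothetical values made up for the example.

```python
def youden_cutpoint(scores, labels):
    """Return (cutpoint, J, sensitivity, specificity) maximizing Youden's J.

    A case is called positive when its score is >= the cutpoint.
    Assumes labels are 0/1 and both classes are present.
    """
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1.0
        # Keep the first threshold that strictly improves J.
        if best is None or j > best[1]:
            best = (cut, j, sens, spec)
    return best


# Hypothetical predicted risks and true statuses (illustration only).
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
cut, j, sens, spec = youden_cutpoint(scores, labels)
print(cut, j, sens, spec)
```

With these made-up data the best threshold trades a little sensitivity for perfect specificity; on real data the balance depends on the score distribution in each group.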


Subject(s)
HIV Infections , Mycobacterium tuberculosis , Tuberculosis, Meningeal , Humans , Aged , Tuberculosis, Meningeal/diagnosis , Latent Class Analysis , Bayes Theorem , Sensitivity and Specificity , Seizures
12.
Paediatr Anaesth ; 34(4): 318-323, 2024 04.
Article in English | MEDLINE | ID: mdl-38055618

ABSTRACT

BACKGROUND/AIMS: Traditional manual methods of extracting anesthetic and physiological data from the electronic health record rely upon visual transcription by a human analyst that can be labor-intensive and prone to error. Technical complexity, relative inexperience in computer coding, and decreased access to data warehouses can deter investigators from obtaining valuable electronic health record data for research studies, especially in under-resourced settings. We therefore aimed to develop, pilot, and demonstrate the effectiveness and utility of a pragmatic data extraction methodology. METHODS: Expired sevoflurane concentration data from the electronic health record transcribed by eye was compared to an intermediate preprocessing method in which the entire anesthetic flowsheet narrative report was selected, copy-pasted, and processed using only Microsoft Word and Excel software to generate a comma-delimited (.csv) file. A step-by-step presentation of this method is presented. Concordance rates, Pearson correlation coefficients, and scatterplots with lines of best fit were used to compare the two methods of data extraction. RESULTS: A total of 1132 datapoints across eight subjects were analyzed, accounting for 18.9 h of anesthesia time. There was a high concordance rate of data extracted using the two methods (median concordance rate 100%, range [96%, 100%]). The median time required to complete manual data extraction was significantly longer compared to the time required using the intermediate method (240 IQR [199, 482.5] seconds vs 92.5 IQR [69, 99] seconds, p = .01) and was linearly associated with the number of datapoints (r_manual = .97, p < .0001), whereas time required to complete data extraction using the intermediate approach was independent of the number of datapoints (r_intermediate = -.02, p = .99).
CONCLUSIONS: We describe a pragmatic data extraction methodology that does not require additional software or coding skills intended to enhance the ease, speed, and accuracy of data collection that could assist in clinician investigator-initiated research and quality/process improvement projects.


Subject(s)
Anesthetics , Electronic Health Records , Humans , Anesthetics/pharmacology
13.
Am J Epidemiol ; 193(3): 548-560, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-37939113

ABSTRACT

In a recent systematic review, Bastos et al. (Ann Intern Med. 2021;174(4):501-510) compared the sensitivities of saliva sampling and nasopharyngeal swabs in the detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection by assuming a composite reference standard defined as positive if either test is positive and negative if both tests are negative (double negative). Even under a perfect specificity assumption, this approach ignores the double-negative results and risks overestimating the sensitivities due to residual misclassification. In this article, we first illustrate the impact of double-negative results in the estimation of the sensitivities in a single study, and then propose a 2-step latent class meta-analysis method for reevaluating both sensitivities using the same published data set as that used in Bastos et al. by properly including the observed double-negative results. We also conduct extensive simulation studies to compare the performance of the proposed method with Bastos et al.'s method for varied levels of prevalence and between-study heterogeneity. The results demonstrate that the sensitivities are overestimated noticeably using Bastos et al.'s method, and the proposed method provides a more accurate evaluation with nearly no bias and close-to-nominal coverage probability. In conclusion, double-negative results can significantly impact the estimated sensitivities when a gold standard is absent, and thus they should be properly incorporated.
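
The overestimation described above can be seen in a small worked calculation. The sketch below assumes perfect specificity for both tests and, as an extra simplifying assumption not stated in the abstract, conditional independence of the two tests among infected people. The function name `crs_sensitivity` and the input sensitivities (0.80 and 0.90) are hypothetical.

```python
def crs_sensitivity(sens_a: float, sens_b: float) -> tuple[float, float]:
    """Apparent sensitivities of tests A and B against a composite reference
    standard (positive if either test is positive).

    Assumes perfect specificity and conditional independence of the two
    tests given true infection (illustrative assumptions).
    """
    # Probability that an infected person is positive on at least one test.
    # Infected people who are double negative are missed by the composite
    # reference and silently drop out of the denominator.
    detected = 1.0 - (1.0 - sens_a) * (1.0 - sens_b)
    return sens_a / detected, sens_b / detected


apparent_a, apparent_b = crs_sensitivity(0.80, 0.90)
print(round(apparent_a, 3), round(apparent_b, 3))  # prints 0.816 0.918
```

Both apparent sensitivities exceed the true values because the denominator excludes infected people who test negative on both samples. The gap widens as either test becomes less sensitive.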


Subject(s)
COVID-19 , Humans , COVID-19/diagnosis , SARS-CoV-2 , Negative Results , Saliva , Nasopharynx
14.
Pathogens ; 12(11)2023 Oct 26.
Article in English | MEDLINE | ID: mdl-38003748

ABSTRACT

The American Association of Equine Practitioners strongly advocates evidence-based intestinal strongyle control in horses. It recommends targeted treatment of all heavy egg shedders (>500 eggs per gram (EPG) of feces), while the low shedders (0-200 EPG) are left untreated. As 50-75% of adult horses in a herd are low shedders, preventing them from unnecessary anthelmintic exposure is critical for tackling resistance. There are various fecal egg count (FEC) techniques with many modifications and variations in use, but none is identified as a gold standard. The hypothesis of the study was that the diagnostic performance of 12 commonly used quantitation methodologies (three techniques with four variants) differs. In this regard, method comparison studies were performed using polystyrene beads as proxy for intestinal strongyle eggs. Mini-FLOTAC-based variants had the lowest coefficient of variation (CV%) in bead recovery, whereas McMaster variants had the highest. All four variants of Mini-FLOTAC and the NaNO3 1.33 specific gravity variant of modified Wisconsin followed a linear fit with R² > 0.95. In contrast, the bead standard replicates for modified McMaster variants dispersed from the regression curve, causing a lower R². The Mini-FLOTAC method seems less influenced by the choice of flotation solution and has better repeatability parameters and linearity for bead standard recovery. For FEC tests with high R² (>0.95) but that underestimated the true bead count, a correction factor (CF) was determined to estimate the true count. Finally, the validity of CF was analyzed for 5 tests with R² > 0.95 to accurately quantify intestinal strongyle eggs from 40 different horses. Overall, this study identified FEC methodologies with the highest diagnostic performance. The limitations in standardizing routine FEC tests are highlighted, and the importance of equalization of FEC results is emphasized for promoting uniformity in the implementation of parasite control guidelines.

15.
Kidney Int Rep ; 8(11): 2345-2355, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38025210

ABSTRACT

Introduction: In clinical practice, kidney (dys)function is monitored through creatinine-based estimations of glomerular filtration rate (eGFR: Modification of Diet in Renal Disease [MDRD], Chronic Kidney Disease Epidemiology Collaboration [CKD-EPI]). Creatinine is recognized as a late and insensitive biomarker of glomerular filtration rate (GFR). The novel biomarker proenkephalin (PENK) may overcome these limitations, but no PENK-based equation for eGFR is currently available. Therefore, we developed and validated a PENK-based equation to assess GFR. Methods: In this international multicenter study in 1354 stable and critically ill patients, GFR was measured (mGFR) through iohexol or iothalamate clearance. A generalized linear model with sigmoidal nonlinear transfer function was used for equation development in the block-randomized development set. Covariates were selected in a data-driven fashion. The novel equation was assessed for bias, precision (mean ± SD), and accuracy (eGFR percentage within ±30% of mGFR, P30) in the validation set and compared with MDRD and CKD-EPI. Results: Median mGFR was 61 [44-81] ml/min per 1.73 m². In order of importance, PENK, creatinine, and age were included, and sex or race did not improve performance. The PENK-based equation mean ± SD bias of the mGFR was 0.5 ± 15 ml/min per 1.73 m², significantly less compared with MDRD (8 ± 17, P < 0.001) and 2009 CKD-EPI (5 ± 17, P < 0.001), not reaching statistical significance compared with 2021 CKD-EPI (1.3 ± 16, P = 0.06). The P30 accuracy of the PENK-based equation was 83%, significantly higher compared with MDRD (68%, P < 0.001) and 2009 CKD-EPI (76%, P < 0.001), similar to 2021 CKD-EPI (80%, P = 0.13). Conclusion: Overall, the PENK-based equation to assess eGFR performed better than most creatinine-based equations without using sex or race.
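
The three performance measures named in this abstract (bias, precision, and P30 accuracy) can be computed as in the minimal sketch below. It assumes bias is the mean of eGFR minus mGFR and precision is the SD of those differences; the abstract does not spell out the sign convention, so that is an assumption. The function name `egfr_performance` and the four value pairs are hypothetical and are not study data.

```python
from statistics import mean, stdev


def egfr_performance(egfr, mgfr):
    """Bias, precision (SD of differences), and P30 for paired GFR values.

    Assumed convention: difference = estimated GFR - measured GFR.
    P30 is the percentage of estimates within +/-30% of the measured value.
    """
    diffs = [e - m for e, m in zip(egfr, mgfr)]
    within_30 = [abs(e - m) <= 0.30 * m for e, m in zip(egfr, mgfr)]
    return {
        "bias": mean(diffs),
        "precision_sd": stdev(diffs),
        "p30": 100.0 * sum(within_30) / len(within_30),
    }


# Hypothetical paired values in ml/min per 1.73 m^2 (illustration only).
egfr = [55, 70, 90, 40]
mgfr = [60, 65, 60, 42]
result = egfr_performance(egfr, mgfr)
print(result)
```

In this toy example one of the four estimates misses the measured value by more than 30%, so P30 is 75%.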

17.
Diagnostics (Basel) ; 13(18)2023 Sep 09.
Article in English | MEDLINE | ID: mdl-37761259

ABSTRACT

BACKGROUND: Currently, assessing the diagnostic performance of new laboratory tests assumes a perfect reference standard, which is rarely the case. Wrong classifications of the true disease status will inevitably lead to biased estimates of sensitivity and specificity. OBJECTIVES: Using Bayesian latent class models (BLCMs), an approach that does not assume a perfect reference standard, we re-analyzed data of a large prospective observational study assessing the diagnostic accuracy of an antigen test for the diagnosis of SARS-CoV-2 infection in clinical practice. METHODS: A cohort of consecutive patients presenting to a COVID-19 testing facility affiliated with a Swiss University Hospital was recruited (n = 1465). Two real-time PCR tests were conducted in parallel with the Roche/SD Biosensor rapid antigen test on nasopharyngeal swabs. A two-test (PCR and antigen test), three-population BLCM was fitted to the frequencies of paired test results. RESULTS: Based on the BLCM, the sensitivities of the RT-PCR and the Roche/SD Biosensor rapid antigen test were 98.5% [95% CRI 94.8;100] and 82.7% [95% CRI 66.8;100]. The specificities were 97.7% [95% CRI 96.1;99.7] and 99.9% [95% CRI 99.6;100]. CONCLUSIONS: Applying the BLCM, the diagnostic accuracy of RT-PCR was high but not perfect. In contrast to previous results, the sensitivity of the antigen test was higher. Our results suggest that BLCMs are valuable tools for investigating the diagnostic performance of laboratory tests in the absence of a perfect reference standard.

18.
Health Psychol Behav Med ; 11(1): 2244576, 2023.
Article in English | MEDLINE | ID: mdl-37663014

ABSTRACT

Background: Inaccuracy in current diagnostic procedures for mental disorders can lead to misdiagnosis and increase the burden on the healthcare system. Therefore, Klenico, a diagnostic software designed to support comprehensive and efficient clinical diagnostic procedures that is easy to apply in everyday clinical practice, was developed. This study aimed to take the first step toward validating the Klenico self-report module. Methods: Data of 115 patients from a German psychotherapeutic outpatient clinic were included in this study. Criterion validity was tested by comparing Klenico with the diagnoses based on the structured clinical interview for DSM-IV (SCID). Construct validity was investigated by comparing Klenico with commonly used self-reporting questionnaires. Results: The results showed that most of the Klenico disorder domains were able to differentiate between corresponding diagnoses and other diagnoses, confirming criterion validity. Construct validity was demonstrated by high correlations with the compared convergent questionnaire scales and non-significant or low correlations with most of the divergent scales. Conclusions: These preliminary results demonstrate the psychometric properties of the Klenico self-report module and imply that the Klenico system has high potential to improve the accuracy of diagnostic procedures in everyday clinical practice.

19.
J Orofac Orthop ; 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37642657

ABSTRACT

PURPOSE: The aim of this investigation was to evaluate the accuracy of various skeletal and dental cephalometric parameters as produced by different commercial providers that make use of artificial intelligence (AI)-assisted automated cephalometric analysis and to compare their quality to a gold standard established by orthodontic experts. METHODS: Twelve experienced orthodontic examiners pinpointed 15 radiographic landmarks on a total of 50 cephalometric X-rays. The landmarks were used to generate 9 parameters for orthodontic treatment planning. The "humans' gold standard" was defined by calculating the median value of all 12 human assessments for each parameter, which in turn served as reference values for comparisons with results given by four different commercial providers of automated cephalometric analyses (DentaliQ.ortho [CellmatiQ GmbH, Hamburg, Germany], WebCeph [AssembleCircle Corp, Seongnam-si, Korea], AudaxCeph [Audax d.o.o., Ljubljana, Slovenia], CephX [Orca Dental AI, Herzliya, Israel]). Repeated measures analyses of variance (ANOVAs) were calculated and Bland-Altman plots were generated for comparisons. RESULTS: The results of the repeated measures ANOVAs indicated significant differences between the commercial providers' predictions and the humans' gold standard for all nine investigated parameters. However, the pairwise comparisons also demonstrate that there were major differences among the four commercial providers. While there were no significant mean differences between the values of DentaliQ.ortho and the humans' gold standard, the predictions of AudaxCeph showed significant deviations in seven out of nine parameters. Also, the Bland-Altman plots demonstrate that a reduced precision of AI predictions must be expected especially for values attributed to the inclination of the incisors. CONCLUSION: Fully automated cephalometric analyses are promising in terms of timesaving and avoidance of individual human errors.
At present, however, they should only be used under supervision of experienced clinicians.
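
The two building blocks mentioned in the methods, a median-based human reference and a Bland-Altman comparison, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the conventional 95% limits of agreement (mean difference ± 1.96 SD), which the abstract does not specify, and three hypothetical examiners instead of twelve. The function names and all numbers are invented for the example.

```python
from statistics import mean, median, stdev


def median_reference(ratings_per_case):
    """Reference value per case: the median of all human assessments."""
    return [median(ratings) for ratings in ratings_per_case]


def bland_altman(predictions, reference):
    """Mean difference (bias) and 95% limits of agreement.

    Differences are prediction - reference; limits are bias +/- 1.96 SD,
    the conventional choice (an assumption here).
    """
    diffs = [p - r for p, r in zip(predictions, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical angle measurements (degrees) for four X-rays.
human_ratings = [[80, 82, 81], [78, 79, 77], [85, 84, 86], [90, 91, 89]]
ai_predictions = [82, 78, 84, 92]

reference = median_reference(human_ratings)
bias, lower, upper = bland_altman(ai_predictions, reference)
print(reference, bias, lower, upper)
```

A bias near zero with wide limits of agreement would correspond to the pattern the abstract describes for some parameters: no systematic offset, but reduced precision in individual predictions.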

20.
Stat Methods Med Res ; 32(9): 1784-1798, 2023 09.
Article in English | MEDLINE | ID: mdl-37503578

ABSTRACT

Three-arm 'gold-standard' non-inferiority trials are recommended for indications where only unstable reference treatments are available and the use of a placebo group can be justified ethically. For such trials, several study designs have been suggested that use the placebo group for testing 'assay sensitivity', that is, the ability of the trial to replicate efficacy. Should the reference fail in the given trial, then non-inferiority could also be shown with an ineffective experimental treatment and hence becomes useless. In this article, we extend the so-called Koch-Röhmel design where a proof of efficacy for the experimental treatment is required in order to qualify for the non-inferiority test. While the efficacy of the experimental treatment is an indication of assay sensitivity, it does not guarantee that the reference is sufficiently efficient to let the non-inferiority claim be meaningful. It has, therefore, been suggested to adaptively test the non-inferiority only if the reference demonstrates superiority to placebo and otherwise to test δ-superiority of the experimental treatment over placebo, where δ is chosen in such a way that it provides proof of non-inferiority with regard to the reference's historical effect. In this article, we extend the previous work by complementing its adaptive test with compatible simultaneous confidence intervals. Confidence intervals are commonly used and suggested by regulatory guidelines for non-inferiority trials. We show how to adopt different approaches to simultaneous confidence intervals from the literature to the setting of three-arm non-inferiority trials and compare these methods in a simulation study. Finally, we apply these methods to a real clinical trial example.


Subject(s)
Research Design , Therapies, Investigational , Confidence Intervals , Computer Simulation