Results 1 - 20 of 436
1.
Article in English | MEDLINE | ID: mdl-38896105

ABSTRACT

BACKGROUND: Inter-observer agreement for the American Association of Gynecologic Laparoscopists (AAGL) 2021 Endometriosis Classification staging system has not been described. Its predecessor staging system, the revised American Society for Reproductive Medicine (rASRM), has historically demonstrated poor inter-observer agreement. AIMS: We aimed to determine the inter-observer agreement performance of the AAGL 2021 Endometriosis Classification staging system, and compare this with the rASRM staging system. MATERIALS AND METHODS: A database of 317 patients with coded surgical data was retrospectively analysed. Three independent observers allocated AAGL surgical stages (1-4), twice. Observers made their own interpretation of how to apply the tool in the first staging allocation. Consensus rules were then developed for a second staging allocation. RESULTS: First staging allocation: odds ratio (OR) (and 95% CI) for observer 1 to score higher than observer 2 was 8.08 (5.12-12.76). Observer 1 to score higher than observer 3 was 12.98 (7.99-21.11) and observer 2 to score higher than observer 3 was 1.61 (1.03-2.51). This represents poor agreement. Second staging allocation (after consensus): OR for observer 1 to score higher than observer 2 was 1.14 (0.64-2.03), observer 1 to score higher than observer 3 was 1.81 (0.99-3.28) and observer 2 to score higher than observer 3 was 1.59 (0.87-2.89). This represents good agreement. CONCLUSIONS: These findings suggest that in its current format the AAGL 2021 Endometriosis Classification staging system has poor inter-observer agreement, not superior to the rASRM staging system. However, performance improved when additional measures were taken to simplify and clarify areas of ambiguity in interpreting the staging system.

2.
Abdom Radiol (NY) ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38896248

ABSTRACT

OBJECTIVES: Magnetic resonance (MR) imaging with secretin stimulation (MR-PFTs) is a non-invasive test for pancreatic exocrine function based on assessing the volume of secreted bowel fluid in vivo. Adoption of this methodology in clinical care and research is largely limited to qualitative assessment of secretion as current methods for secretory response quantification require manual thresholding and segmentation of MR images, which can be time-consuming and prone to interrater variability. We describe novel software (PFTquant) that preprocesses and thresholds MR images, performs heuristic detection of non-bowel fluid objects, and provides the user with intuitive semi-automated tools to segment and quantify bowel fluid in a fast and robust manner. We evaluate the performance of this software on a retrospective set of clinical MRIs. METHODS: Twenty MRIs performed in children (< 18 years) were processed independently by two observers using a manual technique and using PFTquant. Interrater agreement in measured secreted fluid volume was compared using intraclass correlation coefficients, Bland-Altman difference analysis, and Dice similarity coefficients. RESULTS: Interrater reliability of measured bowel fluid secretion using PFTquant was 0.90 (95% CI 0.76-0.96) with a mean difference of -4.5 mL (95% limits of agreement -39.4 to 30.4 mL), compared to 0.69 (95% CI 0.36-0.86) with a mean difference of -0.9 mL (95% limits of agreement -77.3 to 75.5 mL) for manual processing. Dice similarity coefficients were better using PFTquant (0.88 +/- 0.06) compared to manual processing (0.85 +/- 0.10) but not significantly (p = 0.11). Time to process was significantly (p < 0.001) faster using PFTquant (412 +/- 177 s) compared to manual processing (645 +/- 305 s). CONCLUSION: Novel software provides fast, reliable quantification of secreted fluid volume in children undergoing MR-PFTs. Use of the novel software could facilitate wider adoption of quantitative MR-PFTs in clinical care and research.
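
The Dice and Bland-Altman comparisons named in the METHODS above can be sketched in a few lines of Python. This is an illustrative sketch only, not PFTquant's code: the function names `dice_coefficient` and `bland_altman_limits`, the representation of a segmentation as a set of voxel indices, and the convention of returning 1.0 for two empty masks are all assumptions. The 1.96 multiplier is the conventional choice for 95% limits of agreement.

```python
from statistics import mean, stdev


def dice_coefficient(mask_a, mask_b):
    """Dice similarity of two segmentations given as sets of voxel indices.

    Hypothetical helper: 2 * |A intersect B| / (|A| + |B|).
    Two empty masks are treated as identical (assumed convention).
    """
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def bland_altman_limits(rater_1, rater_2):
    """Return (mean difference, lower limit, upper limit) for paired volumes.

    The limits are mean difference +/- 1.96 * sample SD of the differences.
    """
    diffs = [x - y for x, y in zip(rater_1, rater_2)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread
```

With paired volumes from two observers, `bland_altman_limits(volumes_1, volumes_2)` gives the same kind of triple reported in the RESULTS (mean difference with lower and upper limits of agreement).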

3.
Brachytherapy ; 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38845268

ABSTRACT

PURPOSE: To investigate geometric and dosimetric inter-observer variability in needle reconstruction for temporary prostate brachytherapy. To assess the potential of registrations between transrectal ultrasound (TRUS) and cone-beam computed tomography (CBCT) to support implant reconstructions. METHODS AND MATERIALS: The needles implanted in 28 patients were reconstructed on TRUS by three physicists. Corresponding geometric deviations and associated dosimetric variations to prostate and organs at risk (urethra, bladder, rectum) were analyzed. To account for the found inter-observer variability, various approaches (template-based, probe-based, marker-based) for registrations of CBCT to TRUS were investigated regarding the respective needle transfer accuracy in a phantom study. Three patient cases were examined to assess registration accuracy in-vivo. RESULTS: Geometric inter-observer deviations >1 mm and >3 mm were found for 34.9% and 3.5% of all needles, respectively. Prostate dose coverage (changes up to 7.2%) and urethra dose (partly exceeding given dose constraints) were most affected by associated dosimetric changes. Marker-based and probe-based registrations resulted in the phantom study in high mean needle transfer accuracies of 0.73 mm and 0.12 mm, respectively. In the patient cases, the marker-based approach was the superior technique for CBCT-TRUS fusions. CONCLUSION: Inter-observer variability in needle reconstruction can substantially affect dosimetry for individual patients. Especially marker-based CBCT-TRUS registrations can help to ensure accurate reconstructions for improved treatment planning.

4.
Article in English, Spanish | MEDLINE | ID: mdl-38878884

ABSTRACT

Osteoporotic vertebral compression fractures (OVF) often pose a diagnostic problem because they occur in the same age group as metastatic vertebral compression fractures (MVF). Although radiography is the first-line diagnostic technique, it is generally not accurate for depicting demineralization and soft tissue lesions. Magnetic resonance imaging (MRI) is the diagnostic technique of choice. The most relevant signs of OVF are intravertebral fluid collection or fluid signal, other vertebral deformities without edema, and older age. Among the most relevant findings for diagnosing MVF are a soft tissue mass and pedicle signal intensity asymmetries. However, the reproducibility of these findings in clinical practice is moderate.

5.
Biochem Med (Zagreb) ; 34(2): 020803, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38882588

ABSTRACT

Introduction: Due to high inter-observer variability the 2015 International Council for Standardization in Haematology (ICSH) recommendations state to count band neutrophils as segmented neutrophils in the white blood cell (WBC) differential. However, the inclusion of bands as a separate cell entity within the WBC differential is still widely used in hematology laboratories in Croatia. The aim of this multicentric study was to assess the degree of inter-observer variability in enumerating band neutrophils within the WBC differential among Croatian laboratories. Materials and methods: Seven large Croatian hospital laboratories from different parts of the country participated in the study. In each of 7 participating laboratories, one blood smear, that was flagged by the analyzer as possibly having bands, was evaluated by all personnel participating in the analysis of hematology samples. Between-observer manual smear reproducibility was expressed as coefficient of variation (CV) and calculated using the following formula: CV (%) = (standard deviation (SD)/mean value) x 100%. Results: The CVs (%) and relative band neutrophil counts in participating laboratories were as follows: 15.4% (16-24), 19.2% (16-32), 19.5% (17-40), 21.1% (17-44), 35.0% (8-26), 51.9% (3-29), and remarkably high 62.4% (12-59). For segmented neutrophils CVs were lower, ranging from 7.4% to 32.2%. The CVs did not correlate with the number of staff members in each hospital (P = 0.293). Conclusions: This study revealed very high variability in enumerating band neutrophil count in the blood smear differential among all participants, thus prompting a need for action on a national level.
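
The CV formula given in the Materials and methods above, CV (%) = (SD / mean) x 100%, can be sketched as follows. The function name `coefficient_of_variation` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the abstract does not say which SD estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(band_counts):
    """Between-observer CV in percent for one smear read by several observers.

    band_counts: relative band neutrophil counts, one value per observer.
    Uses the sample SD (assumption; the abstract does not specify).
    """
    return stdev(band_counts) / mean(band_counts) * 100
```

For three hypothetical observers reporting 16, 20, and 24 bands on the same smear, the mean is 20 and the sample SD is 4, so the CV is 20%.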


Subject(s)
Neutrophils , Humans , Croatia , Pilot Projects , Leukocyte Count , Neutrophils/cytology , Observer Variation , Reproducibility of Results
6.
Neuropsychol Rehabil ; : 1-32, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38805592

ABSTRACT

Goal Attainment Scaling (GAS) is a method for writing person-centred approach evaluation scales that can be used as an outcome measure in clinical or research settings in rehabilitation. To be used in a research setting, it requires a high methodological quality approach. The aim of this study was to explore the feasibility and reliability of the GAS quality rating system, to ensure that GAS scales used as outcome measures are valid and reliable. Secondary objectives were: (1) to compare goal attainment scores' reliability according to how many GAS levels are described in the scale; and (2) to explore if GAS scorings are influenced by who scores goal attainment. The GAS scales analysed here were set collaboratively by 57 cognitively impaired adult clients and their occupational therapist. Goals had to be achieved within an inpatient one-month stay, during which clients participated in an intervention aimed at improving planning skills in daily life. The GAS quality rating system proved to be feasible and reliable. Regarding GAS scores, interrater reliability was higher when only three of the five GAS levels were described, i.e., "three milestone GAS" (0.74-0.92), than when all five levels were described (0.5-0.88), especially when scored by the clients (0.5-0.88).

7.
Eur Radiol Exp ; 8(1): 55, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38705940

ABSTRACT

BACKGROUND: To evaluate the reproducibility of a vessel-specific minimum cost path (MCP) technique used for lobar segmentation on noncontrast computed tomography (CT). METHODS: Sixteen Yorkshire swine (49.9 ± 4.7 kg, mean ± standard deviation) underwent a total of 46 noncontrast helical CT scans from November 2020 to May 2022 using a 320-slice scanner. A semiautomatic algorithm was employed by three readers to segment the lung tissue and pulmonary arterial tree. The centerline of the arterial tree was extracted and partitioned into six subtrees for lobar assignment. The MCP technique was implemented to assign lobar territories by assigning lung tissue voxels to the nearest arterial tree segment. MCP-derived lobar mass and volume were then compared between two acquisitions, using linear regression, root mean square error (RMSE), and paired sample t-tests. An interobserver and intraobserver analysis of the lobar measurements was also performed. RESULTS: The average whole lung mass and volume were 663.7 ± 103.7 g and 1,444.22 ± 309.1 mL, respectively. The lobar mass measurements from the initial (MLobe1) and subsequent (MLobe2) acquisitions were correlated by MLobe1 = 0.99 MLobe2 + 1.76 (r = 0.99, p = 0.120, RMSE = 7.99 g). The lobar volume measurements from the initial (VLobe1) and subsequent (VLobe2) acquisitions were correlated by VLobe1 = 0.98 VLobe2 + 2.66 (r = 0.99, p = 0.160, RMSE = 15.26 mL). CONCLUSIONS: The lobar mass and volume measurements showed excellent reproducibility through a vessel-specific assignment technique. This technique may serve for automated lung lobar segmentation, facilitating clinical regional pulmonary analysis. RELEVANCE STATEMENT: Assessment of lobar mass or volume in the lung lobes using noncontrast CT may allow for efficient region-specific treatment strategies for diseases such as pulmonary embolism and chronic thromboembolic pulmonary hypertension.
KEY POINTS: • Lobar segmentation is essential for precise disease assessment and treatment planning. • Current methods for segmentation using fissure lines are problematic. • The minimum-cost-path technique here is proposed and a swine model showed excellent reproducibility for lobar mass measurements. • Interobserver agreement was excellent, with intraclass correlation coefficients greater than 0.90.


Subject(s)
Lung , Animals , Swine , Lung/diagnostic imaging , Reproducibility of Results , Tomography, X-Ray Computed/methods , Models, Animal , Algorithms
8.
Histopathology ; 85(1): 171-181, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38571446

ABSTRACT

AIMS: Following the increased use of neoadjuvant therapy for pancreatic cancer, grading of tumour regression (TR) has become part of routine diagnostics. However, it suffers from marked interobserver variation, which is mainly ascribed to the subjectivity of the defining criteria of the categories in TR grading systems. We hypothesized that a further cause for the interobserver variation is the use of divergent and nonspecific morphological criteria to identify tumour regression. METHODS AND RESULTS: Twenty treatment-naïve pancreatic cancers and 20 pancreatic cancers treated with neoadjuvant chemotherapy were reviewed by three experienced pancreatic pathologists who, blinded for treatment status, categorized each tumour as treatment-naïve or neoadjuvantly treated, and annotated all tissue areas they considered showing tumour regression. Only 50%-65% of the cases were categorized correctly, and the annotated tissue areas were highly discrepant (only 3%-41% overlap). When the prevalence of various morphological features deemed to indicate TR was compared between treatment-naïve and neoadjuvantly treated tumours, only one pattern, characterized by reduced cancer cell density and prominent stroma affecting a large area of the tumour bed, occurred significantly more frequently, but not exclusively, in the neoadjuvantly treated group. Finally, stromal features, both morphological and biological, were investigated as possible markers for tumour regression, but failed to distinguish TR from native tumour stroma. CONCLUSION: There is considerable divergence in opinion between pathologists when it comes to the identification of tumour regression. Reliable identification of TR is only possible if it is extensive, while lesser degrees of treatment effect cannot be recognized with certainty.


Subject(s)
Neoadjuvant Therapy , Pancreatic Neoplasms , Humans , Pancreatic Neoplasms/pathology , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms/therapy , Male , Female , Aged , Middle Aged , Observer Variation , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Neoplasm Grading
9.
Insights Imaging ; 15(1): 104, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589691

ABSTRACT

OBJECTIVE: The aim of this study was to evaluate and compare reliability, costs, and radiation dose of dual-energy X-ray absorptiometry (DXA) to MRI and CT in measuring muscle mass for the diagnosis of sarcopenia. METHODS: Thirty-four consecutive DXA scans performed in surgically menopausal women from November 2019 until March 2020 were analyzed by two observers. Observers analyzed muscle mass of the lower limbs in every scan twice. Reliability was assessed by calculating inter- and intra-observer variability. Reliability from CT and MRI as well as radiation dose from CT and DXA were collected from literature. Costs for each type of scan were calculated according to the guidelines for economic evaluation of the Dutch National Health Care Institute. RESULTS: The 34 participants had a median age of 58 years (IQR 53-65) and a median body mass index of 24.6 (IQR 21.7-29.7). Inter-observer variability had an intraclass correlation coefficient (ICC) of 0.997 (95% CI 0.994-0.998) with a relative variability of 0.037 ± 0.022%. Regarding intra-observer variability, observer 1 had an ICC of 0.998 (95% CI 0.996-0.999) with a relative variability of 0.019 ± 0.016% and observer 2 had an ICC of 0.997 (95% CI 0.993-0.998) with a relative variability of 0.016 ± 0.011%. DXA costs were €62, CT €77, and MRI €195. The estimated radiation dose of CT was 2.5-3.0 mSv, for DXA this was 2-4 µSv. CONCLUSIONS: DXA has lower costs and a lower radiation dose, with low inter- and intra-observer variability, compared to CT and MRI for assessing lower limb muscle mass. TRIAL REGISTRATION: Netherlands Trial Register; NL8068. CRITICAL RELEVANCE STATEMENT: DXA is a good alternative for CT and MRI in assessing lower limb muscle mass, with lower costs and lower radiation dose, while inter-observer and intra-observer variability are low. KEY POINTS: • Screening for sarcopenia should be optimized as the population ages. • DXA outperformed CT and MRI in the measured metrics. 
• DXA validity should be further evaluated as an alternative to CT and MRI for sarcopenia evaluation.

10.
Eur Radiol ; 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38488970

ABSTRACT

BACKGROUND: The Paris classification categorises colorectal polyp morphology. Interobserver agreement for Paris classification has been assessed at optical colonoscopy (OC) but not CT colonography (CTC). We aimed to determine the following: (1) interobserver agreement for the Paris classification using CTC between radiologists; (2) if radiologist experience influenced classification, gross polyp morphology, or polyp size; and (3) the extent to which radiologist classifications agreed with (a) colonoscopy and (b) a combined reference standard. METHODS: Following ethical approval for this non-randomised prospective cohort study, seven radiologists from three hospitals classified 52 colonic polyps using the Paris system. We calculated interobserver agreement using Fleiss kappa and mean pairwise agreement (MPA). Absolute agreement was calculated between radiologists; between CTC and OC; and between CTC and a combined reference standard using all available imaging, colonoscopic, and histopathological data. RESULTS: Overall interobserver agreement between the seven readers was fair (Fleiss kappa 0.33; 95% CI 0.30-0.37; MPA 49.7%). Readers with < 1500 CTC experience had higher interobserver agreement (0.42 (95% CI 0.35-0.48) vs. 0.33 (95% CI 0.25-0.42)) and MPA (69.2% vs 50.6%) than readers with ≥ 1500 experience. There was substantial overall agreement for flat vs protuberant polyps (0.62 (95% CI 0.56-0.68)) with a MPA of 87.9%. Agreement between CTC and OC classifications was only 44%, and CTC agreement with the combined reference standard was 56%. CONCLUSION: Radiologist agreement when using the Paris classification at CT colonography is low, and radiologist classification agrees poorly with colonoscopy. Using the full Paris classification in routine CTC reporting is of questionable value. 
CLINICAL RELEVANCE STATEMENT: Interobserver agreement for radiologists using the Paris classification to categorise colorectal polyp morphology is only fair; routine use of the full Paris classification at CT colonography is questionable. KEY POINTS: • Overall interobserver agreement for the Paris classification at CT colonography (CTC) was only fair, and lower than for colonoscopy. • Agreement was higher for radiologists with < 1500 CTC experience and for larger polyps. There was substantial agreement when classifying polyps as protuberant vs flat. • Agreement between CTC and colonoscopic polyp classification was low (44%).
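
The two agreement statistics named in the METHODS above, Fleiss kappa and mean pairwise agreement (MPA), can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's analysis code: the function names are hypothetical, every polyp is assumed to be rated by the same number of readers, and MPA is assumed to mean the percent agreement averaged over all reader pairs, which the abstract does not define.

```python
from itertools import combinations


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of subjects, each a list of category labels.

    Assumes every subject has the same number of raters (at least 2) and
    that more than one category is used overall; otherwise the chance
    agreement is 1 and the statistic is undefined.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Overall proportion of assignments falling in each category.
    p_cat = [sum(row[j] for row in counts) / (n_subjects * n_raters)
             for j in range(len(categories))]
    # Per-subject observed agreement among rater pairs.
    p_subj = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in counts]
    p_obs = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1 - p_exp)


def mean_pairwise_agreement(ratings):
    """Average, over all rater pairs, of the fraction of subjects they agree on."""
    n_raters = len(ratings[0])
    pair_scores = [
        sum(row[a] == row[b] for row in ratings) / len(ratings)
        for a, b in combinations(range(n_raters), 2)
    ]
    return sum(pair_scores) / len(pair_scores)
```

Each inner list holds the labels that all readers gave to one polyp, for example hypothetical Paris categories such as `["Is", "Is", "IIa"]`. Kappa corrects for chance agreement and MPA does not, which is why the two figures reported in the RESULTS differ in scale.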

11.
Radiother Oncol ; 194: 110196, 2024 May.
Article in English | MEDLINE | ID: mdl-38432311

ABSTRACT

BACKGROUND AND PURPOSE: Studies investigating the application of Artificial Intelligence (AI) in the field of radiotherapy exhibit substantial variations in terms of quality. The goal of this study was to assess the amount of transparency and bias in scoring articles with a specific focus on AI based segmentation and treatment planning, using modified PROBAST and TRIPOD checklists, in order to provide recommendations for future guideline developers and reviewers. MATERIALS AND METHODS: The TRIPOD and PROBAST checklist items were discussed and modified using a Delphi process. After consensus was reached, 2 groups of 3 co-authors scored 2 articles to evaluate usability and further optimize the adapted checklists. Finally, 10 articles were scored by all co-authors. Fleiss' kappa was calculated to assess the reliability of agreement between observers. RESULTS: Three of the 37 TRIPOD items and 5 of the 32 PROBAST items were deemed irrelevant. General terminology in the items (e.g., multivariable prediction model, predictors) was modified to align with AI-specific terms. After the first scoring round, further improvements of the items were formulated, e.g., by preventing the use of sub-questions or subjective words and adding clarifications on how to score an item. Using the final consensus list to score the 10 articles, only 2 out of the 61 items resulted in a statistically significant kappa of 0.4 or more demonstrating substantial agreement. For 41 items no statistically significant kappa was obtained indicating that the level of agreement among multiple observers is due to chance alone. CONCLUSION: Our study showed low reliability scores with the adapted TRIPOD and PROBAST checklists. Although such checklists have shown great value during development and reporting, this raises concerns about the applicability of such checklists to objectively score scientific articles for AI applications. 
When developing or revising guidelines, it is essential to consider their applicability to score articles without introducing bias.


Subject(s)
Artificial Intelligence , Checklist , Delphi Technique , Radiotherapy Planning, Computer-Assisted , Humans , Radiotherapy Planning, Computer-Assisted/methods , Radiotherapy Planning, Computer-Assisted/standards , Practice Guidelines as Topic , Bias , Reproducibility of Results , Neoplasms/radiotherapy
12.
BMC Med Res Methodol ; 24(1): 61, 2024 Mar 09.
Article in English | MEDLINE | ID: mdl-38461273

ABSTRACT

BACKGROUND: The provision of data sharing statements (DSS) for clinical trials has been made mandatory by different stakeholders. DSS are a device to clarify whether there is intention to share individual participant data (IPD). What is missing is a detailed assessment of whether DSS are providing clear and understandable information about the conditions for data sharing of IPD for secondary use. METHODS: A random sample of 200 COVID-19 clinical trials with explicit DSS was drawn from the ECRIN clinical research metadata repository. The DSS were assessed and classified, by two experienced experts and one assessor with less experience in data sharing (DS), into different categories (unclear, no sharing, no plans, yes but vague, yes on request, yes with specified storage location, yes but with complex conditions). RESULTS: Between the two experts the agreement was moderate to substantial (kappa=0.62, 95% CI [0.55, 0.70]). Agreement considerably decreased when these experts were compared with a third person who was less experienced and trained in data sharing ("assessor") (kappa=0.33, 95% CI [0.25, 0.41]; 0.35, 95% CI [0.27, 0.43]). Between the two experts and under supervision of an independent moderator, a consensus was achieved for those cases where both experts had disagreed, and the result was used as "gold standard" for further analysis. At least some degree of willingness of DS (data sharing) was expressed in 63.5% (127/200) cases. Of these cases, around one quarter (31/127) were vague statements of support for data sharing but without useful detail. In around half of the cases (60/127) it was stated that IPD could be obtained by request. In only slightly more than 10% of the cases (15/127) was it stated that the IPD would be transferred to a specific data repository. In the remaining cases (21/127), a more complex regime was described or referenced, which could not be allocated to one of the three previous groups.
As a result of the consensus meetings, the classification system was updated. CONCLUSION: The study showed that the current DSS that imply possible data sharing are often not easy to interpret, even by relatively experienced staff. Machine based interpretation, which would be necessary for any practical application, is currently not possible. Machine learning and / or natural language processing techniques might improve machine actionability, but would represent a very substantial investment of research effort. The cheaper and easier option would be for data providers, data requestors, funders and platforms to adopt a clearer, more structured and more standardised approach to specifying, providing and collecting DSS. TRIAL REGISTRATION: The protocol for the study was pre-registered on ZENODO ( https://zenodo.org/record/7064624#.Y4DIAHbMJD8 ).


Subject(s)
Information Dissemination , Research Design , Humans , Information Dissemination/methods , Consensus , Registries
13.
Pathologie (Heidelb) ; 45(2): 115-123, 2024 Mar.
Article in German | MEDLINE | ID: mdl-38381370

ABSTRACT

BACKGROUND: Metabolic dysfunction-associated steatotic liver disease (MASLD), or non-alcoholic fatty liver disease (NAFLD), is a common disease that is diagnosed through manual evaluation of liver biopsies, an assessment that is subject to high interobserver variability (IBV). IBV can be reduced using automated methods. OBJECTIVES: Many existing computer-based methods do not accurately reflect what pathologists evaluate in practice. The goal is to demonstrate how these differences impact the prediction of hepatic steatosis. Additionally, IBV complicates algorithm validation. MATERIALS AND METHODS: Forty tissue sections were analyzed to detect steatosis, nuclei, and fibrosis. Data generated from automated image processing were used to predict steatosis grades. To investigate IBV, 18 liver biopsies were evaluated by multiple observers. RESULTS: Area-based approaches yielded more strongly correlated results than nucleus-based methods (⌀ Spearman rho [ρ] = 0.92 vs. 0.79). The inclusion of information regarding tissue composition reduced the average absolute error for both area- and nucleus-based predictions by 0.5% and 2.2%, respectively. Our final area-based algorithm, incorporating tissue structure information, achieved a high accuracy (80%) and strong correlation (⌀ Spearman ρ = 0.94) with manual evaluation. CONCLUSION: The automatic and deterministic evaluation of steatosis can be improved by integrating information about tissue composition and can serve to reduce the influence of IBV.


Subject(s)
Non-alcoholic Fatty Liver Disease , Humans , Non-alcoholic Fatty Liver Disease/diagnosis , Biopsy , Fibrosis , Automation
14.
Stat Methods Med Res ; 33(3): 532-553, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38320802

ABSTRACT

Reliability of measurement instruments providing quantitative outcomes is usually assessed by an intraclass correlation coefficient. When participants are repeatedly measured by a single rater or device, or, are each rated by a different group of raters, the intraclass correlation coefficient is based on a one-way analysis of variance model. When planning a reliability study, it is essential to determine the number of participants and measurements per participant (i.e. number of raters or number of repeated measurements). Three different sample size determination approaches under the one-way analysis of variance model were identified in the literature, all based on a confidence interval for the intraclass correlation coefficient. Although eight different confidence interval methods can be identified, Wald confidence interval with Fisher's large sample variance approximation remains most commonly used despite its well-known poor statistical properties. Therefore, a first objective of this work is comparing the statistical properties of all identified confidence interval methods-including those overlooked in previous studies. A second objective is developing a general procedure to determine the sample size using all approaches since a closed-form formula is not always available. This procedure is implemented in an R Shiny app. Finally, we provide advice for choosing an appropriate sample size determination method when planning a reliability study.
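
For orientation, the point estimate of the intraclass correlation coefficient under a one-way analysis of variance model can be sketched as below. It uses the standard mean-squares form, ICC = (MSB - MSW) / (MSB + (k - 1) MSW). The function name `icc_oneway` is hypothetical, the sketch assumes a balanced design with k >= 2 measurements per participant, and it covers neither the confidence interval methods nor the sample size procedures that the paper compares.

```python
def icc_oneway(data):
    """One-way random-effects ICC for single measurements.

    data: list of participants, each a list of k measurements (k >= 2,
    same k for every participant; at least 2 participants).
    """
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    # Between-participant and within-participant mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(data, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

When repeated measurements within each participant are identical, the within-participant mean square is zero and the estimate is 1; as within-participant scatter grows relative to between-participant differences, the estimate falls.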


Subject(s)
Sample Size , Humans , Reproducibility of Results , Observer Variation , Analysis of Variance
15.
Cancer Med ; 13(2): e6967, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38348960

ABSTRACT

RATIONALE AND OBJECTIVES: Computer-aided detection (CAD) of pulmonary nodules reduces the impact of observer variability, improving the reliability and reproducibility of nodule assessments in clinical practice. Therefore, this study aimed to assess the impact of CAD on inter-observer agreement in the follow-up management of subsolid nodules. MATERIALS AND METHODS: A dataset comprising 60 subsolid nodule cases was constructed based on the National Cancer Center lung cancer screening data. Five observers independently assessed all low-dose computed tomography scans and assigned follow-up management strategies to each case according to the National Comprehensive Cancer Network (NCCN) guidelines, using both manual measurements and CAD assistance. The linearly weighted Cohen's kappa test was used to measure agreement between paired observers. Agreement among multiple observers was evaluated using the Fleiss kappa statistic. RESULTS: The agreement of the five observers for NCCN follow-up management categorization was moderate when measured manually, with a Fleiss kappa score of 0.437. Utilizing CAD led to a notable enhancement in agreement, achieving a substantial consensus with a Fleiss kappa value of 0.623. After using CAD, the proportion of major and substantial management discrepancies decreased from 27.5% to 15.8% and 4.8% to 1.5%, respectively (p < 0.01). In 23 lung cancer cases presenting as part-solid nodules, CAD significantly increased the average detection sensitivity (overall sensitivity, 82.6% vs. 92.2%; p < 0.05). CONCLUSION: The application of CAD significantly improves inter-observer agreement in the follow-up management strategy for subsolid nodules. It also demonstrates the potential to reduce substantial management discrepancies and increase detection sensitivity in lung cancer cases presenting as part-solid nodules.
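
The linearly weighted Cohen's kappa used for paired observers in the MATERIALS AND METHODS above can be sketched as follows. This is an illustrative sketch, not the study's code: the function name `linear_weighted_kappa` is hypothetical, and the ordered list of management categories passed in is a placeholder rather than the actual NCCN categories. Disagreements are penalised in proportion to how many ordered categories apart the two ratings are.

```python
def linear_weighted_kappa(rater_1, rater_2, ordered_categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    ordered_categories: all possible categories, lowest to highest
    (at least 2). Undefined if both raters use one single category.
    """
    k = len(ordered_categories)
    index = {c: i for i, c in enumerate(ordered_categories)}
    n = len(rater_1)
    # Observed joint proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_1, rater_2):
        observed[index[a]][index[b]] += 1 / n
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    # Weighted observed and chance-expected disagreement.
    disagree_obs = sum(abs(i - j) / (k - 1) * observed[i][j] for i, j in cells)
    disagree_exp = sum(abs(i - j) / (k - 1) * row_totals[i] * col_totals[j]
                       for i, j in cells)
    return 1 - disagree_obs / disagree_exp
```

With ordered follow-up categories, two observers who differ by one category are penalised less than two who differ by several, which is the reason for preferring the weighted form over unweighted kappa for ordinal management recommendations.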


Subject(s)
Lung Neoplasms , Humans , Lung Neoplasms/diagnostic imaging , Reproducibility of Results , Early Detection of Cancer , Observer Variation , Follow-Up Studies , Computers
16.
Global Spine J ; : 21925682241235607, 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38382044

ABSTRACT

STUDY DESIGN: Reliability analysis. OBJECTIVES: Vertebral pelvic angles (VPA) are gaining popularity given their ability to describe the shape of the spine. Understanding the reliability and minimal detectable change (MDC) is necessary to determine how these measurement tools should be used in the manual assessment of spine radiographs. Our aim is to assess intra- and interobserver intraclass correlation coefficients (ICC) and the MDC in the use of VPA for assessing alignment in adult spinal deformity (ASD). METHODS: Three independent examiners blindly measured T1, T4, T9, L1, and L4PA twice in ASD patients with a 4-week window after the initial measurements. Patients who had undergone hip or shoulder arthroplasty, who had fused or transitional vertebrae, or whose hip joints were not visible on radiographs were excluded. Power analysis calculated a minimum sample size of 19. Both intra- and interobserver ICC and MDC, which denotes the smallest detectable change in a true value with 95% confidence, were calculated. RESULTS: Out of the 193 patients, 39 were ultimately included in the study, and 390 measurements were performed by 3 raters. Intraobserver ICC values ranged from .90 to .99. The interobserver ICC was .97, .97, .96, .95, and .92, and the MDC was 5.3°, 5.1°, 4.8°, 4.9°, and 4.1° for T1, T4, T9, L1, and L4PA, respectively. CONCLUSION: All VPAs showed excellent intra- and interobserver reliability; however, the MDC is relatively high compared to typical ranges for VPA values. Therefore, surgeons must be aware that substantial alignment changes may not be detected by a single VPA.
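
The abstract does not state how the MDC was computed. A common convention, assumed here and not confirmed by the source, derives it from the standard error of measurement: SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM. The sketch below follows that convention; the function names and the example inputs are hypothetical and are not values from the study.

```python
from math import sqrt


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the between-patient spread (assumed form)."""
    return sd * sqrt(1 - icc)


def minimal_detectable_change_95(sd, icc):
    """MDC at 95% confidence: 1.96 * sqrt(2) * SEM (assumed convention)."""
    return 1.96 * sqrt(2) * standard_error_of_measurement(sd, icc)
```

Under this convention, a hypothetical angle with a between-patient SD of 10° and an ICC of 0.96 has an SEM of 2° and an MDC of roughly 5.5°. This illustrates how an excellent ICC can coexist with an MDC of several degrees when the measured angle varies widely across patients.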

17.
AJR Am J Roentgenol ; 222(5): e2330511, 2024 May.
Article in English | MEDLINE | ID: mdl-38294159

ABSTRACT

BACKGROUND. A paucity of relevant guidelines may lead to pronounced variation among radiologists in issuing recommendations for additional imaging (RAI) for head and neck imaging. OBJECTIVE. The purpose of this article was to explore associations of RAI for head and neck imaging examinations with examination, patient, and radiologist factors and to assess the role of individual radiologist-specific behavior in issuing such RAI. METHODS. This retrospective study included 39,200 patients (median age, 58 years; 21,855 women, 17,315 men, 30 with missing sex information) who underwent 39,200 head and neck CT or MRI examinations, interpreted by 61 radiologists, from June 1, 2021, through May 31, 2022. A natural language processing (NLP) tool with manual review of NLP results was used to identify RAI in report impressions. Interradiologist variation in RAI rates was assessed. A generalized mixed-effects model was used to assess associations between RAI and examination, patient, and radiologist factors. RESULTS. A total of 2943 (7.5%) reports contained RAI. Individual radiologist RAI rates ranged from 0.8% to 22.0% (median, 7.1%; IQR, 5.2-10.2%), representing a 27.5-fold difference between the minimum and maximum values and a 1.8-fold difference between the 25th and 75th percentiles. In multivariable analysis, RAI likelihood was higher for CTA than for CT examinations (OR, 1.32), for examinations that included a trainee in report generation (OR, 1.23), and for patients with self-identified race of Black or African American versus White (OR, 1.25); was lower for male than female patients (OR, 0.90); and was associated with increasing patient age (OR, 1.09 per decade) and inversely associated with radiologist years since training (OR, 0.90 per 5 years). The model accounted for 10.9% of the likelihood of RAI. Of explainable likelihood of RAI, 25.7% was attributable to examination, patient, and radiologist factors; 74.3% was attributable to radiologist-specific behavior. CONCLUSION. Interradiologist variation in RAI rates for head and neck imaging was substantial. RAI appear to be more substantially associated with individual radiologist-specific behavior than with measurable systemic factors. CLINICAL IMPACT. Quality improvement initiatives, incorporating best practices for incidental findings management, may help reduce radiologist preference-sensitive decision-making in issuing RAI for head and neck imaging and associated care variation.


Subject(s)
Magnetic Resonance Imaging , Tomography, X-Ray Computed , Humans , Male , Female , Middle Aged , Retrospective Studies , Tomography, X-Ray Computed/methods , Aged , Magnetic Resonance Imaging/methods , Adult , Head and Neck Neoplasms/diagnostic imaging , Observer Variation , Head/diagnostic imaging , Radiologists , Neck/diagnostic imaging , Practice Patterns, Physicians'/statistics & numerical data , Practice Guidelines as Topic
18.
J Int Neuropsychol Soc ; 30(5): 448-453, 2024 06.
Article in English | MEDLINE | ID: mdl-38263747

ABSTRACT

OBJECTIVE: Self- and informant-ratings of functional abilities are used to diagnose mild cognitive impairment (MCI) and are commonly measured in clinical trials. Ratings are assumed to be accurate, yet they are subject to biases. Biases in self-ratings have been found in individuals with dementia who are older and more depressed and in caregivers with higher distress, burden, and education. This study aimed to extend prior findings using an objective approach to identify determinants of bias in ratings. METHOD: Participants were 118 individuals with MCI and their informants. Three discrepancy variables were generated including the discrepancies between (1) self- and informant-rated functional status, (2) informant-rated functional status and objective cognition (in those with MCI), and (3) self-rated functional status and objective cognition. These variables served as dependent variables in forward linear regression models, with demographics, stress, burden, depression, and self-efficacy as predictors. RESULTS: Informants with higher stress rated individuals with MCI as having worse functional abilities relative to objective cognition. Individuals with MCI with worse self-efficacy rated their functional abilities as being worse compared to objective cognition. Informant-ratings were worse than self-ratings for informants with higher stress and individuals with MCI with higher self-efficacy. CONCLUSION: This study highlights biases in subjective ratings of functional abilities in MCI. The risk for relative underreporting of functional abilities by individuals with higher stress levels aligns with previous research. Bias in individuals with MCI with higher self-efficacy may be due to anosognosia. Findings have implications for the use of subjective ratings for diagnostic purposes and as outcome measures.


Subject(s)
Cognitive Dysfunction , Humans , Cognitive Dysfunction/physiopathology , Cognitive Dysfunction/etiology , Cognitive Dysfunction/diagnosis , Male , Female , Aged , Aged, 80 and over , Self Report , Self Efficacy , Diagnostic Self Evaluation , Middle Aged , Neuropsychological Tests , Bias , Activities of Daily Living , Caregivers , Stress, Psychological/physiopathology
20.
Arch Orthop Trauma Surg ; 144(3): 1149-1159, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38231206

ABSTRACT

INTRODUCTION: Although non-contrast magnetic resonance imaging (MRI) is the most used exam today, few studies have evaluated the accuracy of its findings. The primary objective of the study was to evaluate the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of non-contrast MRI findings in frozen shoulder, isolated and in combination. The secondary objectives were to define the interobserver and intraobserver agreement of the assessments and the odds ratio for frozen shoulder associated with the various MRI findings. METHODS: A retrospective diagnostic accuracy study comparing non-contrast MRI findings between the frozen shoulder group and the control group. Sensitivity, specificity, positive and negative predictive value, accuracy, odds ratio, and interobserver and intraobserver agreement were calculated for each finding and their possible associations. RESULTS: Capsule hyperintensity in the axillary recess had 84% sensitivity, 94% specificity, and 89% accuracy. Obliteration of the subcoracoid fat triangle in the rotator interval had 34% sensitivity, 82% specificity, and 58% accuracy. Coracohumeral ligament thickness ≥ 2 mm had 66% sensitivity, 48% specificity, and 57% accuracy. Capsule thickness in the axillary recess ≥ 4 mm resulted in 54% sensitivity, 82% specificity, and 68% accuracy. Regarding interobserver agreement, only the posteroinferior and posterosuperior quadrants showed moderate results, and all the others showed strong reliability. The odds ratio for frozen shoulder with hyperintensity in the axillary recess was 82.3. The association of these findings increased specificity (95%). CONCLUSION: The accuracy of non-contrast magnetic resonance imaging is high for diagnosing frozen shoulder, especially when evaluating the hyperintensity of the axillary recess. The exam has high reliability and reproducibility. The presence of an association of signs increases the specificity of the test.
LEVEL OF EVIDENCE: Level III, study of diagnostic test.


Subject(s)
Bursitis , Shoulder Joint , Humans , Retrospective Studies , Reproducibility of Results , Shoulder Joint/pathology , Magnetic Resonance Imaging/methods , Bursitis/diagnostic imaging , Sensitivity and Specificity
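The relationship between the sensitivity, specificity, accuracy, and odds ratio reported in the abstract above can be sketched as follows. The abstract gives neither the group sizes nor the 2x2 counts, so this sketch assumes equally sized frozen shoulder and control groups. The function names are hypothetical, and this is not the authors' analysis code.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive finding in cases divided by the odds in controls."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly in (0, 1)")
    odds_in_cases = sensitivity / (1.0 - sensitivity)
    odds_in_controls = (1.0 - specificity) / specificity
    return odds_in_cases / odds_in_controls


def accuracy_equal_groups(sensitivity: float, specificity: float) -> float:
    """Overall accuracy when cases and controls are equally numerous (assumption)."""
    return (sensitivity + specificity) / 2.0


# Values reported for hyperintensity in the axillary recess.
sens, spec = 0.84, 0.94
print(f"odds ratio = {diagnostic_odds_ratio(sens, spec):.2f}")
print(f"accuracy   = {accuracy_equal_groups(sens, spec):.2f}")
```

With the rounded inputs 84% and 94%, the sketch yields an odds ratio of about 82.25 and an accuracy of 0.89, close to the reported 82.3 and 89%. The small gap in the odds ratio is consistent with the published percentages being rounded.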