Search | VHL Regional Portal

1.

Assessing accuracy and consistency in intracranial aneurysm sizing: human expertise vs. artificial intelligence.

Planinc, Andrej; Spegel, Nina; Podobnik, Zala; Sinigoj, Uros; Skubic, Petra; Choi, June Ho; Park, Wonhyoung; Robic, Tina; Tabor, Nika; Jarabek, Leon; Spiclin, Ziga; Bizjak, Ziga.

Sci Rep ; 14(1): 16080, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-38992041

ABSTRACT

Intracranial aneurysms (IAs) are a common vascular pathology associated with a risk of rupture, which is often fatal. Because aneurysm growth of more than 1 mm is considered a surrogate of rupture risk, this study presents a comprehensive analysis of intracranial aneurysm measurements using a dataset of 358 IAs from 248 computed tomography angiography (CTA) scans, measured by four junior raters and one senior rater. The study explores the variability in sizing assessments by employing both human raters and an artificial intelligence (AI) system. Our findings reveal substantial inter- and intra-rater variability among junior raters, contrasting with the lower intra-rater variability observed in the senior rater. Standard deviations of all raters were above the threshold for IA growth (1 mm). Additionally, the study identifies a systematic bias: human experts tend to measure aneurysms smaller than the AI system does. Our findings emphasize the challenges of human assessment while also showcasing the capacity of AI technology to improve the precision and reliability of intracranial aneurysm assessments, especially for junior raters. The potential of AI was particularly evident in the task of monitoring IAs at various intervals, where the AI-based approach surpassed junior raters and achieved performance comparable to that of the senior rater.
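Rater standard deviations above 1 mm matter for growth monitoring, and a short calculation shows why. The sketch below computes a within-subject standard deviation (s_w) from paired repeat measurements, then the corresponding repeatability coefficient, 1.96 × √2 × s_w, following Bland and Altman. It assumes each aneurysm was measured twice by the same rater. The function names and readings are hypothetical and are not study data. The study's reported standard deviations may also not be defined exactly this way.

```python
import math


def within_subject_sd(first, second):
    """Within-subject SD (mm) from two readings of the same aneurysms.

    Uses s_w = sqrt(sum(d_i ** 2) / (2 * n)), where d_i is the difference
    between the two readings of aneurysm i.
    """
    if not first or len(first) != len(second):
        raise ValueError("need two non-empty lists of equal length")
    sum_sq = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(sum_sq / (2 * len(first)))


def repeatability_coefficient(s_w):
    """Difference between two readings that measurement noise alone
    should exceed only about 5% of the time: 1.96 * sqrt(2) * s_w."""
    return 1.96 * math.sqrt(2) * s_w


def growth_detectable(threshold_mm, s_w):
    """True if a change of threshold_mm is larger than the repeatability coefficient."""
    return threshold_mm > repeatability_coefficient(s_w)


# Hypothetical repeat readings (mm) for four aneurysms, not study data.
first_read = [5.0, 7.2, 3.9, 10.1]
second_read = [5.8, 6.1, 4.7, 9.0]
s_w = within_subject_sd(first_read, second_read)
print(f"s_w = {s_w:.2f} mm, RC = {repeatability_coefficient(s_w):.2f} mm")
print("1 mm growth detectable:", growth_detectable(1.0, s_w))
```

With s_w = 1 mm the repeatability coefficient is about 2.8 mm. An apparent 1 mm change between two scans could therefore plausibly be measurement noise alone.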

Subject(s)

Artificial Intelligence , Computed Tomography Angiography , Intracranial Aneurysm , Humans , Intracranial Aneurysm/diagnostic imaging , Intracranial Aneurysm/pathology , Male , Female , Computed Tomography Angiography/methods , Middle Aged , Aged , Reproducibility of Results , Observer Variation

2.

Inter-observer reliability and anatomical landmarks for arm circumference to determine cuff size for blood pressure measurement.

Oguaju, Bonaventure; Lau, Darren; Padwal, Raj; Ringrose, Jennifer.

J Clin Hypertens (Greenwich) ; 26(7): 867-871, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38980266

ABSTRACT

Accurate arm circumference (AC) measurement is required for accurate blood pressure (BP) readings. Standards stipulate measuring arm circumference at the midpoint between the acromion process (AP) and the olecranon process. However, which part of the AP to use is not specified. Furthermore, BP is measured seated, but arm circumference is measured standing. We sought to understand how landmarking during AC measurement and body position affect cuff size selection. Two variations in measurement procedure were studied. First, AC was measured using the top of the acromion (TOA) as the landmark and compared with measurements using the spine of the acromion (SOA). Second, standing and seated measurements using each landmark were compared. AC was measured to the nearest 0.1 cm at the midpoint of the upper arm by two independent observers, each blinded to the other's measurements. In 51 participants, the mean (±SD) mid-AC measurements using the anchoring landmarks TOA and SOA in the standing position were 32.4 cm (±6.18) and 32.1 cm (±6.07), respectively (mean difference 0.3 cm). In the seated position, mean arm circumference was 32.2 cm (±6.10) using TOA and 31.1 cm (±6.03) using SOA (mean difference 1.1 cm). Kappa agreement for cuff selection between TOA and SOA in the standing position was 0.94 (p < 0.001). The choice of landmark on the acromion process can change cuff selection in a small percentage of cases, and the overall impact of landmark selection is small. However, standardizing landmark selection and body position for AC measurement could further reduce variability in cuff size selection during BP measurement and validation studies.
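A small sketch can show how a landmark shift affects cuff choice. It maps arm circumference to a cuff band, then counts participants whose cuff differs between the TOA and SOA measurements. The band edges are hypothetical placeholders loosely resembling common adult cuff ranges; real bands depend on the cuff manufacturer. The measurements are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical cuff bands (cm): lower edge of each cuff size. These are
# placeholders loosely resembling common adult ranges; real bands come
# from the cuff manufacturer.
CUFF_LOWER_EDGES = [22.0, 27.0, 35.0, 45.0]
CUFF_NAMES = ["small adult", "adult", "large adult", "extra-large"]
MAX_AC_CM = 52.0


def select_cuff(ac_cm):
    """Return the cuff size for an arm circumference measured in cm."""
    if ac_cm < CUFF_LOWER_EDGES[0] or ac_cm > MAX_AC_CM:
        raise ValueError(f"AC of {ac_cm} cm is outside the supported bands")
    # bisect_right counts how many lower edges are <= ac_cm.
    return CUFF_NAMES[bisect_right(CUFF_LOWER_EDGES, ac_cm) - 1]


def cuff_changes(ac_toa, ac_soa):
    """Count participants whose cuff size differs between the two landmarks."""
    if len(ac_toa) != len(ac_soa):
        raise ValueError("need one TOA and one SOA value per participant")
    return sum(select_cuff(a) != select_cuff(b) for a, b in zip(ac_toa, ac_soa))


# Invented measurements (cm) for three participants.
toa = [27.2, 33.0, 35.4]
soa = [26.8, 32.5, 34.9]
print(cuff_changes(toa, soa), "of", len(toa), "participants change cuff size")
```

A 0.3 cm difference only changes the cuff when a measurement sits close to a band edge. A sketch like this is therefore consistent with the abstract's finding that landmark choice affects cuff selection in a small percentage of cases.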

Subject(s)

Arm , Blood Pressure Determination , Humans , Arm/anatomy & histology , Male , Female , Blood Pressure Determination/methods , Blood Pressure Determination/instrumentation , Blood Pressure Determination/standards , Reproducibility of Results , Middle Aged , Adult , Observer Variation , Blood Pressure/physiology , Anatomic Landmarks , Aged , Posture/physiology , Anthropometry/methods , Acromion/anatomy & histology

3.

Mesonephric-type adenocarcinomas of the ovary: prevalence, diagnostic reproducibility, outcome, and value of PAX2.

Köbel, Martin; Kang, Eun Young; Lee, Sandra; Ogilvie, Travis; Terzic, Tatjana; Wang, Linyuan; Wiebe, Nicholas Jp; Al-Shamma, Zainab; Cook, Linda S; Nelson, Gregg S; Stewart, Colin Jr; von Deimling, Andreas; Kommoss, Felix Kf; Lee, Cheng-Han.

J Pathol Clin Res ; 10(4): e12389, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38970797

ABSTRACT

Mesonephric-type (or -like) adenocarcinomas (MAs) of the ovary are an uncommon and aggressive histotype. They appear to arise through transdifferentiation from Müllerian lesions, which creates diagnostic challenges. Thus, we aimed to develop a histologic and immunohistochemical (IHC) approach to optimize the identification of MA over its histologic mimics, such as ovarian endometrioid carcinoma (EC). First, we screened 1,537 ovarian epithelial neoplasms with a four-marker IHC panel of GATA3, TTF1, ER, and PR, followed by a morphological review of EC, to identify MA in retrospective cohorts. Interobserver reproducibility for the distinction of MA versus EC was assessed in 66 cases, first without and then with IHC information (four-marker panel). Expression of PAX2, CD10, and calretinin was evaluated separately, and survival analyses were performed. We identified 23 MAs, of which 22 were among 385 cases initially reported as EC (5.7%) and 1 had been initially reported as clear cell carcinoma. Interobserver reproducibility increased from fair to substantial (κ = 0.376-0.727) with the integration of the four-marker IHC panel. PAX2 was the single most sensitive and specific marker for distinguishing MA from EC and could be used as a first-line marker together with ER/PR and GATA3/TTF1. Patients with MA had a significantly increased risk of earlier death from disease compared with patients with EC (hazard ratio = 3.08; 95% CI, 1.62-5.85; p < 0.0001), when adjusted for age, stage, and p53 status. A diagnosis of MA has prognostic implications for stage I disease, and because the morphological features of some tumors are subtle, a low threshold for ancillary testing is recommended.
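The abstract labels κ = 0.376 as "fair" and κ = 0.727 as "substantial". These labels match the widely used Landis and Koch bands. The sketch below maps a kappa value to those bands. The abstract does not name its scale, so the assumption that the authors used exactly this one is ours, and the function name is illustrative.

```python
def landis_koch(kappa):
    """Return the Landis and Koch (1977) descriptive label for a kappa value."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


for k in (0.376, 0.727):
    print(k, landis_koch(k))
```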

Subject(s)

Biomarkers, Tumor , Ovarian Neoplasms , PAX2 Transcription Factor , Humans , Female , Ovarian Neoplasms/pathology , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/mortality , PAX2 Transcription Factor/analysis , PAX2 Transcription Factor/metabolism , Biomarkers, Tumor/analysis , Middle Aged , Reproducibility of Results , Aged , Adult , Retrospective Studies , Prevalence , Immunohistochemistry , Adenocarcinoma/pathology , Adenocarcinoma/diagnosis , Adenocarcinoma/mortality , Diagnosis, Differential , Observer Variation , Aged, 80 and over , Carcinoma, Endometrioid/pathology , Carcinoma, Endometrioid/diagnosis , Carcinoma, Endometrioid/mortality

4.

Interexaminer agreement among pediatric dental specialists in assessment of tonsil size, Friedman tongue position, and Friedman staging of obstructive sleep apnea in children: An observational study.

Nair, Lekshmy S R; George, Sageena; Anandaraj, S; Anuja, S; Naveena, T V; Aishwarya, U.

J Indian Soc Pedod Prev Dent ; 42(2): 91-97, 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38957905

ABSTRACT

BACKGROUND: The evaluation of tonsil size, Friedman Tongue Position (FTP), and Friedman staging in pediatric obstructive sleep apnea (OSA) is clinically important, aiding both diagnosis and surgical management. AIMS AND OBJECTIVES: This study aimed to assess the reliability of pediatric OSA evaluation by determining inter-examiner agreement among pediatric dental specialists. MATERIALS AND METHODS: Conducted at the Department of Pediatric Dentistry, PMS College of Dental Science and Research Hospital (2023-2024), this observational study used conventional consulting rooms, headlights, and examination chairs. Thirteen medical practitioners reviewed video recordings of the oropharyngeal regions of twelve pediatric patients exhibiting mouth breathing. Friedman staging was determined from the tonsil size and tongue position grades. Inter-examiner agreement was evaluated using Fleiss' kappa. RESULTS: Observers, including residents and practitioners in pediatric dentistry, showed poor agreement on FTP and tonsil grading. CONCLUSION: Understanding the nuances of tonsil size and FTP in pediatric OSA evaluation, along with identifying avenues for refinement, can enhance medical decision-making among healthcare providers, including pediatric dentists.
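Fleiss' kappa extends chance-corrected agreement to more than two raters who each rate every subject. The sketch below follows the standard Fleiss formula. It assumes a count table with one row per patient and one column per grade. In this study that would be 12 rows, each summing to the 13 examiners. The function name and table layout are illustrative, not taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for m raters who each rate every subject.

    counts[i][j] is the number of raters who assigned subject i to
    category j; every row must sum to the same number of raters m.
    """
    if not counts:
        raise ValueError("no subjects")
    n_subjects = len(counts)
    n_categories = len(counts[0])
    m = sum(counts[0])
    if m < 2 or any(len(row) != n_categories or sum(row) != m for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    # Mean observed agreement: share of agreeing rater pairs per subject.
    p_bar = sum(
        (sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts
    ) / n_subjects
    # Expected agreement from the overall proportion of each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * m) for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("all ratings fall in one category; kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical table: 3 patients, 4 examiners, 3 grades as columns.
table = [
    [4, 0, 0],
    [1, 2, 1],
    [0, 1, 3],
]
print(round(fleiss_kappa(table), 3))
```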

Subject(s)

Observer Variation , Palatine Tonsil , Pediatric Dentistry , Sleep Apnea, Obstructive , Tongue , Humans , Sleep Apnea, Obstructive/diagnosis , Palatine Tonsil/pathology , Child , Male , Tongue/pathology , Female , Reproducibility of Results , Child, Preschool

5.

A comparison of target volumes drawn on arterial and venous phase scans during radiation therapy planning for patients with pancreatic cancer: the PANCRINJ study.

Zaidi, Fabien; Calame, Paul; Chevalier, Cédric; Henriques, Julie; Vernerey, Dewi; Vuitton, Lucine; Heyd, Bruno; Borg, Christophe; Boustani, Jihane.

Radiat Oncol ; 19(1): 90, 2024 Jul 15.

Article in English | MEDLINE | ID: mdl-39010133

ABSTRACT

BACKGROUND: Planning radiation therapy (RT) for pancreatic cancer (PC) requires a dosimetric computed tomography (CT) scan to define the gross tumor volume (GTV). The main objective of this study was to compare inter-observer variability in RT planning between the arterial and venous phases following intravenous contrast. METHODS: PANCRINJ was a prospective single-center study that included twenty patients with non-metastatic PC. Patients underwent a pre-therapeutic CT scan at the arterial and venous phases. The GTV was delineated by one radiologist (gold standard, GS) and two senior radiation oncologists (operators). The primary objective was to compare the Jaccard conformity index (JCI) for the GTVs computed between the GS and the operators between the arterial and venous phases, using a Wilcoxon signed-rank test for paired samples. The secondary endpoints were the geographical miss index (GMI), the kappa index, intra-operator variability, and the dose-volume histograms between the arterial and venous phases. RESULTS: The median JCI for the arterial and venous phases was 0.50 (range, 0.17-0.64) and 0.41 (range, 0.23-0.61), respectively (p = 0.10). The median GS-GTV was significantly smaller than the operators' GTVs at both the arterial (p < 0.0001) and venous (p < 0.001) phases. The GMI was low, with few tumors missed for all patients: the median GMI was 0.07 (range, 0-0.79) and 0.05 (range, 0-0.39) at the arterial and venous phases, respectively (p = 0.15). There was moderate agreement between the radiation oncologists, with a median kappa index of 0.52 (range 0.38-0.57) on the arterial phase and 0.52 (range 0.36-0.57) on the venous phase (p = 0.08). Intra-observer variability in GTV delineation was lower at the venous phase than at the arterial phase for both operators.
There was no significant difference between the arterial and the venous phases regarding the dose-volume histogram for the operators. CONCLUSIONS: Our results showed inter- and intra-observer variability in delineating GTV for PC without significant differences between the arterial and the venous phases. The use of both phases should be encouraged. Our findings suggest the need to provide training for radiation oncologists in pancreatic imaging and to collaborate within a multidisciplinary team.
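The Jaccard conformity index compares two delineations as the overlap volume divided by the combined volume. The sketch below represents each GTV as a set of voxel coordinates. This is a simplifying assumption, since treatment planning systems store contours differently, and the helper names are hypothetical. The geographical miss index is omitted because the abstract does not give its definition.

```python
from statistics import median


def jaccard_index(reference, operator):
    """Jaccard conformity index: size of the intersection of two voxel
    sets divided by the size of their union."""
    union = reference | operator
    if not union:
        raise ValueError("both delineations are empty")
    return len(reference & operator) / len(union)


def median_jci(pairs):
    """Median JCI over (reference, operator) voxel-set pairs, one per patient."""
    return median(jaccard_index(ref, op) for ref, op in pairs)


def line_of_voxels(start, stop):
    """Hypothetical helper: voxels (x, 0, 0) for x in [start, stop)."""
    return {(x, 0, 0) for x in range(start, stop)}


gold = line_of_voxels(0, 4)      # 4 voxels
operator = line_of_voxels(1, 6)  # 5 voxels, 3 shared with gold
print(jaccard_index(gold, operator))  # 3 shared / 6 in union = 0.5
```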

Subject(s)

Pancreatic Neoplasms , Radiotherapy Planning, Computer-Assisted , Tomography, X-Ray Computed , Humans , Pancreatic Neoplasms/radiotherapy , Pancreatic Neoplasms/diagnostic imaging , Pancreatic Neoplasms/pathology , Radiotherapy Planning, Computer-Assisted/methods , Prospective Studies , Male , Female , Aged , Middle Aged , Tomography, X-Ray Computed/methods , Radiotherapy Dosage , Aged, 80 and over , Observer Variation , Tumor Burden

6.

Validation of immunohistochemical overexpression of p16 in the histologic diagnosis of cervical intraepithelial neoplasia grade 2.

Forteza, Ana; Vanrell, Cristina; Matheu, Gabriel; Cortés, Javier.

Rev Esp Patol ; 57(3): 169-175, 2024.

Article in English | MEDLINE | ID: mdl-38971616

ABSTRACT

An accurate cytohistologic diagnosis is important to avoid overtreatment of cervical intraepithelial lesions. The three-tiered cervical intraepithelial neoplasia (CIN) classification (grades 1, 2 and 3) is still in use, despite poor agreement among pathologists in diagnosing CIN2. The College of American Pathologists recommended an alternative two-tiered classification that has not yet been universally accepted. We reviewed the diagnostic results of 286 biopsies assessed by three pathologists using haematoxylin and eosin (H&E) and p16 to establish the level of agreement among the readers. Agreement between pathologists in diagnosing CIN2 with H&E was around 45% and improved to 86.7% when interpreting p16-stained biopsies without H&E; agreement with pathologist 3 was lower, around 60%. The discrepant results of one pathologist when assessing p16 highlight the decisive influence of individual criteria. Staining for p16 improved agreement between pathologists who already agreed well, but did not correct the discrepancy for the third pathologist. In equivocal cases, p16 is a useful adjunctive tool for histologic diagnosis.
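Raw percent agreement, as reported for the three pathologists, counts how often two readers gave the same diagnosis for the same biopsy. The sketch below computes it for every reader pair. The reader labels and diagnoses are hypothetical. Unlike kappa, this measure does not correct for agreement expected by chance.

```python
from itertools import combinations


def pairwise_percent_agreement(ratings):
    """Percent agreement for every pair of readers.

    ratings maps a reader label to that reader's diagnoses, listed in the
    same biopsy order for every reader.
    """
    lengths = {len(diagnoses) for diagnoses in ratings.values()}
    if len(ratings) < 2 or len(lengths) != 1 or 0 in lengths:
        raise ValueError("need >= 2 readers with equally long, non-empty lists")
    result = {}
    for a, b in combinations(sorted(ratings), 2):
        same = sum(x == y for x, y in zip(ratings[a], ratings[b]))
        result[(a, b)] = 100 * same / len(ratings[a])
    return result


# Hypothetical diagnoses for five biopsies.
readers = {
    "P1": ["CIN1", "CIN2", "CIN3", "CIN2", "CIN1"],
    "P2": ["CIN1", "CIN2", "CIN3", "CIN3", "CIN1"],
    "P3": ["CIN2", "CIN2", "CIN3", "CIN1", "CIN2"],
}
for pair, pct in pairwise_percent_agreement(readers).items():
    print(pair, f"{pct:.1f}%")
```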

Subject(s)

Cyclin-Dependent Kinase Inhibitor p16 , Immunohistochemistry , Uterine Cervical Dysplasia , Uterine Cervical Neoplasms , Humans , Uterine Cervical Dysplasia/pathology , Uterine Cervical Dysplasia/diagnosis , Female , Cyclin-Dependent Kinase Inhibitor p16/analysis , Uterine Cervical Neoplasms/pathology , Uterine Cervical Neoplasms/chemistry , Uterine Cervical Neoplasms/diagnosis , Neoplasm Grading , Biomarkers, Tumor/analysis , Biopsy , Observer Variation , Reproducibility of Results

7.

Interobserver Agreement and Performance of Concurrent AI Assistance for Radiographic Evaluation of Knee Osteoarthritis.

Brejnebøl, Mathias W; Lenskjold, Anders; Ziegeler, Katharina; Ruitenbeek, Huib; Müller, Felix C; Nybing, Janus U; Visser, Jacob J; Schiphouwer, Loes M; Jasper, Jorrit; Bashian, Behschad; Cao, Haoyin; Muellner, Maximilian; Dahlmann, Sebastian A; Radev, Dimitar I; Ganestam, Ann; Nielsen, Camilla T; Stroemmen, Carsten U; Oei, Edwin H G; Hermann, Kay-Geert A; Boesen, Mikael.

Radiology ; 312(1): e233341, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38980184

ABSTRACT

Background Due to conflicting findings in the literature, there are concerns about a lack of objectivity in grading knee osteoarthritis (KOA) on radiographs. Purpose To examine how artificial intelligence (AI) assistance affects the performance and interobserver agreement of radiologists and orthopedists of various experience levels when evaluating KOA on radiographs according to the established Kellgren-Lawrence (KL) grading system. Materials and Methods In this retrospective observer performance study, consecutive standing knee radiographs from patients with suspected KOA were collected from three participating European centers between April 2019 and May 2022. Each center recruited four readers across radiology and orthopedic surgery at in-training and board-certified experience levels. KL grading (KL-0 = no KOA, KL-4 = severe KOA) on the frontal view was assessed by readers with and without assistance from a commercial AI tool. The majority vote of three musculoskeletal radiology consultants established the reference standard. The ordinal receiver operating characteristic method was used to estimate grading performance. Light's kappa was used to estimate interrater agreement, and bootstrapped t statistics were used to compare groups. Results Seventy-five studies were included from each center, totaling 225 studies (mean patient age, 55 years ± 15 [SD]; 113 female patients). The KL grades were KL-0, 24.0% (n = 54); KL-1, 28.0% (n = 63); KL-2, 21.8% (n = 49); KL-3, 18.7% (n = 42); and KL-4, 7.6% (n = 17). Eleven readers completed their readings. Three of the six junior readers showed higher KL grading performance with AI assistance than without it (area under the receiver operating characteristic curve without vs with AI, 0.81 ± 0.017 [SEM] vs 0.88 ± 0.011 [P < .001]; 0.76 ± 0.018 vs 0.86 ± 0.013 [P < .001]; and 0.89 ± 0.011 vs 0.91 ± 0.009 [P = .008]).
Interobserver agreement for KL grading among all readers was higher with AI assistance than without it (κ without vs with AI, 0.77 ± 0.018 [SEM] vs 0.85 ± 0.013; P < .001). Board-certified radiologists achieved almost perfect agreement for KL grading when assisted by AI (κ = 0.90 ± 0.01), which was higher than that achieved by the reference readers independently (κ = 0.84 ± 0.017; P = .01). Conclusion AI assistance increased junior readers' radiographic KOA grading performance and increased interobserver agreement for osteoarthritis grading across all readers and experience levels. Published under a CC BY 4.0 license. Supplemental material is available for this article.
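Light's kappa summarizes agreement among several readers as the mean of Cohen's kappa over every pair of readers. The sketch below implements that definition with unweighted Cohen's kappa. The abstract does not state whether disagreements between KL grades were weighted, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two readers' labels on the same cases."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty label lists of equal length")
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("chance agreement is 1; kappa is undefined")
    return (p_o - p_e) / (1 - p_e)


def light_kappa(readers):
    """Light's kappa: mean Cohen's kappa over all pairs of readers."""
    if len(readers) < 2:
        raise ValueError("need at least two readers")
    kappas = [cohen_kappa(a, b) for a, b in combinations(readers, 2)]
    return sum(kappas) / len(kappas)


# Hypothetical KL grades (0-4) from three readers on six radiographs.
grades = [
    [0, 1, 2, 3, 4, 2],
    [0, 1, 2, 3, 3, 2],
    [1, 1, 2, 4, 4, 2],
]
print(round(light_kappa(grades), 3))
```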

Subject(s)

Artificial Intelligence , Observer Variation , Osteoarthritis, Knee , Humans , Female , Male , Osteoarthritis, Knee/diagnostic imaging , Middle Aged , Retrospective Studies , Radiography/methods , Aged

8.

Inter- and Intrarater Reliability of the Gap-Kalamazoo Communication Skills Assessment Form Among Occupational Therapy Interns.

Chen, Tzu-Ting; Wang, Yi-Ching; Wu, Tzu-Yi; Chen, Chyi-Rong; Cheng, Chung-Yin; Hsueh, I-Ping; Wang, San-Ping; Hsieh, Ching-Lin.

Am J Occup Ther ; 78(4)2024 Jul 01.

Article in English | MEDLINE | ID: mdl-38885526

ABSTRACT

IMPORTANCE: Effective communication skills (CS) are essential for occupational therapists. The Gap-Kalamazoo Communication Skills Assessment Form (GKCSAF) is a standard tool for assessing the CS of medical residents. However, the interrater reliability for the nine CS domain scores ranges from poor to good. The intrarater reliability remains unclear. OBJECTIVE: To examine the inter- and intrarater reliability of the GKCSAF's nine domain scores and total score among occupational therapy interns. DESIGN: Repeated assessments with the GKCSAF. SETTING: Medical center psychiatry department. PARTICIPANTS: Twenty-five interns and 49 clients with mental illness, recruited from August 2020 to December 2021. OUTCOMES AND MEASURES: The transcripts of 50 evaluation interviews between clients and interns were used. Three independent raters assessed each transcript twice, at least 3 mo apart. RESULTS: The GKCSAF demonstrated poor interrater reliability for the nine domain scores (weighted κ = .08-.30) and the total score (intraclass correlation coefficient [ICC] = .22, 95% confidence interval [CI] [.10, .35]). The GKCSAF showed poor to intermediate intrarater reliability for the nine domain scores (weighted κ = .27-.73) and fair reliability for the total score (ICC = .69, 95% CI [.60, .77]). CONCLUSIONS AND RELEVANCE: The GKCSAF demonstrates poor interrater reliability and poor to intermediate intrarater reliability for the nine domain scores. However, it demonstrates fair intrarater reliability in assessing the overall CS performance of occupational therapy interns. Significant variations were observed when different raters assessed the same interns' CS, indicating inconsistencies in ratings. Consequently, it is advisable to conservatively interpret the CS ratings obtained with the GKCSAF. Plain-Language Summary: It is essential for occupational therapists to effectively communicate with clients. 
The Gap-Kalamazoo Communication Skills Assessment Form (GKCSAF) is a standard tool that is used to assess the communication skills of medical residents. The study authors used the GKCSAF with occupational therapy interns in a medical center psychiatry department to assess how effectively they interviewed clients with mental illness. This study aids occupational therapy personnel in the interpretation of GKCSAF results. The study findings also highlight the importance of developing reliable and standardized measures to assess communication skills in the field of occupational therapy.
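A weighted kappa suits ordinal scores such as the GKCSAF domain ratings because it gives partial credit to near misses. The sketch below implements Cohen's weighted kappa with linear or quadratic disagreement weights. The abstract does not state which weighting the authors used. The `scheme` parameter, the function name, and the 1-5 example scale are therefore assumptions.

```python
def weighted_kappa(a, b, categories, scheme="quadratic"):
    """Cohen's weighted kappa for two raters on an ordered scale.

    categories lists the scale values from lowest to highest; scheme
    selects linear or quadratic disagreement weights.
    """
    powers = {"linear": 1, "quadratic": 2}
    if scheme not in powers:
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    k, n = len(categories), len(a)
    if k < 2 or n == 0 or n != len(b):
        raise ValueError("need >= 2 categories and two equal, non-empty lists")
    index = {c: i for i, c in enumerate(categories)}
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** powers[scheme]  # 0 on the diagonal
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row_totals[i] * col_totals[j] / n
    if weighted_exp == 0:
        raise ValueError("no expected disagreement; kappa is undefined")
    return 1 - weighted_obs / weighted_exp


# Hypothetical domain scores on a 1-5 scale from two raters.
r1 = [3, 4, 2, 5, 3, 4]
r2 = [3, 3, 2, 4, 4, 4]
print(round(weighted_kappa(r1, r2, [1, 2, 3, 4, 5], "linear"), 3))
```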

Subject(s)

Clinical Competence , Communication , Internship and Residency , Occupational Therapy , Humans , Occupational Therapy/education , Reproducibility of Results , Male , Female , Adult , Observer Variation , Professional-Patient Relations , Mental Disorders/rehabilitation

9.

Intrarater and interrater reliability of digital calipers in assessing Achilles tendon thickness in patients with knee osteoarthritis.

Nazir, Shaikh Nabi Bukhsh; Ansari, Basit.

Physiother Res Int ; 29(3): e2107, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38873741

ABSTRACT

OBJECTIVE: This study aimed to evaluate the intrarater and interrater reliability of measuring Achilles tendon (AT) thickness using a digital caliper in patients with knee osteoarthritis. METHODS: A cross-sectional survey was conducted at the Physiotherapy Department of Rabia Moon Hospital, involving 61 patients with knee osteoarthritis. Measurements were taken in millimeters at a 90-degree angle, approximately 5 cm from the attachment to the calcaneus, precisely where the ankle joint joins the medial malleolus. Two physical therapists conducted two testing sessions, 7 days apart, to assess both the intrarater and interrater reliability of the digital caliper. During the second session, two raters simultaneously assessed the patients' responses on the digital caliper. The study analyzed reliability indices, including the intraclass correlation coefficient (ICC) and Bland-Altman plots. RESULTS: The study found high intrarater reliability for the digital caliper, with an ICC of 0.96 (95% confidence interval: 0.22, 0.99). For interrater reliability, the ICC was 0.98 (95% CI: 0.96, 0.98) in patients with knee OA. Additionally, both interrater and intrarater agreement for measuring AT thickness with the digital caliper fell within acceptable limits on 95% of occasions, as indicated by the limits of agreement: -0.53 to 0.32 mm for interrater agreement and -0.35 to -0.04 mm for intrarater agreement. CONCLUSIONS: Digital calipers were found to provide excellent intrarater and interrater reliability when used to measure AT thickness in patients with knee osteoarthritis (OA).
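Bland-Altman limits of agreement are the mean of the paired differences (the bias) plus or minus 1.96 standard deviations of those differences. The sketch below assumes two paired lists of thickness readings in millimeters with approximately normally distributed differences. The function name and the readings are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower LoA, upper LoA) for paired measurements.

    bias is the mean of first - second; the limits of agreement are
    bias -/+ 1.96 times the sample SD of the differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical AT thickness readings (mm) from two raters.
rater_a = [5.1, 4.8, 5.6, 6.0, 5.3]
rater_b = [5.3, 4.9, 5.5, 6.3, 5.4]
bias, lower, upper = bland_altman(rater_a, rater_b)
print(f"bias {bias:.2f} mm, LoA {lower:.2f} to {upper:.2f} mm")
```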

Subject(s)

Achilles Tendon , Observer Variation , Osteoarthritis, Knee , Humans , Male , Female , Cross-Sectional Studies , Reproducibility of Results , Middle Aged , Aged

10.

A comparison of visual and direct assessments of lumbar spine posture.

Harvie, Daniel S; McEvoy, Maureen; Tomkinson, Grant R.

J Bodyw Mov Ther ; 39: 209-213, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38876627

ABSTRACT

BACKGROUND: Posture is assessed clinically and used to guide treatment of low back pain. Both the relevance of posture and clinical postural assessment have come under scrutiny. This study aimed to determine (a) the intra-rater and inter-rater reliability of visual assessments of lumbar lordosis, and (b) the agreement between visual and direct postural assessments. METHODS: Ten physiotherapists visually assessed lumbar lordosis from 3D scans of 50 asymptomatic participants, plus 15 duplicate scans, using a grading scale of deviations (range: 0 = normal to 3 = severe). Lumbar lordosis angle was directly assessed using the Vitus Smart 3D whole body scanner. Cohen's kappa was used to determine the intra-rater and inter-rater reliability of visual assessments, with polyserial correlation (ps) used to determine the agreement between visual and direct assessments. RESULTS: Overall, 93% of intra-rater and 83% of inter-rater differences in visual assessments were within a single grade point. The intra-rater and inter-rater reliability of visual assessments were moderate (κ (95%CI): 0.56 (0.45, 0.67)) and slight (κ (95%CI): 0.13 (0.08, 0.19)), respectively. The agreement between visual and direct assessments was moderate (ps = -0.41, p = 0.04). CONCLUSION: Visual assessments of lumbar posture demonstrated moderate repeatability and moderate agreement with quantitative assessments. While agreement between assessors was slight, 83% of the visual ratings were within a single grade point, suggesting greater coherence among clinicians than our statistics suggested. As with any clinical assessment involving uncertainty, postural assessment should not solely guide treatment.
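The slight inter-rater kappa can coexist with 83% of ratings within one grade point. This is because unweighted Cohen's kappa treats a one-grade difference as complete disagreement. The sketch below computes the share of rater-pair comparisons that fall within one grade. Counting every pair of raters for every participant is our assumption about how the proportion was derived, and the function name is illustrative. For intra-rater use, pass a rater's first and duplicate gradings as the two lists.

```python
from itertools import combinations


def share_within_one_grade(ratings_by_rater):
    """Share of rater-pair comparisons whose grades differ by at most 1.

    ratings_by_rater holds one list of grades (0-3) per rater, all in the
    same participant order. Every pair of raters is compared on every
    participant.
    """
    if len(ratings_by_rater) < 2:
        raise ValueError("need at least two raters")
    n = len(ratings_by_rater[0])
    if n == 0 or any(len(r) != n for r in ratings_by_rater):
        raise ValueError("every rater must grade the same, non-empty set of participants")
    close = total = 0
    for a, b in combinations(ratings_by_rater, 2):
        for x, y in zip(a, b):
            total += 1
            close += abs(x - y) <= 1
    return close / total


# Hypothetical grades from three raters on four participants.
grades = [[0, 1, 2, 3], [0, 2, 0, 3], [1, 1, 2, 1]]
print(f"{100 * share_within_one_grade(grades):.0f}% within one grade")
```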

Subject(s)

Lordosis , Lumbar Vertebrae , Observer Variation , Posture , Humans , Posture/physiology , Female , Lumbar Vertebrae/physiology , Lumbar Vertebrae/physiopathology , Male , Adult , Lordosis/physiopathology , Reproducibility of Results , Young Adult , Low Back Pain/physiopathology , Middle Aged , Imaging, Three-Dimensional/methods

11.

Associations Between Radiation Oncologist Demographic Factors and Segmentation Similarity Benchmarks: Insights From a Crowd-Sourced Challenge Using Bayesian Estimation.

Wahid, Kareem A; Sahin, Onur; Kundu, Suprateek; Lin, Diana; Alanis, Anthony; Tehami, Salik; Kamel, Serageldin; Duke, Simon; Sherer, Michael V; Rasmussen, Mathis; Korreman, Stine; Fuentes, David; Cislo, Michael; Nelms, Benjamin E; Christodouleas, John P; Murphy, James D; Mohamed, Abdallah S R; He, Renjie; Naser, Mohammed A; Gillespie, Erin F; Fuller, Clifton D.

JCO Clin Cancer Inform ; 8: e2300174, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38870441

ABSTRACT

PURPOSE: The quality of radiotherapy auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of clinician-derived segmentations are poorly understood; our study aims to quantify these factors. METHODS: Organ at risk (OAR) and tumor-related segmentations provided by radiation oncologists from the Contouring Collaborative for Consensus in Radiation Oncology data set were used. Segmentations were derived from five disease sites: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and GI. Segmentation quality was determined on a structure-by-structure basis by comparing the observer segmentations with an expert-derived consensus, which served as a reference standard benchmark. The Dice similarity coefficient (DSC) was primarily used as a metric for the comparisons. DSC was stratified into binary groups on the basis of structure-specific expert-derived interobserver variability (IOV) cutoffs. Generalized linear mixed-effects models using Bayesian estimation were used to investigate the association between demographic variables and the binarized DSC for each disease site. Variables with a highest density interval excluding zero were considered to substantially affect the outcome measure. RESULTS: Five hundred seventy-four, 110, 452, 112, and 48 segmentations were used for the breast, sarcoma, H&N, GYN, and GI cases, respectively. The median percentage of segmentations that crossed the expert DSC IOV cutoff when stratified by structure type was 55% and 31% for OARs and tumors, respectively. Regression analysis revealed that the structure being tumor-related had a substantial negative impact on binarized DSC for the breast, sarcoma, H&N, and GI cases. There were no recurring relationships between segmentation quality and demographic variables across the cases, with most variables demonstrating large standard deviations. 
CONCLUSION: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality relative to benchmarks.
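The Dice similarity coefficient and its binarization against a structure-specific interobserver-variability cutoff can be sketched as follows. Segmentations are represented as sets of voxel indices for simplicity. The convention that a DSC at or above the cutoff counts as 1 is an assumption. In real use, cutoff values would come from the expert-derived IOV benchmarks rather than from this code.

```python
def dice(a, b):
    """Dice similarity coefficient: 2 * size(A and B) / (size(A) + size(B))."""
    total = len(a) + len(b)
    if total == 0:
        raise ValueError("both segmentations are empty")
    return 2 * len(a & b) / total


def passes_iov_cutoff(observer, consensus, cutoff):
    """Binarized outcome: 1 if the observer's DSC against the consensus
    reaches the structure-specific cutoff, else 0 (assumed convention)."""
    return int(dice(observer, consensus) >= cutoff)


# Hypothetical voxel indices for one structure.
consensus = {1, 2, 3, 4}
observer = {3, 4, 5, 6}
print(dice(observer, consensus), passes_iov_cutoff(observer, consensus, 0.6))
```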

Subject(s)

Bayes Theorem , Benchmarking , Radiation Oncologists , Humans , Benchmarking/methods , Female , Radiotherapy Planning, Computer-Assisted/methods , Neoplasms/epidemiology , Neoplasms/radiotherapy , Organs at Risk , Male , Radiation Oncology/standards , Radiation Oncology/methods , Demography , Observer Variation

12.

Adapting the Cornell assessment of pediatric delirium for Swedish context: translation, cultural validation and inter-rater reliability.

Åkerman, Sara; Axelin, Anna; Traube, Chani; Frithiof, Robert; Thernström Blomqvist, Ylva.

BMC Pediatr ; 24(1): 413, 2024 Jun 26.

Article in English | MEDLINE | ID: mdl-38926708

ABSTRACT

BACKGROUND: Pediatric delirium causes prolonged hospital stays, increased costs, and distress for children and caregivers. Currently, there is no delirium screening tool available in Sweden that has been translated, culturally validated, and tested for reliability. This study aimed to translate, culturally adapt, and assess the suitability of the Cornell Assessment of Pediatric Delirium (CAPD) for implementation in Swedish healthcare settings. METHODS: The CAPD was translated and culturally adapted to the Swedish context following the ten-step process recommended by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Task Force for Translation and Cultural Adaptation. The Swedish CAPD was tested in the pediatric intensive care unit of Uppsala University Hospital, a tertiary hospital in Sweden. Inter-rater reliability was tested using the intraclass correlation coefficient (ICC), with both Registered Nurses (RNs) and Assistant Nurses (ANs) conducting parallel measurements using the Swedish CAPD. An ICC > 0.75 was considered indicative of good reliability. RESULTS: After translation of the CAPD into Swedish, 10 RNs participated in the cultural adaptation process. Issues related to word choice, education, and instructions were addressed. Wording improvements were made to ensure accurate interpretation. Supplementary training sessions were organized to strengthen users' proficiency with the Swedish CAPD. Additional instructions were provided to enhance clarity and usability. Inter-rater reliability testing resulted in an ICC of 0.857 (95% CI: 0.708-0.930), indicating good reliability. CONCLUSION: This study successfully translated and culturally adapted the CAPD to align with Swedish contextual parameters. The resulting Swedish CAPD demonstrated good inter-rater reliability, establishing its viability as a tool for measuring delirium among pediatric patients in Swedish pediatric intensive care units. TRIAL REGISTRATION: Not applicable.
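The abstract does not say which intraclass correlation form was used. As an illustration only, the sketch below computes ICC(2,1), the Shrout and Fleiss form for two-way random effects, absolute agreement, and a single rater. It works from a subjects-by-raters table, for example one column for the RN and one for the AN rating the same child. The function name and the example scores are hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1) (Shrout and Fleiss): two-way random effects, absolute
    agreement, single rater. data[i][j] is rater j's score for subject i."""
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("no variance in the table; ICC is undefined")
    return (ms_rows - ms_error) / denominator


# Hypothetical CAPD totals: column 0 = RN, column 1 = AN, one row per child.
scores = [[4, 5], [10, 9], [2, 2], [14, 12], [7, 8]]
print(round(icc_2_1(scores), 3))
```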

Subject(s)

Delirium , Translations , Humans , Sweden , Delirium/diagnosis , Reproducibility of Results , Child , Intensive Care Units, Pediatric , Male , Female , Observer Variation , Child, Preschool , Translating

13.

Disagreements in risk of bias assessment for randomized controlled trials in hypertension-related Cochrane reviews.

Yao, Yi; Shen, Jing; Luo, Jianzhao; Li, Nian; Liao, Xiaoyang; Zhang, Yonggang.

Trials ; 25(1): 405, 2024 Jun 21.

Article in English | MEDLINE | ID: mdl-38907276

ABSTRACT

BACKGROUND: Inter-reviewer agreement on risk of bias (RoB) assessment was low in previous studies. It is important to analyse these disagreements to improve the repeatability of RoB assessment. The objective of the study was to evaluate the frequency of and reasons for disagreements in RoB assessments for randomised controlled trials (RCTs) that were included in multiple Cochrane reviews in the field of hypertension. METHODS: A cross-sectional study was employed. We retrieved all RCTs that had been included in multiple Cochrane reviews in the field of hypertension from ARCHIE. The results of the RoB assessments were extracted, and the distributions of agreements and possible reasons for disagreement were analyzed. RESULTS: Twenty-six Cochrane reviews were included in this study. A total of 78 RCTs appeared in more than one Cochrane review. The level of agreement varied from domain to domain. "Blinding of outcome assessment" showed a reasonably high level of agreement (94.9%), while "incomplete outcome data", "selective outcome reporting" and "other sources of bias" showed moderate levels of agreement (74.6%, 79.2% and 75.6%, respectively). However, the domains of "allocation concealment", "random sequence generation" and "blinding of participants and personnel" showed low levels of agreement (24.4%, 23.5%, and 47.4%, respectively). In the domains of "allocation concealment" and "blinding of participants and personnel", the agreement group had a higher proportion of trials with a publication year ≤ 1996 than the disagreement group (P = 0.008 and P < 0.001, respectively). In the "blinding of participants and personnel" domain, the impact factor was higher in the agreement group (P < 0.001). By analyzing the support text, we found that the most likely reason for disagreement was that reviewers extracted different information from the same RCT.
CONCLUSION: For Cochrane reviews in the field of hypertension that used the 2011 version of the RoB tool, there was substantial disagreement in RoB assessment. We suggest that the results of RoB assessments in systematic reviews that used the 2011 version of the RoB tool be interpreted with caution. More accurate information from RCTs needs to be collected when synthesizing clinical evidence.
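The per-domain agreement figures above are, in their simplest form, the share of trial-domain items that two reviews judged identically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes exactly two reviews per trial (the abstract does not say how trials appearing in three or more reviews were handled), and the names `percent_agreement`, `review_a` and `review_b`, as well as the judgments themselves, are hypothetical.

```python
from collections import defaultdict


def percent_agreement(review_a, review_b):
    """Per-domain percentage of (trial, domain) items judged identically.

    Each review maps (trial_id, domain) -> RoB judgment ("low", "high", "unclear").
    Only items rated in both reviews are compared.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for key in review_a.keys() & review_b.keys():
        _, domain = key
        totals[domain] += 1
        if review_a[key] == review_b[key]:
            matches[domain] += 1
    return {domain: 100.0 * matches[domain] / totals[domain] for domain in totals}


# Hypothetical judgments for two trials rated in two different reviews.
review_a = {
    ("T1", "allocation concealment"): "low",
    ("T2", "allocation concealment"): "unclear",
    ("T1", "blinding of outcome assessment"): "low",
}
review_b = {
    ("T1", "allocation concealment"): "unclear",
    ("T2", "allocation concealment"): "unclear",
    ("T1", "blinding of outcome assessment"): "low",
}

print(percent_agreement(review_a, review_b))
```

Raw percent agreement does not correct for agreement expected by chance; chance-corrected statistics such as Cohen's kappa are a common complement.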

Subject(s)

Bias , Hypertension , Randomized Controlled Trials as Topic , Humans , Hypertension/diagnosis , Cross-Sectional Studies , Review Literature as Topic , Research Design , Risk Assessment , Observer Variation , Reproducibility of Results , Treatment Outcome , Risk Factors

14.

Sonographic imaging of the stellate ganglion in healthy adults: An observational study.

Bedewi, Mohamed A; Marsico, Salvatore; Soliman, Steven B; Habib, Yomna S; Kotb, Mamdouh Ali; Almalki, Daifallah Mohammed; AlAseeri, Ali Abdullah; Alhariqi, Bader A; Alqahtani, Mohammed Saad; Albarrak, Anas Mohammad; Alamir, Ahmed Y.

Medicine (Baltimore) ; 103(25): e38646, 2024 Jun 21.

Article in English | MEDLINE | ID: mdl-38905380

ABSTRACT

The aim of this study is to estimate the normal cross-sectional area (CSA) and diameter of the stellate ganglion (SG) by ultrasound (US) in healthy adults. The study sample included 80 stellate ganglia in 40 participants (15 males, 25 females); mean age 38 years, mean height 162.5 cm, mean weight 67.8 kg, mean body mass index 25.4 kg/m². Two radiologists separately obtained US images of the bilateral SG. Each participant was scanned 3 times bilaterally to assess intra-observer reliability. The mean diameter of the SG was 1 mm (range: 0.1-2 mm). The mean CSA of the bilateral SG was 1.3 mm² (range: 0.6-3.9 mm²). The SG diameter correlated positively with age. Our study demonstrates the ability of US to image the SG and estimate its normal diameter and CSA. Knowledge of how to identify and measure the SG during ultrasound-guided procedures would be expected to decrease the risk of associated complications and help establish normal reference values.
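Scanning each ganglion three times yields repeated measurements from which intra-observer repeatability can be summarized. The abstract does not name the statistic used, so the sketch below assumes one common choice: the pooled within-subject SD (s_w) and the Bland-Altman repeatability coefficient, 1.96 × √2 × s_w. It assumes the same number of scans per ganglion, and the diameters and function names are hypothetical.

```python
import math
from statistics import mean, variance


def within_subject_sd(repeats):
    """Pooled within-subject SD: square root of the mean per-subject variance.

    repeats: one list of repeated measurements per subject, all of equal length.
    """
    return math.sqrt(mean(variance(r) for r in repeats))


def repeatability_coefficient(repeats):
    # 1.96 * sqrt(2) * s_w (about 2.77 * s_w): two repeated measurements on the
    # same subject are expected to differ by less than this for 95% of pairs.
    return 1.96 * math.sqrt(2.0) * within_subject_sd(repeats)


# Hypothetical SG diameters (mm), three scans per ganglion.
scans = [[1.0, 1.2, 1.1], [0.8, 0.8, 0.9], [1.5, 1.4, 1.6]]
print(round(within_subject_sd(scans), 3), round(repeatability_coefficient(scans), 3))
```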

Subject(s)

Stellate Ganglion , Ultrasonography , Humans , Male , Female , Adult , Stellate Ganglion/diagnostic imaging , Ultrasonography/methods , Middle Aged , Reference Values , Healthy Volunteers , Young Adult , Reproducibility of Results , Observer Variation

15.

²³Na MRI: inter-reader reproducibility of normal fibroglandular sodium concentration measurements at 3 T.

Arponen, Otso; McLean, Mary A; Nanaa, Muzna; Manavaki, Roido; Baxter, Gabrielle C; Gill, Andrew B; Riemer, Frank; Kennerley, Aneurin J; Woitek, Ramona; Kaggie, Joshua D; Brackenbury, William J; Gilbert, Fiona J.

Eur Radiol Exp ; 8(1): 75, 2024 Jun 10.

Article in English | MEDLINE | ID: mdl-38853182

ABSTRACT

BACKGROUND: To study the reproducibility of ²³Na magnetic resonance imaging (MRI) measurements from breast tissue in healthy volunteers. METHODS: Using a dual-tuned bilateral ²³Na/¹H breast coil at 3-T MRI, high-resolution ²³Na MRI three-dimensional cones sequences were used to quantify total sodium concentration (TSC) and fluid-attenuated sodium concentration (FASC). B1-corrected TSC and FASC maps were created. Two readers manually measured mean, minimum and maximum TSC and mean FASC values using two sampling methods: large regions of interest (LROIs) encompassing fibroglandular tissue (FGT) and small regions of interest (SROIs) covering the highest-signal area at the level of the nipple. The reproducibility of the measurements and the correlations between density, age and FGT apparent diffusion coefficient (ADC) values were evaluated. RESULTS: Nine healthy volunteers were included. The inter-reader reproducibility of TSC and FASC using SROIs and LROIs was excellent (intraclass correlation coefficient range 0.945-0.979, p < 0.001), except for the minimum TSC LROI measurements (p = 0.369). The mean/minimum LROI TSC and mean LROI FASC values were lower than the respective SROI values (p < 0.001); the maximum LROI TSC values were higher than the SROI TSC values (p = 0.009). TSC correlated inversely with age but not with FGT ADCs. The mean and maximum FGT TSC and FASC values were higher in dense breasts than in non-dense breasts (p < 0.020). CONCLUSIONS: The chosen sampling method and the selected descriptive value affect the measured TSC and FASC values, although the inter-reader reproducibility of the measurements is in general excellent. RELEVANCE STATEMENT: ²³Na MRI at 3 T allows the quantification of TSC and FASC sodium concentrations. The sodium measurements should be obtained consistently in a uniform manner. KEY POINTS:
• ²³Na MRI allows the quantification of total and fluid-attenuated sodium concentrations (TSC/FASC).
• Sampling method (large/small region of interest) affects the TSC and FASC values.
• Dense breasts have higher TSC and FASC values than non-dense breasts.
• The inter-reader reproducibility of TSC and FASC measurements was, in general, excellent.
• The results suggest the importance of stratifying the sodium measurements protocol.
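The abstract reports intraclass correlation coefficients for two readers but does not state which ICC form was used. The sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single reader, in the Shrout and Fleiss notation) computed from two-way ANOVA mean squares; the function name and the TSC-like values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single reader.

    ratings: one row per subject, each row holding one value per reader.
    """
    n = len(ratings)        # subjects
    k = len(ratings[0])     # readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between readers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical mean TSC values from two readers for five volunteers.
tsc = [[35.1, 34.8], [41.0, 41.6], [29.4, 29.9], [38.2, 37.7], [33.0, 33.4]]
print(round(icc_2_1(tsc), 3))
```

Because ICC(2,1) measures absolute agreement, a constant offset between readers lowers it even when their rankings agree perfectly; consistency-type ICCs would ignore such an offset.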

Subject(s)

Breast , Magnetic Resonance Imaging , Sodium , Humans , Female , Reproducibility of Results , Adult , Magnetic Resonance Imaging/methods , Breast/diagnostic imaging , Middle Aged , Sodium Isotopes , Healthy Volunteers , Observer Variation , Young Adult

16.

Reliability of glenoid measurements performed using Multiplanar Reconstruction (MPR) of Magnetic Resonance (MRI) in patients with shoulder instability.

Nizinski, Jan; Kaczmarek, Agata; Antonik, Bartosz; Rauhut, Sebastian; Tuczynski, Piotr; Jakubowski, Filip; Slawski, Julian; Stefaniak, Jakub; Lubiatowski, Przemyslaw.

Int Orthop ; 48(8): 2129-2136, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38833167

ABSTRACT

PURPOSE: Glenoid bone loss in shoulder instability can be assessed with CT or MRI multiplanar imaging, and its measurement is crucial for pre-operative planning. The aim of this study is to determine the intra- and interobserver reliability of glenoid deficiency measurement using MRI multiplanar reconstruction with 2D assessment in the sagittal plane (MPR MRI). METHODS: We reviewed MRI images of 80 patients with anterior shoulder instability in Osirix software using MPR. Six observers with basic experience measured the glenoid, erosion edge length, and bone loss twice, with an interval of at least one week between measurements. We calculated reliability and repeatability using the intraclass correlation coefficient (ICC) and the minimal detectable change with 95% confidence (MDC95%). RESULTS: Intra- and inter-observer ICC and MDC95% for glenoid width and height were excellent (ICC 0.89-0.96). For erosion edge length and glenoid area, they were acceptable/good (ICC 0.61-0.89). Bone loss and Pico index were associated with acceptable/good ICC (0.63-0.86) but poor MDC95% (45-84%). Intra-observer reliability improved over time, while inter-observer reliability remained unchanged. CONCLUSION: MPR MRI measurement of the anterior glenoid lesion is a very good tool for linear parameters. This method is not valid for Pico index measurement, as the area of bone loss is variable. The pace of learning is individual; therefore, unlike true 3D CT, complex calculations based on MPR MRI are not robust to low observer experience.
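An acceptable ICC combined with a poor MDC95% is easier to see with the usual formulas: SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The abstract does not give the exact formulas or which SD was used, so the sketch below assumes these standard definitions, expresses MDC95 as a percentage of the mean, and uses hypothetical numbers and function names.

```python
import math


def sem(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimal detectable change at 95% confidence.

    1.96 is the two-sided 95% z-value; sqrt(2) accounts for measurement error
    in both of the two measurements that make up a change score.
    """
    return 1.96 * math.sqrt(2.0) * sem(sd, icc)


def mdc95_percent(sd, icc, mean_value):
    """MDC95 expressed as a percentage of the mean measured value."""
    return 100.0 * mdc95(sd, icc) / mean_value


# Hypothetical bone-loss figures: mean 12, SD 6 (same units), ICC 0.75.
print(round(mdc95(6.0, 0.75), 2), round(mdc95_percent(6.0, 0.75, 12.0), 1))
```

With these made-up numbers, an ICC of 0.75 still leaves an MDC95 of roughly 69% of the mean, because the between-subject SD is large relative to the mean value.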

Subject(s)

Joint Instability , Magnetic Resonance Imaging , Observer Variation , Shoulder Joint , Humans , Joint Instability/diagnostic imaging , Magnetic Resonance Imaging/methods , Reproducibility of Results , Shoulder Joint/diagnostic imaging , Shoulder Joint/pathology , Male , Female , Adult , Young Adult , Middle Aged , Adolescent , Glenoid Cavity/diagnostic imaging , Glenoid Cavity/pathology , Retrospective Studies , Image Processing, Computer-Assisted/methods

17.

Inter-software reliability and agreement for follicular and luteal morphometric and echotextural ultrasonographic parameters in beef cattle.

Pinzón-Osorio, César Augusto; Machado, Marco Alves; Camozzato, Julia Nobre Blank; Dos Santos Velho, Gabriella; Dalto, André Gustavo Cabrera; Rovani, Monique Tomazele; de Oliveira, Fernando Caetano; Bertolini, Marcelo.

Anim Reprod Sci ; 267: 107518, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38889613

ABSTRACT

This study aimed to compare inter-software and inter-observer reliability and agreement for the assessment of follicular and luteal morphometry and echotexture parameters in crossbred beef females (3/8 Bos taurus indicus and 5/8 Bos taurus taurus). B-mode and color Doppler ultrasonographic ovarian images were obtained at specific time points of estradiol-progesterone-based protocols for timed artificial insemination (TAI). Sonograms were analyzed by two observers using a licensed (IASP1) and an open-access (IASP2) software package. A total of 292 snapshot sonograms were analyzed for morphometric parameters and 504 for echotexture parameters. Inter-software reliability was judged moderate to excellent (ICC or CCC = 0.73-0.98), whereas inter-observer reliability for morphometric parameters was deemed good to excellent (ICC or CCC = 0.82-0.98). A small percentage (up to 10.95%) of measured parameters fell outside the limits of inter-software and inter-observer agreement. For echotexture parameters, inter-software reliability varied widely (ICC or CCC = 0.16-0.95) depending on the size of the regions of interest (ROI), while inter-observer reliability ranged from moderate to excellent (ICC or CCC = 0.71-0.97). The highest inter-software reliability for pixel value and heterogeneity value was observed for the corpus luteum (ICCs = 0.81-0.95; P > 0.05), followed by the peripheral follicular antrum (ICCs = 0.75-0.78; P < 0.05). However, lower reliability was found for the follicular wall (ICCs = 0.08-0.33; P < 0.0001) and perifollicular stroma (ICCs = 0.16-0.46; P < 0.05). In conclusion, both software packages showed high reproducibility for morphometric measurements, while echotexture measurements were more difficult to replicate, depending on ROI size. Caution is advised when selecting ROI sizes for echotexture measurements in bovine ovaries.
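The "limits of agreement" mentioned above are commonly Bland-Altman 95% limits, and the reported percentage is the share of paired differences falling outside them. The abstract does not state how the limits were derived, so the sketch below assumes the classic definition (mean difference plus or minus 1.96 SD of the differences); the paired values and function names are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(a, b):
    """Bland-Altman 95% limits: mean difference +/- 1.96 SD of the differences."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias + spread


def percent_outside(a, b):
    """Percentage of paired measurements whose difference lies outside the limits."""
    lower, upper = limits_of_agreement(a, b)
    diffs = [x - y for x, y in zip(a, b)]
    outside = sum(1 for d in diffs if d < lower or d > upper)
    return 100.0 * outside / len(diffs)


# Hypothetical corpus luteum areas (cm^2) measured on the same sonograms
# with two software packages.
software_1 = [2.10, 1.85, 2.40, 3.05, 1.60, 2.75, 2.20, 1.95]
software_2 = [2.05, 1.90, 2.35, 3.20, 1.62, 2.70, 2.25, 1.90]
print(limits_of_agreement(software_1, software_2), percent_outside(software_1, software_2))
```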

Subject(s)

Corpus Luteum , Ovarian Follicle , Software , Ultrasonography , Animals , Cattle/physiology , Female , Corpus Luteum/diagnostic imaging , Reproducibility of Results , Ovarian Follicle/diagnostic imaging , Ultrasonography/veterinary , Ultrasonography/methods , Observer Variation

18.

Inter-observer effects in needle reconstruction for temporary prostate brachytherapy: Dosimetric implications and adaptive CBCT-TRUS registration solutions.

Karius, Andre; Kreppner, Stephan; Strnad, Vratislav; Schweizer, Claudia; Lotter, Michael; Fietkau, Rainer; Bert, Christoph.

Brachytherapy ; 23(4): 421-432, 2024.

Article in English | MEDLINE | ID: mdl-38845268

ABSTRACT

PURPOSE: To investigate geometric and dosimetric inter-observer variability in needle reconstruction for temporary prostate brachytherapy, and to assess the potential of registrations between transrectal ultrasound (TRUS) and cone-beam computed tomography (CBCT) to support implant reconstruction. METHODS AND MATERIALS: The needles implanted in 28 patients were reconstructed on TRUS by three physicists. The resulting geometric deviations and associated dosimetric variations for the prostate and organs at risk (urethra, bladder, rectum) were analyzed. To address the observed inter-observer variability, several approaches (template-based, probe-based, marker-based) for registering CBCT to TRUS were evaluated for needle transfer accuracy in a phantom study. Three patient cases were examined to assess registration accuracy in vivo. RESULTS: Geometric inter-observer deviations >1 mm and >3 mm were found for 34.9% and 3.5% of all needles, respectively. Among the dosimetric effects, prostate dose coverage (changes up to 7.2%) and urethra dose (partly exceeding given dose constraints) were the most affected. In the phantom study, marker-based and probe-based registrations yielded high mean needle transfer accuracies of 0.73 mm and 0.12 mm, respectively. In the patient cases, the marker-based approach was the superior technique for CBCT-TRUS fusion. CONCLUSION: Inter-observer variability in needle reconstruction can substantially affect dosimetry for individual patients. Marker-based CBCT-TRUS registrations in particular can help ensure accurate reconstructions and thereby improve treatment planning.
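Figures such as "deviations >1 mm for 34.9% of needles" come from comparing two observers' reconstructions needle by needle and counting how many exceed a threshold. The abstract does not define the deviation metric, so the sketch below assumes it is the largest Euclidean distance between corresponding points of the two reconstructions; the coordinates and function names are hypothetical, and dosimetric effects are not modeled.

```python
import math


def needle_deviation(points_a, points_b):
    """Largest Euclidean distance (mm) between corresponding points of two
    reconstructions of the same needle."""
    return max(math.dist(p, q) for p, q in zip(points_a, points_b))


def percent_exceeding(recon_a, recon_b, threshold_mm):
    """Percentage of needles whose inter-observer deviation exceeds threshold_mm."""
    deviations = [needle_deviation(a, b) for a, b in zip(recon_a, recon_b)]
    return 100.0 * sum(d > threshold_mm for d in deviations) / len(deviations)


# Hypothetical start and end coordinates (mm) of four needles from two observers.
observer_1 = [
    [(0, 0, 0), (0, 0, 10)],
    [(5, 0, 0), (5, 0, 10)],
    [(10, 0, 0), (10, 0, 10)],
    [(15, 0, 0), (15, 0, 10)],
]
observer_2 = [
    [(0.5, 0, 0), (0, 0, 10)],   # 0.5 mm apart
    [(5, 2, 0), (5, 0, 10)],     # 2 mm apart
    [(10, 0, 0), (10, 0, 14)],   # 4 mm apart
    [(15, 0, 0), (15, 0, 10)],   # identical
]
for t in (1.0, 3.0):
    print(f"> {t} mm: {percent_exceeding(observer_1, observer_2, t):.1f}% of needles")
```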

Subject(s)

Brachytherapy , Cone-Beam Computed Tomography , Needles , Observer Variation , Phantoms, Imaging , Prostatic Neoplasms , Radiotherapy Dosage , Humans , Male , Prostatic Neoplasms/radiotherapy , Prostatic Neoplasms/diagnostic imaging , Brachytherapy/methods , Cone-Beam Computed Tomography/methods , Radiotherapy Planning, Computer-Assisted/methods , Ultrasonography/methods , Prostate/diagnostic imaging , Organs at Risk/radiation effects , Radiotherapy, Image-Guided/methods , Rectum/diagnostic imaging

19.

The band count imprecision - a Croatian multicentric pilot study.

Radisic Biljak, Vanja; Juresa, Visnja; Vidranski, Valentina; Vuga, Ivana; Tomic, Franciska; Smaic, Fran; Horvat, Martina; Kresic, Branka; Simac, Brankica; Lapic, Ivana.

Biochem Med (Zagreb) ; 34(2): 020803, 2024 Jun 15.

Article in English | MEDLINE | ID: mdl-38882588

ABSTRACT

Introduction: Because of high inter-observer variability, the 2015 International Council for Standardization in Haematology (ICSH) recommendations advise counting band neutrophils as segmented neutrophils in the white blood cell (WBC) differential. However, reporting bands as a separate cell entity within the WBC differential is still widespread in hematology laboratories in Croatia. The aim of this multicentric study was to assess the degree of inter-observer variability in enumerating band neutrophils within the WBC differential among Croatian laboratories. Materials and methods: Seven large Croatian hospital laboratories from different parts of the country participated in the study. In each participating laboratory, one blood smear flagged by the analyzer as possibly containing bands was evaluated by all personnel involved in the analysis of hematology samples. Between-observer reproducibility of the manual smear count was expressed as the coefficient of variation (CV), calculated as CV (%) = standard deviation (SD) / mean × 100. Results: The CVs, each followed in parentheses by the relative band neutrophil counts, in the participating laboratories were: 15.4% (16-24), 19.2% (16-32), 19.5% (17-40), 21.1% (17-44), 35.0% (8-26), 51.9% (3-29), and a remarkably high 62.4% (12-59). For segmented neutrophils, CVs were lower, ranging from 7.4% to 32.2%. The CVs did not correlate with the number of staff members in each hospital (P = 0.293). Conclusions: This study revealed very high variability in band neutrophil counts in the blood smear differential among all participants, prompting a need for action at the national level.
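The CV formula above translates directly into code. The abstract does not say whether the sample SD (n − 1 denominator) or the population SD was used; the sketch below assumes the sample SD, and the band counts and function name are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(counts):
    """CV (%) = SD / mean x 100, using the sample SD (n - 1 denominator)."""
    return stdev(counts) / mean(counts) * 100.0


# Hypothetical relative band counts (%) reported by six observers for one smear.
band_counts = [16, 20, 24, 18, 22, 19]
print(f"CV = {coefficient_of_variation(band_counts):.1f}%")
```

Switching to `statistics.pstdev` would give the population-SD variant, which is slightly smaller for the small observer groups typical of a single laboratory.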

Subject(s)

Neutrophils , Humans , Croatia , Pilot Projects , Leukocyte Count , Neutrophils/cytology , Observer Variation , Reproducibility of Results

20.

Intra- and inter-operator reliability of measuring compressive stiffness of the patellar tendon in volleyball players using a handheld digital palpation device.

van Dam, Lotte; Terink, Rieneke; van den Akker-Scheek, Inge; Zwerver, Johannes.

PLoS One ; 19(6): e0304743, 2024.

Article in English | MEDLINE | ID: mdl-38917106

ABSTRACT

This observational study aimed to evaluate the intra- and inter-operator reliability of a digital palpation device in measuring compressive stiffness of the patellar tendon at different knee angles in talent and elite volleyball players. A second aim was to examine differences in reliability between knee angles, between dominant and non-dominant knees, between sexes, and with age. Two operators measured stiffness at the midpoint of the patellar tendon in 45 Dutch volleyball players at 0°, 45° and 90° knee flexion, on both the dominant and non-dominant side. We found excellent intra-operator reliability (ICC > 0.979). For inter-operator reliability, significant differences in measured stiffness were found between operators (p < 0.007). The coefficient of variation decreased significantly with increasing knee flexion (2.27% at 0°, 1.65% at 45° and 1.20% at 90°, p < 0.001). In conclusion, the device appeared to be reliable for measuring compressive stiffness of the patellar tendon in elite volleyball players, especially at 90° knee flexion. Inter-operator reliability, however, appeared questionable. More standardized positioning and measurement protocols seem necessary.

Subject(s)

Palpation , Patellar Ligament , Volleyball , Humans , Volleyball/physiology , Male , Female , Patellar Ligament/physiology , Palpation/instrumentation , Palpation/methods , Reproducibility of Results , Young Adult , Adult , Range of Motion, Articular/physiology , Knee Joint/physiology , Adolescent , Biomechanical Phenomena , Observer Variation
