Search | BVS Regional Portal (test)

1.

Reliability of the Test of Gross Motor Development Third Edition Among Children with Developmental Coordination Disorder.

Roczniak, Laine; Jutras, Mylène; Lévesque, Caroline; Fortin, Carole.

Phys Occup Ther Pediatr ; : 1-14, 2024 Jul 15.

Article in English | MEDLINE | ID: mdl-39007754

ABSTRACT

AIM: The Test of Gross Motor Development Third Edition (TGMD-3) is used to assess the development of fundamental movement skills in children from 3 to 10 years old. This study aimed to evaluate the intra-rater, inter-rater, and test-retest reliability and to determine the minimal detectable change (MDC) value of the TGMD-3 in children with developmental coordination disorder (DCD). METHODS: The TGMD-3 was administered to 20 children with DCD. The child's fundamental movement skills were recorded using a digital video camera. Reliability was assessed at two occasions by three raters using the generalizability theory. RESULTS: The TGMD-3 demonstrates good inter-rater reliability for the locomotor skills subscale, the ball skills subscale, and the total score (φ = 0.77 - 0.91), while the intra-rater reliability was even higher (φ = 0.94 - 0.97). Test-retest reliability was also shown to be good (φ = 0.79-0.93). The MDC95 was determined to be 10 points. CONCLUSION: This study provides evidence that the TGMD-3 is a reliable test when used to evaluate fundamental movement skills in children with DCD and suggests that an increase of 10 points represents a significant change in the motor function of a child with DCD.
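The abstract reports an MDC95 without stating how it was derived. A minimal sketch of the conventional calculation, assuming the standard error of measurement is taken as SD × √(1 − reliability) and MDC95 = 1.96 × √2 × SEM; the function name and the input values in the usage note are hypothetical and are not taken from the study, which may have derived its SEM differently under generalizability theory.

```python
import math


def mdc95(sd: float, reliability: float) -> float:
    """Minimal detectable change at the 95% confidence level.

    sd          -- between-subject standard deviation of the score (assumed input)
    reliability -- reliability coefficient in [0, 1], e.g. an ICC or phi value
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    # Standard error of measurement under the conventional formula.
    sem = sd * math.sqrt(1.0 - reliability)
    # The sqrt(2) term accounts for measurement error at both test and retest.
    return 1.96 * math.sqrt(2.0) * sem
```

For example, `mdc95(10.0, 0.9)` would give the smallest change that exceeds measurement error for a score with a standard deviation of 10 and a reliability of 0.9.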

2.

Consistency between the ACGIH TLV for hand activity and proposed action levels for wrist velocity and forearm muscular load based on objective measurements: an example from the assembly industry.

Dahlqvist, Camilla; Arvidsson, Inger; Löfqvist, Lotta; Gremark Simonsen, Jenny.

Int J Occup Saf Ergon ; : 1-9, 2024 Jul 03.

Article in English | MEDLINE | ID: mdl-38961651

ABSTRACT

Objectives. This study aimed to investigate the consistency between results of the American Conference of Governmental Industrial Hygienists (ACGIH) threshold limit value (TLV) for hand activity and proposed action levels of objective measurements in risk assessments of work-related musculoskeletal disorders. Methods. Wrist velocities and forearm muscular load were measured for 11 assemblers during one working day. Simultaneously, each assembler's hand activity level (HAL) during three sub-cycles was rated twice on two separate occasions by two experts, using a HAL scale. Arm/hand exertion was also rated by the assemblers themselves using a Borg scale. In total, 66 sub-cycles were assessed and assigned to three exposure categories: A) below ACGIH action limit (AL) (green); B) between AL and TLV (yellow); and C) above TLV (red). The median wrist velocity and the 90th percentile of forearm muscular load obtained from the objective measurements corresponding to the sub-cycles were calculated and assigned to two exposure categories: A) below or C) above the proposed action level. Results. The agreement between ACGIH TLV for hand activity and the proposed action level for wrist velocity was 87%. Conclusions. The proposed action level for wrist velocity is highly consistent with the TLV. Additional studies are needed to confirm the results.

3.

Inter-rater reliability and clinical relevance of subjective and objective interpretation of videofluoroscopy findings.

Kuuskoski, Jonna; Vanhatalo, Jaakko; Hirvonen, Jussi; Rekola, Jami; Aaltonen, Leena-Maija; Järvenpää, Pia.

Laryngoscope Investig Otolaryngol ; 9(4): e1298, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38974605

ABSTRACT

Background: Dysphagia is commonly evaluated using videofluoroscopy (VFS). As its ratings are usually subjective normal-abnormal ratings, objective measurements have been developed. We compared the inter-rater reliability of the usual VFS ratings to the objective measurement VFS ratings and evaluated their clinical relevance. Methods: Two blinded raters analyzed the subjective normal-abnormal ratings of 77 patients' VFS. Two other blinded raters analyzed the objective measurements of pharyngeal aerated area with bolus held in the oral cavity (PAhold), the pharyngeal area of residual bolus during swallowing (PAmax), the pharyngeal constriction ratio (PCR), the maximum pharyngoesophageal segment opening (PESmax), pharyngoesophageal segment opening duration (POD), airway closure duration (ACD), and total pharyngeal transit time (TPT). We evaluated the inter-rater agreement in the subjective ratings and the objective measurements. Clinical utility analysis compared the measurements with the VFS findings of pharyngeal phase abnormality, penetration/aspiration, and cricopharyngeal relaxation. Results: In the pharyngeal findings, the subjective analysis inter-rater agreement was mainly moderate to strong. The strongest agreements were on the pharyngeal residues and penetration/aspiration findings. The objective measurements had fair to good inter-rater agreement. Clinical utility analysis found statistically significant connections between TPT and pharyngeal phase abnormality, normal PCR and lack of penetration/aspiration, and normal PESmax and normal cricopharyngeal relaxation. Conclusions: The subjective analysis had moderate to strong inter-rater agreement in the pharyngeal VFS findings, especially concerning pharyngeal residues and penetration/aspiration detection, reflecting the efficacy and safety of swallowing. The objective measurements had fair to good inter-observer reproducibility and could thus improve the reliability of VFS diagnostics. Level of evidence: 4.

4.

Structured peer review: pilot results from 23 Elsevier journals.

Malicki, Mario; Mehmani, Bahar.

PeerJ ; 12: e17514, 2024.

Article in English | MEDLINE | ID: mdl-38948202

ABSTRACT

Background: Reviewers rarely comment on the same aspects of a manuscript, making it difficult to properly assess manuscripts' quality and the quality of the peer review process. The goal of this pilot study was to evaluate structured peer review implementation by: 1) exploring whether and how reviewers answered structured peer review questions, 2) analysing reviewer agreement, 3) comparing that agreement to agreement before implementation of structured peer review, and 4) further enhancing the piloted set of structured peer review questions. Methods: Structured peer review consisting of nine questions was piloted in August 2022 in 220 Elsevier journals. We randomly selected 10% of these journals across all fields and IF quartiles and included manuscripts that received two review reports in the first 2 months of the pilot, leaving us with 107 manuscripts belonging to 23 journals. Eight questions had open-ended fields, while the ninth question (on language editing) had only a yes/no option. The reviews could also leave Comments-to-Author and Comments-to-Editor. Answers were independently analysed by two raters, using qualitative methods. Results: Almost all the reviewers (n = 196, 92%) provided answers to all questions even though these questions were not mandatory in the system. The longest answer (Md 27 words, IQR 11 to 68) was for reporting methods with sufficient details for replicability or reproducibility. The reviewers had the highest (partial) agreement (of 72%) for assessing the flow and structure of the manuscript, and the lowest (of 53%) for assessing whether interpretation of the results was supported by data, and for assessing whether the statistical analyses were appropriate and reported in sufficient detail (52%). Two thirds of the reviewers (n = 145, 68%) filled out the Comments-to-Author section, of which 105 (49%) resembled traditional peer review reports. These reports contained a Md of 4 (IQR 3 to 5) topics covered by the structured questions. 
Absolute agreement regarding final recommendations (exact match of recommendation choice) was 41%, which was higher than what those journals had in the period from 2019 to 2021 (31% agreement, P = 0.0275). Conclusions: Our preliminary results indicate that reviewers successfully adapted to the new review format, and that they covered more topics than in their traditional reports. Individual question analysis indicated the greatest disagreement regarding the interpretation of the results and the conducting and the reporting of statistical analyses. While structured peer review did lead to improvement in reviewer final recommendation agreements, this was not a randomized trial, and further studies should be performed to corroborate this. Further research is also needed to determine whether structured peer review leads to greater knowledge transfer or better improvement of manuscripts.

Subjects

Peer Review, Research ; Periodicals as Topic ; Pilot Projects ; Peer Review, Research/standards ; Periodicals as Topic/standards ; Humans ; Editorial Policies ; Peer Review/methods

5.

Unsupervised Segmentation of Knee Bone Marrow Edema-like Lesions Using Conditional Generative Models.

Yu, Andrew Seohwan; Yang, Mingrui; Lartey, Richard; Holden, William; Ok, Ahmet Hakan; Khan, Sameed; Kim, Jeehun; Winalski, Carl; Subhas, Naveen; Chaudhary, Vipin; Li, Xiaojuan.

Bioengineering (Basel) ; 11(6)2024 May 22.

Article in English | MEDLINE | ID: mdl-38927762

ABSTRACT

Bone marrow edema-like lesions (BMEL) in the knee have been linked to the symptoms and progression of osteoarthritis (OA), a highly prevalent disease with profound public health implications. Manual and semi-automatic segmentations of BMELs in magnetic resonance images (MRI) have been used to quantify the significance of BMELs. However, their utilization is hampered by the labor-intensive and time-consuming nature of the process as well as by annotator bias, especially since BMELs exhibit various sizes and irregular shapes with diffuse signal that lead to poor intra- and inter-rater reliability. In this study, we propose a novel unsupervised method for fully automated segmentation of BMELs that leverages conditional diffusion models, multiple MRI sequences that have different contrast of BMELs, and anomaly detection that do not rely on costly and error-prone annotations. We also analyze BMEL segmentation annotations from multiple experts, reporting intra-/inter-rater variability and setting better benchmarks for BMEL segmentation performance.

6.

Radiology-Pathology Concordance and Prognostication of Nodal Features in pN+ Oral Cavity Cancer.

Duguet-Armand, Marie; Su, Jie; O'Sullivan, Brian; de Almeida, John; Hosni, Ali; Weinreb, Ilan; Perez-Ordonez, Bayardo; Smith, Stephen; Witterick, Ian; Yao, Christopher; Goldstein, David; Hope, Andrew; Hahn, Ezra; Waldron, John; Ringash, Jolie; Spreafico, Anna; Yu, Eugene; Huang, Shao Hui.

Laryngoscope ; 2024 Jun 14.

Article in English | MEDLINE | ID: mdl-38874287

ABSTRACT

BACKGROUND AND PURPOSE: The aims of our study are to evaluate the diagnostic performance and prognostic value of radiological lymph node (LN) characteristics in pN+ oral cavity squamous carcinoma (OSCC). MATERIALS AND METHODS: pN+ OSCC treated between 2012 and 2020 were included. Preoperative imaging was reviewed by a single radiologist blinded to pathologic findings for the following nodal features: imaging-positive LN (iN+), laterality and total number, and image-identified extranodal extension (iENE). The sensitivity of iN+ for pN+ was calculated. The diagnostic performance of other nodal features was evaluated in the iN+ subgroup. The association of radiologic nodal features with overall survival (OS) was evaluated. Inter-rater kappa for radiologic nodal features was assessed in 100 randomly selected cases. RESULTS: Of 406 pN+ OSCC, 288 were iN+. The sensitivity of iN+ for pN+ was 71% overall, and improved to 89% for pN+ LN >1.5 cm. Within iN+, sensitivity/specificity for LN size (>3 cm), total LN number (>4), and ENE were 0.44/0.95, 0.57/0.84, and 0.27/0.96, respectively. Sensitivity of iENE was higher in the subset, with major (>2 mm) versus minor (≤2 mm) pENE (43% vs. 13%, p = 0.001). Reduced OS was observed in iN+ versus iN- (p = 0.006), iENE+ versus iENE- (p = 0.004), LN size >3 versus ≤3 cm (p < 0.001), and higher LN number (p < 0.001). Inter-rater kappa for iN+, laterality, total LN number, and presence of iENE were 0.71, 0.57, 0.78, and 0.69, respectively. CONCLUSION: Our study shows that despite modest sensitivity of most radiological nodal features, the specificity of image-identified nodal features is high and their prognostic values are retained in pN+ OSCC. LEVEL OF EVIDENCE: Level 3 (retrospective review comparing cases and controls) Laryngoscope, 2024.
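The overall sensitivity quoted above can be reproduced from the counts in the abstract: 288 of 406 pN+ cases were imaging-positive. A minimal sketch of that arithmetic follows; the function names are illustrative, and the false-negative count of 118 is simply 406 − 288.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of truly positive cases that the test flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of truly negative cases that the test flags as negative."""
    return true_negatives / (true_negatives + false_positives)


# Counts from the abstract: every case is pN+, and 288 of 406 were iN+.
pn_positive_cases = 406
imaging_positive = 288
imaging_negative = pn_positive_cases - imaging_positive  # missed on imaging

overall_sensitivity = sensitivity(imaging_positive, imaging_negative)
print(f"Sensitivity of iN+ for pN+: {overall_sensitivity:.1%}")
```

The specificity helper is included only for completeness; the abstract does not give the counts needed to recompute the specificities it reports.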

7.

Evaluating the Reliability of MyotonPro in Assessing Muscle Properties: A Systematic Review of Diagnostic Test Accuracy.

Lettner, Jonathan; Królikowska, Aleksandra; Ramadanov, Nikolai; Oleksy, Lukasz; Hakam, Hassan Tarek; Becker, Roland; Prill, Robert.

Medicina (Kaunas) ; 60(6)2024 May 23.

Article in English | MEDLINE | ID: mdl-38929468

ABSTRACT

Background and Objectives: Muscle properties are critical for performance and injury risk, with changes occurring due to physical exertion, aging, and neurological conditions. The MyotonPro device offers a non-invasive method to comprehensively assess muscle biomechanical properties. This systematic review evaluates the reliability of MyotonPro across various muscles for diagnostic purposes. Materials and Methods: Following PRISMA guidelines, a comprehensive literature search was conducted in Medline (PubMed), Ovid (Med), Epistemonikos, Embase, Cochrane Library, Clinical trials.gov, and the WHO International Clinical Trials platform. Studies assessing the reliability of MyotonPro across different muscles were included. A methodological quality assessment was performed using established tools, and reviewers independently conducted data extraction. Statistical analysis involved summarizing intra-rater and inter-rater reliability measures across muscles. Results: A total of 48 studies assessing 31 muscles were included in the systematic review. The intra-rater and inter-rater reliability were consistently high for parameters such as frequency and stiffness in muscles of the lower and upper extremities, as well as other muscle groups. Despite methodological heterogeneity and limited data on specific parameters, MyotonPro demonstrated promising reliability for diagnostic purposes across diverse patient populations. Conclusions: The findings suggest the potential of MyotonPro in clinical assessments for accurate diagnosis, treatment planning, and monitoring of muscle properties. Further research is needed to address limitations and enhance the applicability of MyotonPro in clinical practice. Reliable muscle assessments are crucial for optimizing treatment outcomes and improving patient care in various healthcare settings.

Subjects

Muscle, Skeletal ; Humans ; Reproducibility of Results ; Muscle, Skeletal/physiology ; Muscle, Skeletal/physiopathology ; Diagnostic Tests, Routine/standards ; Diagnostic Tests, Routine/methods

8.

Evaluation of a Semi-Automated Wound-Halving Algorithm for Split-Wound Design Studies: A Step towards Enhanced Wound-Healing Assessment.

Georg, Paul Julius; Schmid, Meret Emily; Zahia, Sofia; Probst, Sebastian; Cazzaniga, Simone; Hunger, Robert; Bossart, Simon.

J Clin Med ; 13(12)2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38930128

ABSTRACT

Background: Chronic leg ulcers present a global challenge in healthcare, necessitating precise wound measurement for effective treatment evaluation. This study is the first to validate the "split-wound design" approach for wound studies using objective measures. We further improved this relatively new approach and combined it with a semi-automated wound measurement algorithm. Method: The algorithm is capable of plotting an objective halving line that is calculated by splitting the bounding box of the wound surface along the longest side. To evaluate this algorithm, we compared the accuracy of the subjective wound halving of manual operators of different backgrounds with the algorithm-generated halving line and the ground truth, in two separate rounds. Results: The median absolute deviation (MAD) from the ground truth of the manual wound halving was 2% and 3% in the first and second round, respectively. On the other hand, the algorithm-generated halving line showed a significantly lower deviation from the ground truth (MAD = 0.3%, p < 0.001). Conclusions: The data suggest that this wound-halving algorithm is suitable and reliable for conducting wound studies. This innovative combination of a semi-automated algorithm paired with a unique study design offers several advantages, including reduced patient recruitment needs, accelerated study planning, and cost savings, thereby expediting evidence generation in the field of wound care. Our findings highlight a promising path forward for improving wound research and clinical practice.
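The halving step described in the Method section can be sketched roughly as follows. This is not the authors' implementation: it assumes the wound is given as a binary mask, that the bounding box is axis-aligned, and that "splitting along the longest side" means cutting that side at its midpoint. The function name and the mask format are hypothetical.

```python
def halving_line(mask):
    """Return the two end points (x, y) of a line that bisects the
    axis-aligned bounding box of a binary mask across its longest side.

    mask is a list of rows; truthy cells belong to the wound surface.
    Pixel (row r, column c) is taken to cover x in [c, c + 1) and y in [r, r + 1).
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no wound pixels")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    height = bottom - top + 1
    width = right - left + 1

    if width >= height:
        # Horizontal side is the longest: cut it at its midpoint with a vertical line.
        x = left + width / 2.0
        return (x, float(top)), (x, float(bottom + 1))
    # Vertical side is the longest: cut it at its midpoint with a horizontal line.
    y = top + height / 2.0
    return (float(left), y), (float(right + 1), y)
```

Because the line depends only on the bounding box, it is deterministic for a given mask, which is consistent with the low deviation the abstract reports for the algorithm-generated line.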

9.

Inter-rater reliability of ACS-NSQIP colorectal procedure coding in Canada.

Xiong, Yingqi; Spence, Richard T; Hirsch, Greg; Walsh, Mark J; Neumann, Katerina.

Am J Surg ; : 115787, 2024 May 31.

Article in English | MEDLINE | ID: mdl-38944624

ABSTRACT

BACKGROUND: The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) uses Current Procedural Terminology (CPT) codes for risk-adjusted calculations. This study evaluates the inter-rater reliability of coding colorectal resections across Canada by ACS-NSQIP surgical clinical nurse reviewers (SCNR) and its impact on risk predictions. METHODS: SCNRs in Canada were asked to code simulated operative reports. Percent agreement and free-marginal kappa correlation were calculated. The ACS-NSQIP risk calculator was utilized to illustrate its impact on risk prediction. RESULTS: Responses from 44 of 150 (29.3%) SCNRs revealed 3 to 6 different codes chosen per case, with agreement ranging from 6.7% to 62.3%. Free-marginal kappa correlation ranged from moderate agreement (0.53) to high disagreement (-0.17). ACS-NSQIP risk calculator predicted large absolute differences in risk for serious complications (0.2%-13.7%) and mortality (0.2%-6.3%). CONCLUSION: This study demonstrated low inter-rater reliability in coding ACS-NSQIP colorectal procedures in Canada among SCNRs, impacting risk predictions.
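Free-marginal kappa, unlike Fleiss' kappa, fixes chance agreement at 1/k for k available categories, which is why it can fall below zero when raters scatter across many codes. A minimal sketch of the multi-rater form is given below; the function name is illustrative, and the number of candidate categories is an assumption the analyst must supply, since the abstract does not state the value used.

```python
from collections import Counter


def free_marginal_kappa(ratings, n_categories):
    """Free-marginal multi-rater kappa.

    ratings      -- list of cases; each case is the list of codes chosen by the raters
    n_categories -- number of categories the raters could choose from (assumed input)
    """
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    observed = 0.0
    for case in ratings:
        n = len(case)
        counts = Counter(case)
        # Share of rater pairs that chose the same code for this case.
        observed += sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    observed /= len(ratings)
    chance = 1.0 / n_categories  # chance agreement with free (unconstrained) marginals
    return (observed - chance) / (1.0 - chance)
```

With three candidate codes, a case on which all raters agree and a case on which all three disagree average to an observed agreement of 0.5 and a kappa of 0.25.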

10.

Inter-Rater and Intra-Rater Agreement in Scoring Severity of Rodent Cardiomyopathy and Relation to Artificial Intelligence-Based Scoring.

Steinbach, Thomas J; Tokarz, Debra A; Co, Caroll A; Harris, Shawn F; McBride, Sandra J; Shockley, Keith R; Lokhande, Avinash; Srivastava, Gargi; Ugalmugle, Rajesh; Kazi, Arshad; Singletary, Emily; Cesta, Mark F; Thomas, Heath C; Chen, Vivian S; Hobbie, Kristen; Crabbs, Torrie A.

Toxicol Pathol ; : 1926233241259998, 2024 Jun 22.

Article in English | MEDLINE | ID: mdl-38907685

ABSTRACT

We previously developed a computer-assisted image analysis algorithm to detect and quantify the microscopic features of rodent progressive cardiomyopathy (PCM) in rat heart histologic sections and validated the results with a panel of five veterinary toxicologic pathologists using a multinomial logistic model. In this study, we assessed both the inter-rater and intra-rater agreement of the pathologists and compared pathologists' ratings to the artificial intelligence (AI)-predicted scores. Pathologists and the AI algorithm were presented with 500 slides of rodent heart. They quantified the amount of cardiomyopathy in each slide. A total of 200 of these slides were novel to this study, whereas 100 slides were intentionally selected for repetition from the previous study. After a washout period of more than six months, the repeated slides were examined to assess intra-rater agreement among pathologists. We found the intra-rater agreement to be substantial, with weighted Cohen's kappa values ranging from k = 0.64 to 0.80. Intra-rater variability is not a concern for the deterministic AI. The inter-rater agreement across pathologists was moderate (Cohen's kappa k = 0.56). These results demonstrate the utility of AI algorithms as a tool for pathologists to increase sensitivity and specificity for the histopathologic assessment of the heart in toxicology studies.

11.

Effect of measurement procedure errors on assessing lung fluid via remote dielectric sensing system.

Chen, Wei-Ting; Tsai, Yi-Ju; Chou, Hsiao-Chen; Pu, Yi-Chih; Chien, Jung-Yien; Huang, Chun-Ta.

Sci Rep ; 14(1): 14020, 2024 06 18.

Article in English | MEDLINE | ID: mdl-38890408

ABSTRACT

The study assessed the impact of procedural errors on the remote dielectric sensing system (ReDS), a non-invasive lung fluid assessment technology, in an Asian cohort. Healthy volunteers underwent ReDS measurements following manufacturer's instructions, with two consecutive measurements one minute apart. A subset of 20 participants had modified procedure settings. Reliability was measured using intraclass correlation coefficient (ICC). The study included 86 healthy volunteers, and all ReDS measurements fell within the recommended normal range. The intra-rater reliability of ReDS measurements was excellent, with an ICC of 0.968. Among the subset of 20 subjects, deviations in height and weight did not significantly affect ReDS values. However, deviations in chest size by ± 3 cm had a noticeable impact on ReDS measures, and incorrect station selection led to fluctuations in ReDS readings. In conclusion, the ReDS system demonstrated excellent intra-rater reliability and applicability in an Asian cohort. Procedural errors, such as chest size measurement and station selection, significantly influenced ReDS measurements. Adherence to standardized operating procedures is crucial to ensure accurate and consistent results. These findings highlight the importance of adherence to manufacturer instructions when utilizing ReDS for lung fluid assessment, thereby enhancing its reliability and clinical applicability.

Subjects

Lung ; Humans ; Male ; Female ; Adult ; Lung/physiology ; Reproducibility of Results ; Remote Sensing Technology/methods ; Healthy Volunteers ; Young Adult ; Middle Aged ; Body Fluids ; Electric Impedance

12.

Reliability of single-lead electrocardiogram interpretation to detect atrial fibrillation: insights from the SAFER feasibility study.

Hibbitt, Katie; Brimicombe, James; Cowie, Martin R; Dymond, Andrew; Freedman, Ben; Griffin, Simon J; Hobbs, F D Richard; Lindén, Hannah Clair; Lip, Gregory Y H; Mant, Jonathan; McManus, Richard J; Pandiaraja, Madhumitha; Williams, Kate; Charlton, Peter H.

Europace ; 26(7)2024 Jul 02.

Article in English | MEDLINE | ID: mdl-38941497

ABSTRACT

AIMS: Single-lead electrocardiograms (ECGs) can be recorded using widely available devices such as smartwatches and handheld ECG recorders. Such devices have been approved for atrial fibrillation (AF) detection. However, little evidence exists on the reliability of single-lead ECG interpretation. We aimed to assess the level of agreement on detection of AF by independent cardiologists interpreting single-lead ECGs and to identify factors influencing agreement. METHODS AND RESULTS: In a population-based AF screening study, adults aged ≥65 years old recorded four single-lead ECGs per day for 1-4 weeks using a handheld ECG recorder. Electrocardiograms showing signs of possible AF were identified by a nurse, aided by an automated algorithm. These were reviewed by two independent cardiologists who assigned participant- and ECG-level diagnoses. Inter-rater reliability of AF diagnosis was calculated using linear weighted Cohen's kappa (κw). Out of 2141 participants and 162 515 ECGs, only 1843 ECGs from 185 participants were reviewed by both cardiologists. Agreement was moderate: κw = 0.48 (95% confidence interval, 0.37-0.58) at participant level and κw = 0.58 (0.53-0.62) at ECG level. At participant level, agreement was associated with the number of adequate-quality ECGs recorded, with higher agreement in participants who recorded at least 67 adequate-quality ECGs. At ECG level, agreement was associated with ECG quality and whether ECGs exhibited algorithm-identified possible AF. CONCLUSION: Inter-rater reliability of AF diagnosis from single-lead ECGs was found to be moderate in older adults. Strategies to improve reliability might include participant and cardiologist training and designing AF detection programmes to obtain sufficient ECGs for reliable diagnoses.

Subjects

Algorithms ; Atrial Fibrillation ; Electrocardiography ; Feasibility Studies ; Observer Variation ; Humans ; Atrial Fibrillation/diagnosis ; Atrial Fibrillation/physiopathology ; Aged ; Reproducibility of Results ; Female ; Male ; Electrocardiography/instrumentation ; Electrocardiography/methods ; Predictive Value of Tests ; Aged, 80 and over ; Signal Processing, Computer-Assisted ; Heart Rate
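The linear weighted Cohen's kappa (κw) used in this study penalises disagreements in proportion to how far apart the two ordinal categories are. A minimal sketch for two raters follows, assuming diagnoses have already been mapped to integer ranks 0 to k − 1; the function name is illustrative, and the category ordering is an assumption, since the abstract does not list the diagnostic categories.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with linear disagreement weights |i - j| for two raters.

    rater_a, rater_b -- equal-length sequences of integer ranks in 0..n_categories-1
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    k = n_categories

    # Joint proportions of (rater A category, rater B category).
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        joint[a][b] += 1.0 / n

    marg_a = [sum(joint[i]) for i in range(k)]
    marg_b = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    # Weighted disagreement observed, and expected if the raters were independent.
    observed = sum(abs(i - j) * joint[i][j] for i in range(k) for j in range(k))
    expected = sum(abs(i - j) * marg_a[i] * marg_b[j] for i in range(k) for j in range(k))
    if expected == 0.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected
```

The usual 1/(k − 1) scaling of the weights is omitted because it cancels in the ratio of observed to expected disagreement.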

13.

Surgeon assessment of significant rectal polyps using white light endoscopy alone and in comparison to fluorescence-augmented AI lesion classification.

Hardy, Niall P; Moynihan, Alice; Dalli, Jeffrey; Epperlein, Jonathan P; McEntee, Philip D; Boland, Patrick A; Neary, Peter M; Cahill, Ronan A.

Langenbecks Arch Surg ; 409(1): 170, 2024 Jun 01.

Article in English | MEDLINE | ID: mdl-38822883

ABSTRACT

PURPOSE: Perioperative decision making for large (> 2 cm) rectal polyps with ambiguous features is complex. The most common intraprocedural assessment is clinician judgement alone while radiological and endoscopic biopsy can provide periprocedural detail. Fluorescence-augmented machine learning (FA-ML) methods may optimise local treatment strategy. METHODS: Surgeons of varying grades, all performing colonoscopies independently, were asked to visually judge endoscopic videos of large benign and early-stage malignant (potentially suitable for local excision) rectal lesions on an interactive video platform (Mindstamp) with results compared with and between final pathology, radiology and a novel FA-ML classifier. Statistical analyses of data used Fleiss Multi-rater Kappa scoring, Spearman Coefficient and Frequency tables. RESULTS: Thirty-two surgeons judged 14 ambiguous polyp videos (7 benign, 7 malignant). In all cancers, initial endoscopic biopsy had yielded false-negative results. Five of each lesion type had had a pre-excision MRI with a 60% false-positive malignancy prediction in benign lesions and a 60% over-staging and 40% equivocal rate in cancers. Average clinical visual cancer judgement accuracy was 49% (with only 'fair' inter-rater agreement), many reporting uncertainty and higher reported decision confidence did not correspond to higher accuracy. This compared to 86% ML accuracy. Size was misjudged visually by a mean of 20% with polyp size underestimated in 4/6 and overestimated in 2/6. Subjective narratives regarding decision-making requested for 7/14 lesions revealed wide rationale variation between participants. CONCLUSION: Current available clinical means of ambiguous rectal lesion assessment is suboptimal with wide inter-observer variation. Fluorescence based AI augmentation may advance this field via objective, explainable ML methods.

Subjects

Colonoscopy ; Rectal Neoplasms ; Humans ; Rectal Neoplasms/pathology ; Rectal Neoplasms/surgery ; Rectal Neoplasms/diagnostic imaging ; Intestinal Polyps/pathology ; Intestinal Polyps/surgery ; Machine Learning ; Male ; Fluorescence ; Female ; Observer Variation

14.

Low Inter-Rater Reliability and Reproducibility of Neck Reflex/"Adler-Langer" Points in Neural Therapy Diagnostics but Increased Pressure Pain Threshold after Therapy: Results of a Randomized Controlled Observer-Blind Trial.

Choi, Kyung-Eun; Grünert, Jan; Werner, Marc; Cramer, Holger; Anheyer, Dennis; Dobos, Gustav; Saha, Felix J.

Complement Med Res ; : 1-8, 2024 May 14.

Article in English | MEDLINE | ID: mdl-38744266

ABSTRACT

BACKGROUND: Neck reflex points or Adler-Langer points are commonly used in neural therapy to detect so-called interference fields. Chronic irritations or inflammations in the sinuses, teeth, tonsils, or ears are supposed to induce tension and tenderness of the soft tissues and short muscles in the upper cervical spine. The individual treatment strategy is based on the results of diagnostic Adler-Langer point palpation. This study investigated the inter- and intra-rater reliability and explored treatment effects. METHODS: We performed a randomized controlled trial with 104 inpatients (80.8% female, 51.8 ± 12.74 years) of a German department for internal and integrative medicine. Patients were randomized to individual neural therapy according to the pathological findings (n = 48) or no treatment (n = 56). In each patient, three experienced raters (20-45 years of experience in neural therapy) and two novice raters (medical students) rated Adler-Langer points rigidity on a standardized rating scale ("strong," "weak," "none"). The patients independently evaluated the tenderness on palpation of the eight points using the same scale. Pressure pain thresholds were assessed at the eight Adler-Langer points. All patients were retested after 30 min. The five raters were blinded to treatment allocation and assessments of the other raters. Video recordings were obtained to assess the consistency of the areas tested by the different raters. RESULTS: Agreement between patients and raters (Cohen's kappa = 0.161-0.400) and inter-rater reliability were low (Fleiss kappa = 0.132-0.150). Moreover, the individual agreement (pre-post comparisons in untreated patients) was similarly low even in experienced raters (Cohen's kappa = 0.099-0.173). Video documentation suggests that raters do not place their fingers in the correct segments (percentage of correct position: 42.0-60.6%). 
Pressure pain thresholds at five of the eight Adler-Langer points showed significant changes after treatment compared to none in the control group. CONCLUSION: Under this artificial experimental setting, this method of Adler-Langer point palpation has not proven to be a reliable diagnostic tool. But it could be shown that, as claimed by the method, the tenderness in five of eight Adler-Langer points decreased after neural therapy.
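Fleiss' kappa, reported above for agreement among the five raters, estimates chance agreement from the pooled category proportions rather than fixing it in advance. A minimal sketch is shown below, assuming every point is rated by the same number of raters on the three-level scale ("strong", "weak", "none"); the function name and the example data in the usage note are hypothetical and are not the trial's data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for several raters assigning one category per subject.

    ratings -- list of subjects; each subject is the list of labels from all raters
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    pooled = Counter()
    observed = 0.0
    for subject in ratings:
        if len(subject) != n_raters:
            raise ValueError("every subject needs the same number of ratings")
        counts = Counter(subject)
        pooled.update(counts)
        # Share of rater pairs that agree on this subject.
        observed += sum(c * (c - 1) for c in counts.values()) / (n_raters * (n_raters - 1))
    observed /= n_subjects

    total = n_subjects * n_raters
    # Chance agreement from the pooled proportion of each category.
    chance = sum((c / total) ** 2 for c in pooled.values())
    if chance == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (observed - chance) / (1.0 - chance)
```

Calling `fleiss_kappa([["strong"] * 3, ["none"] * 3])` describes perfect agreement on two points, while mixing one unanimous point with one fully split point brings the statistic down to chance level.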

15.

Evaluation of the departmental inter-rater reliability when scoring thyroid nodules according to the British Thyroid Association Ultrasound-classification model: Is there significant disagreement?

Rtam, Nabil.

Ultrasound ; 32(2): 76-84, 2024 May.

Article in English | MEDLINE | ID: mdl-38694831

ABSTRACT

Introduction: The British Thyroid Association Ultrasound classification is a risk stratification model which grades thyroid nodules as U2-5 based on their sonographic appearance. Variability between ultrasound operators when U-scoring is reported in the literature, with some evidence found in the author's department. The aim of this study was to investigate whether there is significant disagreement in the department and to identify potential reasons for variability. Methods: Eight operators, radiologists and sonographers, were recruited to grade 33 thyroid nodules and answer a tick-box questionnaire using the British Thyroid Association lexicon. The inter-operator variability for the U-categories, indication for fine-needle aspiration biopsy and ultrasound features was assessed using Fleiss' kappa and Gwet's AC1. The operators' accuracy was measured against the most experienced operator in the department using Cohen's kappa and percentage agreement. Results: Fair agreement (Fleiss' K = 0.21) was obtained between the participants when U-scoring (U2-5). Fair-to-moderate agreement was noted between sonographers (K = 0.40). Significant variability was demonstrated between radiologists (p > 0.05). Indication for fine-needle aspiration biopsy reached fair to almost substantial agreement (radiologists' AC1 = 0.34, sonographers' AC1 = 0.58, overall AC1 = 0.41). No significant variability was measured for echogenicity (K = 0.29), composition (K = 0.33), shape (K = 0.58), margin (K = 0.45), halo (K = 0.34) and vascularity (K = 0.44). Accuracy reached fair agreement (mean Cohen's K = 0.29) and moderate agreement (mean AC1 = 0.53) for the U-categories and fine-needle aspiration biopsy, respectively. Radiologists demonstrated lower accuracy. Conclusion: No significant inter-rater variability in U-scoring or recommending fine-needle aspiration biopsy was demonstrated between all the operators in the department. Radiologists showed significant variability in U-scoring and lower accuracy. Reliability and accuracy could be improved by addressing the problematic categories and features identified in this study.
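
For readers unfamiliar with the Fleiss' kappa statistic reported in this abstract, the following is a minimal Python sketch of the standard formula for several raters assigning categorical grades. The function name `fleiss_kappa`, the input layout (one list of grades per nodule) and the toy ratings are assumptions made for illustration; they are not taken from the study and do not reproduce its data or results.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for categorical ratings.

    ratings: list of items; each item is a list holding the category
    assigned by every rater (hypothetical layout, same rater count per item).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({cat for item in ratings for cat in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example (invented grades): three raters scoring four nodules.
toy = [
    ["U2", "U2", "U2"],
    ["U3", "U3", "U2"],
    ["U4", "U3", "U4"],
    ["U5", "U5", "U5"],
]
print(round(fleiss_kappa(toy), 3))
```

Values near 0 indicate agreement no better than chance and values near 1 indicate near-complete agreement, which is why the K = 0.21 reported above is described as only fair.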

16.

A qualitative investigation of the Montgomery-Åsberg depression rating scale: discrepancies in rater perceptions and data trends in remote assessments of rapid-acting antidepressants in treatment resistant depression.

Capodilupo, Gianna; Blattner, Raymond; Must, Anita; Navarro, Silvia Gamazo; Opler, Mark.

Front Psychiatry ; 15: 1289630, 2024.

Article in English | MEDLINE | ID: mdl-38751415

ABSTRACT

Introduction: Despite the development of many successful pharmaceutical interventions, a significant subset of patients experience treatment-resistant depression (TRD). Ketamine and its derivatives constitute a novel therapeutic approach to treat TRD; however, standard tools, such as the Montgomery-Åsberg Depression Rating Scale (MADRS), are still being used to measure symptoms and track changes. Methods: The aim of this study was to review item-level differences between rate of data change (MADRS score) and rater-weighted perception of the most useful items for assessing change in symptoms while remotely conducting the 10-item version of the MADRS in TRD in a clinical trial of rapid-acting antidepressants. Two studies of rapid-acting antidepressants in the treatment of TRD were used to identify item-scoring trends when the MADRS is administered remotely and repeatedly (733 subjects across 10 visits). Scoring trends were evaluated in tandem with a rater survey completed by 75 raters. This was done to gain insight into the perceived helpfulness of MADRS items when assessing change of symptoms in rapid-acting antidepressant trials. Results: MADRS items 'Reduced sleep', 'Apparent sadness', and 'Pessimistic thoughts' were found to have the greatest average data change by visit, while raters ranked 'Reported sadness', 'Lassitude' and 'Apparent sadness' as the most helpful items when assessing symptom change. Discussion: The divergence between rate of data-change ranking and rater perception of helpfulness could be related to difficulty in assessing specific items, to the novel treatment itself, and/or to the sensitivity to symptom change to which raters are accustomed in traditional antidepressant treatments.

17.

Evaluation of data collection bias of third molar stages of mineralisation for age estimation in the living.

de Oliveira Santos, Inês; Baptista, Isabel Poiares; da Silva, Ricardo Henrique Alves; Cunha, Eugénia.

Forensic Sci Res ; 9(2): owae004, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38765699

ABSTRACT

Age assessment of the living is a fundamental procedure in the process of human identification, in order to guarantee fair treatment of individuals, which has ethical, civil, legal, and medical repercussions. The careful selection of the appropriate methods requires evaluation of several parameters: accuracy, precision of the method, as well as its reproducibility. The approach proposed by Mincer et al., adapted from Demirjian et al. and exploring third molar mineralisation, is one of the most frequently considered for age estimation of the living. Thus, this work aims to assess potential bias in the data collection when applying the classification stages for dental mineralisation adapted by Mincer et al. A total of 102 orthopantomographs, of clinical origin, belonging to individuals aged between 12 and 25 years (mean = 20.12 years, SD = 3.49 years; 65 females, 37 males, all of Portuguese nationality) were included and a retrospective analysis was performed by five observers with different levels of experience (high, average, and basic). The performance and agreement among the five observers were evaluated using weighted Cohen's kappa and the intraclass correlation coefficient. To assess the influence of impaction on third molar classification, variables were tested using an ordinal logistic regression generalised linear model. It was observed that there were variations in the number of teeth identified among the observers, but the agreement levels ranged from moderate to substantial (0.4-0.8). Upon closer examination of the results, it was observed that although there were discernible differences between highly experienced observers and those with less experience, the gap was not as significant as initially hypothesised, and a greater disparity between the classifications of the upper (0.24-0.49) and lower third molars (>0.55) was observed. When bone superimposition is present, the classification process is not significantly influenced; however, variation in teeth angulation affects the assessment. The results suggest that with efficient preparation, the level of experience as a factor can be overcome. Mincer and colleagues' classification system can be replicated with ease and consistency, even though the classification of upper and lower third molars presents distinct challenges.

18.

Feasibility and Inter-rater Reliability of the Japanese Version of the Intensive Care Unit Mobility Scale.

Yasumura, Daisetsu; Katsukawa, Hajime; Matsuo, Ryu; Kawano, Reo; Taito, Shunsuke; Liu, Keibun; Hodgson, Carol.

Cureus ; 16(4): e59135, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38803745

ABSTRACT

Purpose: The purpose of this study was to verify the feasibility and inter-rater reliability of the Japanese version of the Intensive Care Unit Mobility Scale (IMS). Methods: A prospective observational study was conducted at two intensive care units (ICUs) in Japan. The feasibility of the Japanese version of the IMS was assessed by 25 ICU staff (12 physical therapists and 13 nurses) using a 10-item questionnaire. Inter-rater reliability was assessed by two experienced physical therapists and two experienced nurses working with 100 ICU patients using the Japanese version of the IMS. Results: In the questionnaire survey assessing feasibility, a high agreement rate was shown in 8 out of the 10 questions. All respondents could complete the IMS evaluation, and most respondents were able to complete the scoring of the IMS in a short time. The weighted kappa coefficient for the inter-rater reliability of the Japanese version of the IMS was 0.966 (95% CI: 0.94-0.99) on the first day of physical therapy for ICU patients and 0.985 (95% CI: 0.97-0.99) at the ICU discharge date assessment. The weighted κ coefficient showed an "almost perfect agreement" of 0.8 or higher. Conclusion: The Japanese version of the IMS is a feasible tool with strong inter-rater reliability for the measurement of physical activity in ICU patients.
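
To make the weighted kappa coefficient in this abstract more concrete, here is a minimal Python sketch for two raters scoring the same patients on an ordinal scale. The abstract does not state which weighting scheme was used, so linear disagreement weights are an assumption here; the function name `weighted_kappa`, the integer coding of scores as 0 to k-1, and the toy scores are likewise hypothetical and unrelated to the study data.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted Cohen's kappa for two raters.

    rater_a, rater_b: equal-length lists of integer scores in
    range(n_categories). Linear weights are an assumed choice.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    if n_categories < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    k = n_categories

    # Cross-tabulate the two raters' scores.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j] / n
            weighted_expected += weight * row_totals[i] * col_totals[j] / (n * n)

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - weighted_observed / weighted_expected


# Toy example (invented scores on a 0-10 scale, i.e. 11 categories).
scores_a = [0, 3, 5, 7, 10, 4]
scores_b = [0, 3, 6, 7, 10, 4]
print(round(weighted_kappa(scores_a, scores_b, 11), 3))
```

Unlike unweighted kappa, this form penalises a one-point disagreement far less than a disagreement across the whole scale, which suits ordinal mobility scores.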

19.

Measuring narrative identity: rater coding versus questionnaire-based approaches.

Gehrt, Tine B; Nielsen, Niels Peter; Hoyle, Rick H; Rubin, David C; Berntsen, Dorthe.

Memory ; : 1-11, 2024 May 29.

Article in English | MEDLINE | ID: mdl-38809783

ABSTRACT

Narrative identity - how individuals narrate their lived and remembered past - is usually assessed via independent rater coding, but new methods relying on self-report have been introduced. To test the assumption that different methods assess aspects of the same underlying construct, studies measuring similar components of narrative identity with different methods are needed. However, such studies are surprisingly rare. To begin to fill this gap, the present study compared the narrative variables, temporal coherence, causal coherence, and thematic coherence, measured via rater coding of participants' self-generated narratives of the remembered past and via subscales of the self-report measure Awareness of Narrative Identity Questionnaire (ANIQ). The results showed that the ANIQ subscales did not correlate significantly with their corresponding rater-coded dimension, and that the ANIQ subscales were generally unrelated to the other rater-coded dimensions. Furthermore, an exploratory factor analysis demonstrated that the ANIQ subscales loaded together on a factor that did not include any rater-coded variables. The findings suggest that the narrative variables share little empirical overlap when assessed via the ANIQ and rater coding of self-generated narratives.

20.

How Reliable are Single-Question Workplace-Based Assessments in Surgery?

Gates, Rebecca S; Krumm, Andrew E; Cate, Olle Ten; Chen, Xilin; Marcotte, Kayla; Thelen, Angela E; Deal, Shanley B; Alseidi, Adnan; Swanson, David; George, Brian C.

J Surg Educ ; 81(7): 967-972, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38816336

ABSTRACT

OBJECTIVE: Workplace-based assessments (WBAs) play an important role in the assessment of surgical trainees. Because these assessment tools are utilized by a multitude of faculty, inter-rater reliability is important to consider when interpreting WBA data. Although there is evidence supporting the validity of many of these tools, inter-rater reliability evidence is lacking. This study aimed to evaluate the inter-rater reliability of multiple operative WBA tools utilized in general surgery residency. DESIGN: General surgery residents and teaching faculty were recorded during 6 general surgery operations. Nine faculty raters each reviewed 6 videos and rated each resident on performance (using the Society for Improving Medical Professional Learning, or SIMPL, Performance Scale as well as the Operative Performance Rating System (OPRS) Scale), entrustment (using the ten Cate Entrustment-Supervision Scale), and autonomy (using the Zwisch Scale). The ratings were reviewed for inter-rater reliability using percent agreement and intraclass correlations. PARTICIPANTS: Nine faculty members viewed the videos and assigned ratings for multiple WBAs. RESULTS: Absolute intraclass correlation coefficients for each scale ranged from 0.33 to 0.47. CONCLUSIONS: All single-item WBA scales had low to moderate inter-rater reliability. While rater training may improve inter-rater reliability for single observations, many observations by many raters are needed to reliably assess trainee performance in the workplace.

Subjects

Clinical Competence; Educational Measurement; General Surgery; Internship and Residency; Workplace; General Surgery/education; Reproducibility of Results; Humans; Educational Measurement/methods; Education, Medical, Graduate/methods; Video Recording; Faculty, Medical; Male; Female
