Results 1 - 20 of 22
1.
Med Teach ; : 1-9, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38976711

ABSTRACT

INTRODUCTION: Ensuring equivalence in high-stakes performance exams is important for patient safety and candidate fairness. We compared inter-school examiner differences within a shared OSCE and the resulting impact on students' pass/fail categorisation. METHODS: The same six-station formative OSCE ran asynchronously in four medical schools, with two parallel circuits per school. We compared examiners' judgements using Video-based Examiner Score Comparison and Adjustment (VESCA): examiners scored station-specific comparator videos in addition to 'live' student performances, enabling (1) controlled score comparisons by (a) examiner-cohorts and (b) schools, and (2) data linkage to adjust for the influence of examiner-cohorts. We calculated the score impact and the change in pass/fail categorisation by school. RESULTS: On controlled video-based comparisons, inter-school variations in examiners' scoring (16.3%) were nearly double within-school variations (8.8%). Students' scores received a median adjustment of 5.26% (IQR 2.87-7.17%). The impact of adjusting for examiner differences on students' pass/fail categorisation varied by school: adjustment reduced the failure rate from 39.13% to 8.70% in school 2 whilst increasing it from 0.00% to 21.74% in school 4. DISCUSSION: Whilst the formative context may partly account for these differences, the findings suggest that examiners' judgements may vary between medical schools. This would benefit from systematic appraisal to safeguard equivalence. VESCA provided a viable method for such comparisons.
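The cohort-adjustment logic that comparator videos make possible can be sketched roughly as follows. This is a hypothetical illustration with invented scores and a simple mean-based adjustment; the study itself used Many Facet Rasch Modelling, not this shortcut. Cohort names, scores, and the pass mark are all assumptions:

```python
from statistics import mean

# Hypothetical comparator-video scores (%) awarded by each examiner-cohort.
# Because every cohort scores the SAME videos, differences reflect the
# examiners, not the students.
video_scores = {
    "school_1": [62, 58, 65],
    "school_2": [48, 51, 46],   # a severe ("hawkish") cohort
    "school_4": [71, 69, 74],   # a lenient ("dovish") cohort
}

# Hypothetical live scores each cohort awarded to its own students
live_scores = {
    "school_1": [55, 61, 49],
    "school_2": [44, 52, 47],
    "school_4": [70, 66, 58],
}

PASS_MARK = 50.0

# Pooled mean across all cohorts on the shared videos is the common reference
pooled = mean(s for scores in video_scores.values() for s in scores)

def adjusted(cohort):
    """Subtract the cohort's severity/leniency (its video deviation) from live scores."""
    effect = mean(video_scores[cohort]) - pooled
    return [s - effect for s in live_scores[cohort]]

for cohort in live_scores:
    raw_fail = sum(s < PASS_MARK for s in live_scores[cohort])
    adj_fail = sum(s < PASS_MARK for s in adjusted(cohort))
    print(cohort, "failures raw:", raw_fail, "adjusted:", adj_fail)
```

With these invented numbers the adjustment moves failures in opposite directions for the severe and lenient cohorts, mirroring the direction (though not the magnitude) of the school 2 / school 4 pattern reported above.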

2.
BMJ Open ; 14(6): e088263, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38871663

ABSTRACT

INTRODUCTION: Early childhood development forms the foundations for functioning later in life. Thus, accurate monitoring of developmental trajectories is critical. However, such monitoring often relies on time-intensive assessments which necessitate administration by skilled professionals. This difficulty is exacerbated in low-resource settings where such professionals are predominantly concentrated in urban and often private clinics, making them inaccessible to many. This geographic and economic inaccessibility contributes to a significant 'detection gap' where many children who might benefit from support remain undetected. The Scalable Transdiagnostic Early Assessment of Mental Health (STREAM) project aims to bridge this gap by developing an open-source, scalable, tablet-based platform administered by non-specialist workers to assess motor, social and cognitive developmental status. The goal is to deploy STREAM through public health initiatives, maximising opportunities for effective early interventions. METHODS AND ANALYSIS: The STREAM project will enrol and assess 4000 children aged 0-6 years from Malawi (n=2000) and India (n=2000). It integrates three established developmental assessment tools measuring motor, social and cognitive functioning using gamified tasks, observation checklists, parent-report and audio-video recordings. Domain scores for motor, social and cognitive functioning will be developed and assessed for their validity and reliability. These domain scores will then be used to construct age-adjusted developmental reference curves. ETHICS AND DISSEMINATION: Ethical approval has been obtained from local review boards at each site (India: Sangath Institutional Review Board; All India Institute of Medical Science (AIIMS) Ethics Committee; Indian Council of Medical Research-Health Ministry Screening Committee; Malawi: College of Medicine Research and Ethics Committee; Malawi Ministry of Health-Blantyre District Health Office). 
The study adheres to Good Clinical Practice standards and the ethical guidelines of the 6th (2008) Declaration of Helsinki. Findings from STREAM will be disseminated to participating families, healthcare professionals, policymakers, educators and researchers, at local, national and international levels through meetings, academic journals and conferences.


Subject(s)
Child Development , Mental Health , Humans , Child, Preschool , Infant , Child , India , Malawi , Female , Infant, Newborn , Male , Reproducibility of Results , Research Design
3.
Br J Sports Med ; 58(2): 73-80, 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-37945324

ABSTRACT

OBJECTIVES: This study aimed to (1) develop a new measure of adherence to exercise for musculoskeletal (MSK) pain (Adherence To Exercise for Musculoskeletal Pain Tool: ATEMPT) based on previously conceptualised domains of exercise adherence, and (2) report the content and structural validity, internal consistency, test-retest reliability, and measurement error of the ATEMPT outcome measure in patients managed with exercise for MSK pain. METHODS: ATEMPT was created using statements describing adherence generated by patients, physiotherapists and researchers, with content validity established. Baseline and retest questionnaires were distributed to patients recommended exercise for MSK pain in 11 National Health Service physiotherapy clinics. Items demonstrating low response variation were removed and the following measurement properties assessed: structural validity, internal consistency, test-retest reliability and measurement error. RESULTS: Baseline and retest data were collected from 382 and 112 patients with MSK pain, respectively. Confirmatory factor analysis established that a single-factor solution was the best fit according to the Bayesian Information Criterion. The 6-item version of the measure (scored 6-30) demonstrated optimal internal consistency (Cronbach's alpha 0.86, 95% CI 0.83 to 0.88) with acceptable levels of test-retest reliability (intraclass correlation coefficient 0.84, 95% CI 0.78 to 0.88) and measurement error (smallest detectable change 3.77, 95% CI 3.27 to 4.42; standard error of measurement 2.67, 95% CI 2.31 to 3.16). CONCLUSION: The 6-item ATEMPT was developed from the six domains of exercise adherence. It has adequate content and structural validity, internal consistency, test-retest reliability and measurement error in patients with MSK pain, but should undergo additional testing to establish its construct validity and responsiveness.
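The reliability statistics reported above follow standard formulas: Cronbach's alpha from item and total-score variances, then SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM. A minimal sketch, using invented responses and simply taking the ICC as a given value rather than estimating it:

```python
from math import sqrt
from statistics import pvariance, stdev

# Hypothetical responses: 8 patients x 6 items, each item scored 1-5
data = [
    [4, 4, 5, 4, 3, 4],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 5, 4],
    [3, 3, 3, 2, 3, 3],
    [1, 2, 1, 2, 1, 2],
    [4, 3, 4, 4, 4, 3],
    [2, 2, 3, 2, 2, 2],
    [5, 4, 5, 5, 4, 5],
]

k = len(data[0])
totals = [sum(row) for row in data]

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
item_vars = [pvariance([row[i] for row in data]) for i in range(k)]
alpha = k / (k - 1) * (1 - sum(item_vars) / pvariance(totals))

# Standard error of measurement and smallest detectable change, with a
# test-retest ICC taken as given (0.84 here, echoing the value reported above)
icc = 0.84
sem = stdev(totals) * sqrt(1 - icc)       # SEM = SD * sqrt(1 - ICC)
sdc = 1.96 * sqrt(2) * sem                # 95% smallest detectable change
```

The SDC is the smallest individual change distinguishable from measurement noise at 95% confidence, which is why it is always larger than the SEM itself.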


Subject(s)
Musculoskeletal Pain , Humans , Reproducibility of Results , Bayes Theorem , State Medicine , Psychometrics , Surveys and Questionnaires
4.
BMC Med Educ ; 23(1): 803, 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37885005

ABSTRACT

PURPOSE: Ensuring equivalence of examiners' judgements within distributed objective structured clinical exams (OSCEs) is key to both fairness and validity but is hampered by the lack of cross-over in the performances which different groups of examiners observe. This study develops a novel method called Video-based Examiner Score Comparison and Adjustment (VESCA), using it to compare examiners' scoring from different OSCE sites for the first time. MATERIALS/METHODS: Within a summative 16-station OSCE, volunteer students were videoed on each station and all examiners were invited to score station-specific comparator videos in addition to their usual student scoring. The linkage provided through the video scores enabled use of Many Facet Rasch Modelling (MFRM) to compare (1) examiner-cohort and (2) site effects on students' scores. RESULTS: Examiner-cohorts varied by 6.9% in the overall score allocated to students of the same ability. Whilst only a tiny difference was apparent between sites, examiner-cohort variability was greater in one site than the other. Adjusting student scores produced a median change in rank position of 6 places (0.48 deciles); however, 26.9% of students changed their rank position by at least 1 decile. By contrast, only one student's pass/fail classification was altered by score adjustment. CONCLUSIONS: Whilst comparatively limited examiner participation rates may limit interpretation of score adjustment in this instance, this study demonstrates the feasibility of using VESCA for quality assurance purposes in large-scale distributed OSCEs.


Subject(s)
Educational Measurement , Students, Medical , Humans , Educational Measurement/methods , Clinical Competence
5.
Sci Rep ; 13(1): 14921, 2023 09 10.
Article in English | MEDLINE | ID: mdl-37691074

ABSTRACT

Detecting when others are looking at us is a crucial social skill. Accordingly, a range of gaze angles is perceived as self-directed; this is termed the "cone of direct gaze" (CoDG). Multiple cues, such as nose and head orientation, are integrated during gaze perception. Thus, occluding the lower portion of the face, such as with face masks during the COVID-19 pandemic, may influence how gaze is perceived. Individual differences in the prioritisation of eye-region and non-eye-region cues may modulate the influence of face masks on gaze perception. Autistic individuals, who may be more reliant on non-eye-region directional cues during gaze perception, might be differentially affected by face masks. In the present study, we compared the CoDG when viewing masked and unmasked faces (N = 157) and measured self-reported autistic traits. The CoDG was wider for masked compared to unmasked faces, suggesting that reduced reliability of lower face cues increases the range of gaze angles perceived as self-directed. Additionally, autistic traits positively predicted the magnitude of CoDG difference between masked and unmasked faces. This study provides crucial insights into the effect of face masks on gaze perception, and how they may affect autistic individuals to a greater extent.


Subject(s)
Autistic Disorder , COVID-19 , Humans , Masks , Pandemics , Reproducibility of Results , Perception
6.
Psychol Health ; : 1-23, 2023 Jul 05.
Article in English | MEDLINE | ID: mdl-37408463

ABSTRACT

OBJECTIVE: Caring for a child with cystic fibrosis (CF) is a rigorous daily commitment for caregivers, and treatment burden is a major concern. We aimed to develop and validate a short-form version of a 46-item tool assessing the Challenge of Living with Cystic Fibrosis (CLCF) for clinical or research use. DESIGN: A novel genetic algorithm, based on 'evolving' a subset of items from a pre-specified set of criteria, was applied to optimise the tool using data from 135 families. MAIN OUTCOME MEASURES: Internal reliability and validity were assessed; the latter compared scores to validated tests of parental well-being, markers of treatment burden, and disease severity. RESULTS: The 15-item CLCF-SF demonstrated very good internal consistency (Cronbach's alpha 0.82, 95%CI 0.78-0.87). Scores for convergent validity correlated with the Beck Depression Inventory (Rho = 0.48), State-Trait Anxiety Inventory (STAI-State, Rho = 0.41; STAI-Trait, Rho = 0.43), the Cystic Fibrosis Questionnaire-Revised, lung function (Rho = -0.37), caregiver treatment management (r = 0.48) and child treatment management (r = 0.45), and discriminated between unwell and well children with CF (mean difference 5.5, 95%CI 2.5-8.5, p < 0.001) and between recent and no hospital admission (MD 3.6, 95%CI 0.25-6.95, p = 0.039). CONCLUSION: The CLCF-SF provides a robust 15-item tool for assessing the challenge of living with a child with CF.
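A genetic algorithm for short-form item selection, of the broad kind described above, can be sketched as follows. Everything here is an assumption: simulated single-trait responses, a single fitness criterion (internal consistency) standing in for the paper's pre-specified criteria, and arbitrary GA settings:

```python
import random
from statistics import pvariance

random.seed(1)

N_ITEMS, SHORT_LEN, N_RESP = 46, 15, 60

# Simulate hypothetical item responses driven by a single latent trait
latent = [random.gauss(0, 1) for _ in range(N_RESP)]
loading = [random.uniform(0.2, 0.9) for _ in range(N_ITEMS)]
data = [[loading[i] * t + random.gauss(0, 0.5) for i in range(N_ITEMS)]
        for t in latent]

def alpha(items):
    """Cronbach's alpha for the chosen item subset (the fitness function)."""
    k = len(items)
    totals = [sum(row[i] for i in items) for row in data]
    item_var = sum(pvariance([row[i] for row in data]) for i in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

def crossover(a, b):
    """Child inherits a random mix of its parents' items, kept at SHORT_LEN."""
    pool = list(set(a) | set(b))
    return tuple(sorted(random.sample(pool, SHORT_LEN)))

def mutate(child):
    """Occasionally swap one selected item for an unselected one."""
    child = list(child)
    if random.random() < 0.3:
        child[random.randrange(SHORT_LEN)] = random.choice(
            [i for i in range(N_ITEMS) if i not in child])
    return tuple(sorted(child))

# Evolve: keep the fittest half of the population, refill with mutated crossovers
pop = [tuple(sorted(random.sample(range(N_ITEMS), SHORT_LEN)))
       for _ in range(30)]
for generation in range(40):
    pop.sort(key=alpha, reverse=True)
    survivors = pop[:15]
    pop = survivors + [mutate(crossover(*random.sample(survivors, 2)))
                       for _ in range(15)]

best = max(pop, key=alpha)
```

A real application would combine several criteria in the fitness function (content coverage, convergent validity, item spread), which is what distinguishes this approach from simply picking the 15 most inter-correlated items.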

7.
Nutrients ; 15(12)2023 Jun 07.
Article in English | MEDLINE | ID: mdl-37375563

ABSTRACT

Stunting affects 22% of children globally, putting them at risk of adverse outcomes including delayed development. We investigated the effect of milk protein (MP) vs. soy and whey permeate (WP) vs. maltodextrin in large-quantity, lipid-based nutrient supplement (LNS), and of LNS itself vs. no supplementation, on child development and head circumference among stunted children aged 1-5 years. We conducted a randomized, double-blind, community-based 2 × 2 factorial trial in Uganda (ISRCTN1309319). We randomized 600 children to one of four LNS formulations (~535 kcal/d), with or without MP (n = 299 vs. n = 301) or WP (n = 301 vs. n = 299), for 12 weeks, or to no supplementation (n = 150). Child development was assessed using the Malawi Development Assessment Tool. Data were analyzed using linear mixed-effects models. Children had a median [interquartile range] age of 30 [23; 41] months and a mean ± standard deviation height-for-age z-score of -3.02 ± 0.74. There were no interactions between MP and WP for any of the outcomes. There was no effect of either MP or WP on any developmental domain. Although LNS itself had no impact on development, it resulted in 0.07 (95%CI: 0.004; 0.14) cm higher head circumference. Neither dairy in LNS, nor LNS itself, had an effect on development among already stunted children.
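The 2 × 2 factorial logic, two ingredients crossed so that each child contributes to both comparisons, can be illustrated with invented cell data. The trial itself used linear mixed-effects models; this deliberately simplified sketch only shows how a main effect and an interaction are read off the four cell means:

```python
from statistics import mean

# Hypothetical outcome values (e.g., head-circumference change, cm)
# in the four supplement arms of a 2 x 2 factorial design
cells = {
    ("MP", "WP"):       [0.31, 0.28, 0.35, 0.30],
    ("MP", "no_WP"):    [0.29, 0.33, 0.27, 0.31],
    ("no_MP", "WP"):    [0.30, 0.27, 0.32, 0.29],
    ("no_MP", "no_WP"): [0.28, 0.31, 0.26, 0.30],
}

def cell_mean(mp, wp):
    return mean(cells[(mp, wp)])

# Main effect of milk protein: averaged over the whey-permeate factor
mp_effect = (mean([cell_mean("MP", "WP"), cell_mean("MP", "no_WP")])
             - mean([cell_mean("no_MP", "WP"), cell_mean("no_MP", "no_WP")]))

# Interaction: does the MP effect differ depending on WP?
# (a difference of differences)
interaction = ((cell_mean("MP", "WP") - cell_mean("no_MP", "WP"))
               - (cell_mean("MP", "no_WP") - cell_mean("no_MP", "no_WP")))
```

The absence of an MP × WP interaction, as reported above, is what licenses interpreting each factor's main effect averaged over the other.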


Subject(s)
Child Development , Whey , Humans , Child , Infant , Milk Proteins , Uganda , Micronutrients , Dietary Supplements , Growth Disorders/prevention & control , Nutrients , Whey Proteins , Lipids
8.
BMJ Glob Health ; 8(1)2023 01.
Article in English | MEDLINE | ID: mdl-36650017

ABSTRACT

INTRODUCTION: With the ratification of the Sustainable Development Goals, there is an increased emphasis on early childhood development (ECD) and well-being. The WHO-led Global Scales for Early Development (GSED) project aims to provide population- and programmatic-level measures of ECD for ages 0-3 years that are valid, reliable and have psychometrically stable performance across geographical, cultural and language contexts. This paper reports on the creation of two measures: (1) the GSED Short Form (GSED-SF), a caregiver-reported measure for population evaluation, self-administered with no training required, and (2) the GSED Long Form (GSED-LF), a directly administered/observed measure for programmatic evaluation, administered by a trained professional. METHODS: We selected 807 psychometrically best-performing items, using a Rasch measurement model, from an ECD measurement databank comprising 66 075 children assessed on 2211 items from 18 ECD measures in 32 countries. For 766 of these items, in-depth subject-matter expert judgements were gathered to inform final item selection. Specifically collected were data on (1) conceptual matches between pairs of items originating from different measures, (2) the developmental domain(s) measured by each item and (3) perceptions of the feasibility of administering each item in diverse contexts. Prototypes were finalised through a combination of psychometric performance evaluation and expert consensus to optimally identify items. RESULTS: We created the GSED-SF (139 items) and GSED-LF (157 items) for tablet-based and paper-based assessment, with an optimal set of items that fit the Rasch model, met subject-matter expert criteria, avoided conceptual overlap, covered multiple domains of child development and were feasible to implement across diverse settings.
CONCLUSIONS: State-of-the-art quantitative and qualitative procedures were used to select theoretically relevant and globally feasible items representing child development for children aged 0-3 years. The GSED-SF and GSED-LF will be piloted and validated in children across diverse cultural, demographic, social and language contexts for global use.


Subject(s)
Big Data , Judgment , Humans , Child , Child, Preschool , Surveys and Questionnaires , Child Development , Psychometrics
9.
BMJ Open ; 13(1): e062562, 2023 01 24.
Article in English | MEDLINE | ID: mdl-36693690

ABSTRACT

INTRODUCTION: Children's early development is affected by caregiving experiences, with lifelong health and well-being implications. Governments and civil societies need population-based measures to monitor children's early development and ensure that children receive the care needed to thrive. To this end, the WHO developed the Global Scales for Early Development (GSED) to measure children's early development up to 3 years of age. The GSED includes three measures for population and programmatic level measurement: (1) short form (SF) (caregiver report), (2) long form (LF) (direct administration) and (3) psychosocial form (PF) (caregiver report). The primary aim of this protocol is to validate the GSED SF and LF. Secondary aims are to create preliminary reference scores for the GSED SF and LF, validate an adaptive testing algorithm and assess the feasibility and preliminary validity of the GSED PF. METHODS AND ANALYSIS: We will conduct the validation in seven countries (Bangladesh, Brazil, Côte d'Ivoire, Pakistan, The Netherlands, People's Republic of China, United Republic of Tanzania), varying in geography, language, culture and income through a 1-year prospective design, combining cross-sectional and longitudinal methods with 1248 children per site, stratified by age and sex. The GSED generates an innovative common metric (Developmental Score: D-score) using the Rasch model and a Development for Age Z-score (DAZ). We will evaluate six psychometric properties of the GSED SF and LF: concurrent validity, predictive validity at 6 months, convergent and discriminant validity, and test-retest and inter-rater reliability. We will evaluate measurement invariance by comparing differential item functioning and differential test functioning across sites. ETHICS AND DISSEMINATION: This study has received ethical approval from the WHO (protocol GSED validation 004583 20.04.2020) and approval in each site. 
Study results will be disseminated through webinars and publications from WHO, international organisations, academic journals and conference proceedings. REGISTRATION DETAILS: Open Science Framework https://osf.io/ on 19 November 2021 (DOI 10.17605/OSF.IO/KX5T7; identifier: osf-registrations-kx5t7-v1).
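The D-score/DAZ construction described above, a common developmental metric standardised against age, can be sketched as follows. The reference anchor points below are invented for illustration and are not the GSED reference values; the real D-score is estimated from item responses under a Rasch model rather than supplied directly:

```python
from math import erf, sqrt

# Hypothetical age reference: (age in months, mean D-score, SD) anchor points
reference = [(6, 25.0, 3.0), (12, 35.0, 3.5), (24, 50.0, 4.0), (36, 62.0, 4.5)]

def interpolate(age):
    """Linear interpolation of the reference mean and SD at a given age."""
    for (a0, m0, s0), (a1, m1, s1) in zip(reference, reference[1:]):
        if a0 <= age <= a1:
            w = (age - a0) / (a1 - a0)
            return m0 + w * (m1 - m0), s0 + w * (s1 - s0)
    raise ValueError("age outside reference range")

def daz(d_score, age):
    """Development-for-Age Z-score: (D - reference mean) / reference SD."""
    m, s = interpolate(age)
    return (d_score - m) / s

def centile(z):
    """Normal-distribution centile corresponding to a z-score."""
    return 50 * (1 + erf(z / sqrt(2)))
```

Standardising by age is what makes children of different ages comparable on one scale: a DAZ of -1.0 means one reference SD below the age-expected mean, whatever the child's age.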


Subject(s)
Caregivers , Language , Humans , Child , Child, Preschool , Reproducibility of Results , Cross-Sectional Studies , Surveys and Questionnaires , Psychometrics/methods
10.
Psychol Health ; 38(10): 1309-1344, 2023.
Article in English | MEDLINE | ID: mdl-35259034

ABSTRACT

OBJECTIVE: Treatments for cystic fibrosis (CF) are complex, labour-intensive, and perceived as highly burdensome by caregivers of children with CF. An instrument assessing burden of care is needed. DESIGN: A stepwise, qualitative design was used to create the CLCF with caregiver focus groups, participant researchers, a multidisciplinary professional panel, and cognitive interviews. MAIN OUTCOME MEASURES: Preliminary psychometric analyses evaluated the reliability and convergent validity of the CLCF scores. Cronbach's alpha assessed internal consistency and t-tests examined test-retest reliability. Correlations measured convergence between the Treatment Burden scale of the Cystic Fibrosis Questionnaire-Revised (CFQ-R) and the CLCF. Discriminant validity was assessed by comparing CLCF scores in one- vs two-parent families, across ages, and in children with vs without Pseudomonas aeruginosa (PA). RESULTS: Six Challenge subscales emerged from the qualitative data, and the professional panel constructed a scoresheet estimating the Time and Effort required for treatments. Internal consistency and test-retest reliability were adequate. Good convergence was found between the Total Challenge score and Treatment Burden on the CFQ-R (r = -0.49, p = 0.02, n = 31). A recent PA infection signalled higher Total Challenge for caregivers (F(23) = 11.72, p = 0.002). CONCLUSIONS: The CLCF, developed in partnership with parents/caregivers and CF professionals, is a timely, disease-specific burden measure for clinical research.

11.
BMC Med Educ ; 22(1): 41, 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-35039023

ABSTRACT

BACKGROUND: Ensuring equivalence of examiners' judgements across different groups of examiners is a priority for large-scale performance assessments in clinical education, both to enhance fairness and to reassure the public. This study extends insight into an innovation called Video-based Examiner Score Comparison and Adjustment (VESCA), which uses video scoring to link otherwise unlinked groups of examiners. This linkage enables comparison of the influence of different examiner-groups within a common frame of reference and provision of adjusted "fair" scores to students. Whilst this innovation promises substantial benefit to quality assurance of distributed Objective Structured Clinical Exams (OSCEs), questions remain about how the resulting score adjustments might be influenced by the specific parameters used to operationalise VESCA. Research questions: how similar are estimates of students' score adjustments when the model is run with either (1) fewer comparison videos per participating examiner, or (2) reduced numbers of participating examiners? METHODS: Using secondary analysis of recent research which used VESCA to compare scoring tendencies of different examiner groups, we made numerous copies of the original data and then selectively deleted video scores to reduce either (1) the number of linking videos per examiner (4 versus several permutations of 3, 2, or 1 videos) or (2) examiner participation rates (all participating examiners (76%) versus several permutations of 70%, 60% or 50% participation). After analysing all resulting datasets with Many Facet Rasch Modelling (MFRM), we calculated students' score adjustments for each dataset and compared these with the score adjustments in the original data using Spearman's correlations.
RESULTS: Students' score adjustments derived from 3 videos per examiner correlated highly with score adjustments derived from 4 linking videos (median Rho = 0.93, IQR 0.90-0.95, p < 0.001), with 2 linking videos (median Rho = 0.85, IQR 0.81-0.87, p < 0.001) and 1 linking video (median Rho = 0.52, IQR 0.46-0.64, p < 0.001) producing progressively smaller correlations. Score adjustments were similar for 76% examiner participation versus 70% (median Rho = 0.97, IQR 0.95-0.98, p < 0.001) and 60% (median Rho = 0.95, IQR 0.94-0.98, p < 0.001) participation, but were lower and more variable for 50% examiner participation (median Rho = 0.78, IQR 0.65-0.83, some non-significant). CONCLUSIONS: Whilst VESCA showed some sensitivity to the examined parameters, modest reductions in examiner participation rates or video numbers produced highly similar results. Employing VESCA in distributed or national exams could enhance quality assurance and exam fairness.
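The sensitivity analysis described above, recomputing score adjustments from progressively fewer linking videos and correlating them with the full-data adjustments, can be sketched as follows. All scores are simulated and the simple mean-based adjustment is a stand-in for the study's Many Facet Rasch Modelling; the Spearman implementation assumes no tied values (fine for continuous simulated scores):

```python
import random
from statistics import mean

random.seed(7)

# Hypothetical setup: 10 examiner-cohorts, each having scored 4 linking videos
true_effect = {c: random.gauss(0, 2) for c in range(10)}
video_scores = {c: [60 + true_effect[c] + random.gauss(0, 1.5) for _ in range(4)]
                for c in range(10)}

def adjustments(n_videos):
    """Cohort adjustments estimated from only the first n linking videos."""
    est = {c: mean(video_scores[c][:n_videos]) for c in video_scores}
    grand = mean(est.values())
    return [est[c] - grand for c in sorted(est)]  # deviations sum to zero

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Compare reduced-data adjustments against the full 4-video adjustments
rho_3 = spearman(adjustments(4), adjustments(3))
rho_1 = spearman(adjustments(4), adjustments(1))
```

With fewer linking videos, each cohort's estimated effect carries more noise relative to the true between-cohort signal, which is the mechanism behind the declining correlations reported above.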


Subject(s)
Educational Measurement , Students, Medical , Clinical Competence , Humans , Judgment
12.
Med Educ ; 56(3): 292-302, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34893998

ABSTRACT

INTRODUCTION: Differential rater function over time (DRIFT) and contrast effects (examiners' scores biased away from the standard of preceding performances) both challenge the fairness of scoring in objective structured clinical exams (OSCEs). This is important because, under some circumstances, these effects could alter whether some candidates pass or fail assessments. Benefitting from experimental control, this study investigated the causality, operation and interaction of both effects simultaneously for the first time in an OSCE setting. METHODS: We used secondary analysis of data from an OSCE in which examiners scored embedded videos of student performances interspersed between live students. Embedded video position varied between examiners (early vs. late) whilst the standard of preceding performances naturally varied (previous high or low). We examined linear relationships suggestive of DRIFT and contrast effects in all within-OSCE data before comparing the influence and interaction of 'early' versus 'late' and 'previous high' versus 'previous low' conditions on embedded video scores. RESULTS: The linear relationships in the data did not support the presence of DRIFT or contrast effects. Embedded videos were scored higher early (19.9 [19.4-20.5]) versus late (18.6 [18.1-19.1], p < 0.001), but scores did not differ between the previous-high and previous-low conditions. The interaction term was non-significant. CONCLUSIONS: In this instance, the small DRIFT effect we observed on embedded videos can be causally attributed to examiner behaviour. Contrast effects appear less ubiquitous than some prior research suggests. Possible mediators of these findings include the OSCE context, the detail of task specification, examiners' cognitive load and the distribution of learners' ability.
As the operation of these effects appears to vary across contexts, further research is needed to determine the prevalence and mechanisms of contrast and DRIFT effects, so that assessments may be designed in ways that are likely to avoid their occurrence. Quality assurance should monitor for these contextually variable effects in order to ensure OSCE equivalence.


Subject(s)
Clinical Competence , Educational Measurement , Humans
13.
BMJ Open ; 12(12): e064387, 2022 12 07.
Article in English | MEDLINE | ID: mdl-36600366

ABSTRACT

INTRODUCTION: Objective structured clinical exams (OSCEs) are a cornerstone of assessing the competence of trainee healthcare professionals, but have been criticised for (1) lacking authenticity, (2) variability in examiners' judgements, which can challenge assessment equivalence, and (3) limited diagnosticity of trainees' focal strengths and weaknesses. In response, this study aims to investigate whether (1) sharing integrated-task OSCE stations across institutions can increase perceived authenticity, while (2) enhancing assessment equivalence by enabling comparison of the standard of examiners' judgements between institutions using a novel methodology (Video-based Examiner Score Comparison and Adjustment (VESCA)) and (3) exploring the potential to develop more diagnostic signals from data on students' performances. METHODS AND ANALYSIS: The study will use a complex intervention design, developing, implementing and sharing an integrated-task (research) OSCE across four UK medical schools. It will use VESCA to compare examiner scoring differences between groups of examiners and different sites, while studying how, why and for whom the shared OSCE and VESCA operate across participating schools. Quantitative analysis will use Many Facet Rasch Modelling to compare the influence of different examiner groups and sites on students' scores, while the operation of the two interventions (shared integrated-task OSCEs; VESCA) will be studied through the theory-driven method of realist evaluation. Further exploratory analyses will examine diagnostic performance signals within the data. ETHICS AND DISSEMINATION: The study will be extra to usual course requirements and all participation will be voluntary. We will uphold principles of informed consent, the right to withdraw, confidentiality with pseudonymity and strict data security. The study has received ethical approval from Keele University Research Ethics Committee.
Findings will be academically published and will contribute to good practice guidance on (1) the use of VESCA and (2) sharing and use of integrated-task OSCE stations.


Subject(s)
Education, Medical, Undergraduate , Students, Medical , Humans , Educational Measurement/methods , Education, Medical, Undergraduate/methods , Clinical Competence , Schools, Medical , Multicenter Studies as Topic
14.
Article in English | MEDLINE | ID: mdl-34204030

ABSTRACT

BACKGROUND: The early childhood years provide an important window of opportunity to build strong foundations for future development. One impediment to global progress is the lack of population-based measurement tools to provide reliable estimates of developmental status. We aimed to field test and validate a newly created tool for this purpose. METHODS: We assessed attainment of 121 Infant and Young Child Development (IYCD) items in 269 children aged 0-3 years from Pakistan, Malawi and Brazil, alongside socioeconomic status (SES), maternal education, Family Care Indicators (FCI) and anthropometry. Children born premature, malnourished or with neurodevelopmental problems were excluded. We assessed inter-rater and test-retest reliability as well as the understandability of items. Each item was analysed using logistic regression taking SES, anthropometry, gender and FCI as covariates. Consensus choice of final items depended on developmental trajectory, age of attainment, invariance, reliability and acceptability between countries. RESULTS: The IYCD has 100 developmental items (40 gross/fine motor, 30 expressive/receptive language/cognitive, 20 socio-emotional and 10 behaviour). Items were acceptable, performed well in cognitive testing, had good developmental trajectories and high reliability across countries. Development-for-Age (DAZ) scores showed very good known-groups validity. CONCLUSIONS: The IYCD is a simple-to-use caregiver-report tool enabling population-level assessment of child development for children aged 0-3 years; it performs well across three countries on three continents to provide reliable estimates of young children's developmental status.


Subject(s)
Child Development , Brazil , Child , Child, Preschool , Humans , Infant , Malawi , Pakistan , Reproducibility of Results
15.
Acad Med ; 96(8): 1189-1196, 2021 08 01.
Article in English | MEDLINE | ID: mdl-33656012

ABSTRACT

PURPOSE: Ensuring that examiners in different parallel circuits of objective structured clinical examinations (OSCEs) judge to the same standard is critical to the chain of validity. Recent work suggests that the examiner-cohort (i.e., the particular group of examiners) could significantly alter outcomes for some candidates. Despite this, examiner-cohort effects are rarely examined, since fully nested data (i.e., no crossover between the students judged by different examiner groups) limit comparisons. In this study, the authors aim to replicate and further develop a novel method called Video-based Examiner Score Comparison and Adjustment (VESCA) so it can be used to enhance quality assurance of distributed or national OSCEs. METHOD: In 2019, 6 volunteer students were filmed on 12 stations in a summative OSCE. In addition to examining live student performances, examiners from 8 separate examiner-cohorts scored the pool of video performances. Examiners scored videos specific to their station. Video scores linked otherwise fully nested data, enabling comparisons by Many Facet Rasch Modeling. The authors compared and adjusted for examiner-cohort effects. They also compared examiners' scores when videos were embedded (interspersed between live students during the OSCE) or judged later via the Internet. RESULTS: Having accounted for differences in students' ability, the scores awarded by different examiner-cohorts to students of the same ability ranged from 18.57 of 27 (68.8%) to 20.49 (75.9%), Cohen's d = 1.3. Score adjustment changed the pass/fail classification for up to 16% of students depending on the modeled cut score. Internet and embedded video scoring showed no difference in mean scores or variability. Examiners' accuracy did not deteriorate over the 3-week Internet scoring period. CONCLUSIONS: Examiner-cohorts produced a replicable, significant influence on OSCE scores that was unaccounted for by typical assessment psychometrics.
VESCA offers a promising means to enhance validity and fairness in distributed OSCEs or national exams. Internet-based scoring may enhance VESCA's feasibility.


Subject(s)
Clinical Competence , Educational Measurement , Educational Measurement/methods , Humans , Physical Examination , Psychometrics
16.
Eur J Phys Rehabil Med ; 56(6): 771-779, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32975396

ABSTRACT

BACKGROUND: The Musculoskeletal Health Questionnaire (MSK-HQ) was developed to measure the health status of patients with various musculoskeletal conditions across multiple settings, including rehabilitation. AIM: To formally translate and cross-culturally adapt the MSK-HQ into German (MSK-HQG), to determine its test-retest reliability, standard error of measurement (SEM), smallest detectable change (SDC), construct validity, responsiveness and minimal important change (MIC), and to test for floor or ceiling effects. DESIGN: Cohort study with six weeks' follow-up. SETTING: Seven physiotherapy clinics/rehabilitation centres. POPULATION: Patients with a referral for physiotherapy indicating musculoskeletal complaints of the spine or extremities. METHODS: Translation and cross-cultural adaptation were carried out in accordance with guidelines provided by the developers. As reference standards we used pain intensity (0-10 numeric rating scale), quality of life (EQ-5D-5L) and disability measures (RMDQ, NDI, WOMAC and SPADI) that were combined using z-scores. RESULTS: In 100 patients (age 44.8±13.4 years, 66% female), the test-retest reliability intraclass correlation coefficient was 0.87 (95% CI 0.72; 0.93); for construct validity, the correlation with the combined disability measure was rs = -0.81 (95% CI -0.88, -0.72); the SEM was 3.4, the SDC (individual) 9.4, and the MIC 8.5. CONCLUSIONS: Overall, the study provides evidence of good reliability and validity for the MSK-HQG. Further studies in different settings and diagnostic subgroups should follow to better understand the psychometric properties of this measure in primary care, rehabilitation and specialist care settings. CLINICAL REHABILITATION IMPACT: The results demonstrate that the MSK-HQG has sufficient psychometric properties for use in musculoskeletal research and practice. However, the SDC should be kept in mind when using the tool for individual patients.
The MSK-HQG has the advantage of being a single instrument that can measure musculoskeletal health status across different pain sites, reducing the burden of using multiple tools.
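The reported SDC follows from the reported SEM by the standard formulas (SEM = SD·√(1−ICC); individual SDC = 1.96·√2·SEM ≈ 2.77·SEM). A minimal sketch of that arithmetic, assuming these conventional definitions; the function names and the example SD are illustrative, not from the paper:

```python
import math

def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)

def sdc_individual(sem: float) -> float:
    """Smallest detectable change for an individual patient:
    SDC = 1.96 * sqrt(2) * SEM (95% confidence, two measurement occasions)."""
    return 1.96 * math.sqrt(2.0) * sem

# Plugging in the reported SEM of 3.4 points reproduces the
# reported individual-level SDC of 9.4 points.
print(round(sdc_individual(3.4), 1))  # → 9.4
```

A change smaller than the SDC cannot be distinguished from measurement error for an individual patient, which is why the abstract cautions about individual use: the MIC of 8.5 points falls below the SDC of 9.4.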


Subject(s)
Cross-Cultural Comparison , Musculoskeletal Diseases/therapy , Musculoskeletal Pain/therapy , Surveys and Questionnaires/standards , Translating , Adult , Cohort Studies , Female , Germany , Humans , Male , Middle Aged , Psychometrics , Reproducibility of Results
17.
Front Psychol ; 11: 1357, 2020.
Article in English | MEDLINE | ID: mdl-32765335

ABSTRACT

For practical and theoretical purposes, tests of second language (L2) ability commonly aim to measure one overarching trait, general language ability, while simultaneously measuring multiple sub-traits (e.g., reading, grammar, etc.). This tension between measuring uni- and multi-dimensional constructs concurrently can generate vociferous debate about the precise nature of the construct(s) being measured. In L2 testing, this tension is often addressed through the use of a higher-order factor model wherein multidimensional traits representing subskills load on a general ability latent trait. However, an alternative modeling framework that is currently uncommon in language testing, but gaining traction in other disciplines, is the bifactor model. The bifactor model hypothesizes a general factor, onto which all items load, and a series of orthogonal (uncorrelated) skill-specific grouping factors. The model is particularly valuable for evaluating the empirical plausibility of subscales and the practical impact of dimensionality assumptions on test scores. This paper compares a range of confirmatory factor analysis (CFA) model structures with the bifactor model in terms of theoretical implications and practical considerations, framed for the language testing audience. The models are illustrated using primary data from the British Council's Aptis English test. The paper is intended to spearhead the uptake of the bifactor model within the cadre of measurement models used in L2 language testing.
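The defining constraint of the bifactor model, a general factor plus orthogonal grouping factors, can be illustrated with a small simulation. This is a hypothetical sketch (invented loadings and subskill names, not the Aptis data or model): items from different subskills correlate only through the general factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Orthogonal latent traits: a general ability factor G plus two
# skill-specific grouping factors ("reading" and "grammar" here).
G = rng.standard_normal(n)
S_read = rng.standard_normal(n)   # uncorrelated with G by construction
S_gram = rng.standard_normal(n)

# In a bifactor structure, every item loads on G and on exactly
# one grouping factor.
item_reading = 0.6 * G + 0.5 * S_read + 0.4 * rng.standard_normal(n)
item_grammar = 0.6 * G + 0.5 * S_gram + 0.4 * rng.standard_normal(n)

# Items from different subskills correlate only through G:
# expected r = 0.6*0.6 / (0.6**2 + 0.5**2 + 0.4**2) ≈ 0.47
r = np.corrcoef(item_reading, item_grammar)[0, 1]
```

In a higher-order model, by contrast, the subskill factors themselves would load on the general trait rather than being orthogonal to it, which is exactly the structural difference the paper examines.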

18.
Health Qual Life Outcomes ; 18(1): 200, 2020 Jun 23.
Article in English | MEDLINE | ID: mdl-32576190

ABSTRACT

BACKGROUND: The Musculoskeletal Health Questionnaire (MSK-HQ) has been developed to measure musculoskeletal health status across musculoskeletal conditions and settings. However, the MSK-HQ needs to be further evaluated across settings and different languages. OBJECTIVE: The objective of the study was to evaluate and compare measurement properties of the MSK-HQ across Danish (DK) and English (UK) cohorts of patients from primary care physiotherapy services with musculoskeletal pain. METHODS: The MSK-HQ was translated into Danish according to international guidelines. Measurement invariance was assessed by differential item functioning (DIF) analyses. Test-retest reliability, measurement error, responsiveness and minimal clinically important change (MCIC) were evaluated and compared between DK (n = 153) and UK (n = 166) cohorts. RESULTS: The Danish version demonstrated acceptable face and construct validity. Out of the 14 MSK-HQ items, three items showed DIF for language (pain/stiffness at night, understanding condition and confidence in managing symptoms) and three items showed DIF for pain location (walking, washing/dressing and physical activity levels). Intraclass correlation coefficients for test-retest reliability were 0.86 (95% CI 0.81 to 0.91) for the DK cohort and 0.77 (95% CI 0.49 to 0.90) for the UK cohort. The systematic measurement error was 1.6 and 3.9 points for the DK and UK cohorts respectively, with random measurement error being 8.6 and 9.9 points. Areas under the receiver operating characteristic (ROC) curves of the change scores against patients' own judgment at 12 weeks exceeded 0.70 in both cohorts. Absolute and relative MCIC estimates were 8-10 points and 26% for the DK cohort and 6-8 points and 29% for the UK cohort. CONCLUSIONS: The measurement properties of the MSK-HQ were acceptable across countries, but the instrument seems more suited for group-level than individual-level evaluation.
Researchers and clinicians should be aware that some discrepancy exists and should take the observed measurement error into account when evaluating change in scores over time.
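The ROC-based responsiveness check above amounts to asking how well change scores separate patients who judged themselves improved from those who did not. A sketch using the Mann-Whitney interpretation of the area under the curve, with hypothetical change scores rather than the study data:

```python
def auc(improved, not_improved):
    """Area under the ROC curve via its Mann-Whitney interpretation:
    the probability that a randomly chosen improved patient has a
    larger change score than a non-improved one (ties count 0.5)."""
    wins = 0.0
    for a in improved:
        for b in not_improved:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(improved) * len(not_improved))

# Hypothetical 12-week MSK-HQ change scores (invented): patients who
# rated themselves improved tend to gain more points.
improved_changes = [12, 9, 15, 4, 10]
stable_changes = [2, 5, -1, 6, 3]
print(auc(improved_changes, stable_changes))  # → 0.92
```

An AUC of 0.5 would mean change scores carry no information about the anchor judgment; values above 0.70, as reported for both cohorts, are the conventional threshold for adequate responsiveness.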


Subject(s)
Musculoskeletal Pain/psychology , Quality of Life , Adult , Cross-Cultural Comparison , Denmark , Female , Humans , Male , Middle Aged , Prospective Studies , Reproducibility of Results , Surveys and Questionnaires , Translations , United Kingdom
19.
Med Educ ; 53(3): 250-263, 2019 03.
Article in English | MEDLINE | ID: mdl-30575092

ABSTRACT

BACKGROUND: Although averaging across multiple examiners' judgements reduces unwanted overall score variability in objective structured clinical examinations (OSCE), designs involving several parallel circuits of the OSCE require that different examiner cohorts collectively judge performances to the same standard in order to avoid bias. Prior research suggests the potential for important examiner-cohort effects in distributed or national examinations that could compromise fairness or patient safety, but despite their importance, these effects are rarely investigated because fully nested assessment designs make them very difficult to study. We describe initial use of a new method to measure and adjust for examiner-cohort effects on students' scores. METHODS: We developed video-based examiner score comparison and adjustment (VESCA): volunteer students were filmed 'live' on 10 out of 12 OSCE stations. Following the examination, examiners additionally scored station-specific common-comparator videos, producing partial crossing between examiner cohorts. Many-facet Rasch modelling and linear mixed modelling were used to estimate and adjust for examiner-cohort effects on students' scores. RESULTS: After accounting for students' ability, examiner cohorts differed substantially in their stringency or leniency (maximal global score difference of 0.47 out of 7.0 [Cohen's d = 0.96]; maximal total percentage score difference of 5.7% [Cohen's d = 1.06] for the same student ability by different examiner cohorts). Corresponding adjustment of students' global and total percentage scores altered the theoretical classification of 6.0% of students for both measures (either pass to fail or fail to pass), whereas 8.6-9.5% students' scores were altered by at least 0.5 standard deviations of student ability. 
CONCLUSIONS: Despite typical reliability, the examiner cohort that students encountered had a potentially important influence on their score, emphasising the need for adequate sampling and examiner training. Development and validation of VESCA may offer a means to measure and adjust for potential systematic differences in scoring patterns that could exist between locations in distributed or national OSCE examinations, thereby ensuring equivalence and fairness.
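The linkage idea at the core of VESCA, using common comparator videos to estimate and remove examiner-cohort effects, can be sketched with a deliberately simplified fixed-offset adjustment. The study itself used many-facet Rasch modelling and linear mixed models; all cohort labels and scores below are invented for illustration:

```python
import numpy as np

# Hypothetical linkage data: two examiner cohorts each score the same
# three comparator video performances (rows = cohorts, cols = videos).
# Because the performances are identical, systematic differences in
# cohort means estimate cohort leniency/stringency.
video_scores = np.array([
    [5.0, 4.0, 6.0],   # cohort A
    [5.8, 4.9, 6.9],   # cohort B: consistently ~0.9 points more lenient
])

cohort_offset = video_scores.mean(axis=1) - video_scores.mean()

# 'Live' station scores are adjusted by removing each cohort's offset,
# putting students examined by different cohorts on a common scale.
live = {"A": np.array([4.2, 5.5]), "B": np.array([5.0, 6.3])}
cohorts = ["A", "B"]
adjusted = {c: live[c] - cohort_offset[i] for i, c in enumerate(cohorts)}
```

Without the shared videos the cohort effect is confounded with student ability in a fully nested design; the partial crossing is what makes the offset estimable at all.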


Subject(s)
Clinical Competence/standards , Education, Medical, Undergraduate/standards , Educational Measurement/methods , Educational Measurement/standards , Observer Variation , Videotape Recording/methods , Education, Medical, Undergraduate/methods , Humans , Reproducibility of Results , Students, Medical
20.
BMJ Glob Health ; 3(5): e000747, 2018.
Article in English | MEDLINE | ID: mdl-30364327

ABSTRACT

BACKGROUND: Renewed global commitment to the improvement of early child development outcomes, as evidenced by the focus of the United Nations Sustainable Development Goal 4, highlights an increased need for reliable and valid measures to evaluate preventive and interventional efforts designed to effect change. Our objective was to create a new tool, applicable across multiple cultures, to measure development from 0 to 3 years through metadata synthesis. METHODS: Fourteen cross-sectional data sets were contributed on 21 083 children from 10 low/middle-income countries (LMIC), assessed using seven different tools (caregiver-reported or directly assessed). Item groups measuring similar developmental skills were identified by mapping items across tools. Logistic regression curves displayed developmental trajectories for item groups across countries and age. Following expert consensus to identify well-performing items across developmental domains, a second mapping exercise was conducted to fill any gaps across the age range. The first version of the tool was constructed. Item response analysis validated our approach by putting all data sets onto a common scale. RESULTS: 789 individual items were identified across tools in the first mapping and 129 item groups were selected for analysis. 70 item groups were then selected through consensus, based on statistical performance and perceived importance, with a further 50 items identified at the second mapping. A tool comprising 120 items (23 fine motor, 23 gross motor, 20 receptive language, 24 expressive language, 30 socioemotional) was created. The linked data sets on a common scale showed a curvilinear trajectory of child development, highlighting the validity of our approach through excellent coverage by age and consistency of measurement across contributed tools, a novel finding in itself.
CONCLUSIONS: We have created the first version of a prototype tool for measuring child development in the early years, developed using a novel, easy-to-apply methodology; it now needs to be feasibility-tested and piloted across several LMICs.
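The developmental trajectories described above are logistic curves of item-group pass probability against age. A self-contained sketch with hypothetical pass/fail observations (invented, not the contributed data sets), using plain gradient ascent rather than whatever estimation software the study used:

```python
import math

def fit_logistic(ages, passed, lr=0.05, steps=8000):
    """Fit P(pass) = 1 / (1 + exp(-(a + b*(age - mean_age)))) by plain
    batch gradient ascent on the log-likelihood (no libraries)."""
    n = len(ages)
    mean_age = sum(ages) / n
    xs = [age - mean_age for age in ages]   # centre age for stable steps
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, passed):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b, mean_age

# Hypothetical observations for one item group: pass (1) / fail (0)
# at different ages in months.
ages = [6, 9, 12, 15, 18, 21, 24, 27, 30, 33]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
a, b, mean_age = fit_logistic(ages, passed)
# b > 0: the probability of passing rises with age, tracing the
# item group's developmental trajectory.
```

Plotting such curves for the same item group across countries is what allowed visual comparison of trajectories before the item response analysis put all data sets onto a common scale.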
