Search | VHL Regional Portal

Immediate and longer-term impacts of fetal surveillance education on workforce knowledge and cognitive skills.

Beaves, Mark; Zoanetti, Nathan; Wallace, Euan M; Palmer, Kirsten R.

MedEdPublish (2016) ; 13: 38, 2023.

Article in English | MEDLINE | ID: mdl-38779369

ABSTRACT

Background: Following the development of the Royal Australian College of Obstetricians and Gynaecologists Intrapartum Fetal Surveillance Guideline in 2003, an education program was developed to support guideline implementation and clinical practice. It was intended that improved clinician knowledge, particularly of cardiotocography, would reduce rates of intrapartum fetal morbidity and mortality. The program contains a multiple-choice assessment, designed to assess fetal surveillance knowledge and the application of that knowledge. We used the results of this assessment over time to evaluate the impact of the education program on clinicians' fetal surveillance knowledge and interpretive skills, in the immediate and longer-term. Methods: We undertook a retrospective analysis of the assessment results for all participants in the Fetal Surveillance Education Program, between 2004 and 2018. Classical Test Theory and Rasch Item Response Theory analysis were used to evaluate the statistical reliability and quality of the assessment, and the measurement invariance or stability of the assessments over time. Clinicians' assessment scores were then reviewed by craft group and previous exposure to the program. Results: The results from 64,430, broadly similar assessments, showed that participation in the education program was associated with an immediate improvement in clinician performance in the assessment. Performance improvement was sustained for up to 18 months following participation in the program and recurrent participation was associated with progressive improvements. These trends were observed for all craft groups (consultant obstetricians, doctors in training, general practitioners, midwives, student midwives). Conclusions: These findings suggest that the Fetal Surveillance Education Program has improved clinician knowledge and the associated cognitive skills over time. The stable difficulty of the assessment tool means any improvement in clinician's results, with ongoing exposure to the program, can be reliably assessed and demonstrated. Importantly this holds true for all craft groups involved in intrapartum care and the interpretation of cardiotocography.

The potential use of Bayesian Networks to support committee decisions in programmatic assessment.

Zoanetti, Nathan; Pearce, Jacob.

Med Educ ; 55(7): 808-817, 2021 07.

Article in English | MEDLINE | ID: mdl-33151589

ABSTRACT

CONTEXT: The benefits of programmatic assessment are well-established. Evidence from multiple assessment formats is accumulated and triangulated to inform progression committee decisions. Committees are consistently challenged to ensure consistency and fairness in programmatic deliberations. Traditional statistical and psychometric techniques are not well-suited to aggregating different assessment formats accumulated over time. Some of the strengths of programmatic assessment are also vulnerabilities viewed through this lens. While emphasis is often placed on data richness and considered input of qualified experts, committees reasonably wish for practical, defensible solutions to these challenges. METHODS: We draw upon on existing literature regarding Bayesian Networks (BN), noting their utility and application in educational systems. We provide illustrative examples of how they could potentially be used in contexts that embed programmatic principles. We show a simple BN for a knowledge domain before presenting a full-scale 'proof of concept' BN to support committee decisions. We zoom in on one 'node' to demonstrate the capacity of incorporating disparate evidence throughout the network. CONCLUSIONS: Bayesian Networks offer an approach that is theoretically well-supported for programmatic assessment. They can aid committees in managing evidence accumulation, help them make inferences under conditions of uncertainty, and buttress decisions by adding a layer of defensibility to the process. They are a pragmatic tool adding value to the programmatic space by applying a complementary statistical framework. We see four major benefits of BNs in programmatic assessment: BNs allow for visual capturing of evidentiary arguments by committees during decision-making; 'recommendations' from probabilistic pathways can be used by committees to confirm their qualitative judgments; BNs can ensure precedents are maintained and consistency occurs over time; and the imperative to capture data richness is maintained without resorting to questionable methodological strategies such as adding qualitatively different things together. Further research into their feasibility and robustness in practice is warranted.

Subject(s)

Bayes Theorem , Humans , Uncertainty

Graphical Item Maps: providing clearer feedback on professional exam performance.

Beaves, Mark; Wallace, Euan; Zoanetti, Nathan; Griffin, Patrick; Wu, Margaret.

MedEdPublish (2016) ; 7: 116, 2018.

Article in English | MEDLINE | ID: mdl-38074561

ABSTRACT

This article was migrated. The article was marked as recommended. Background: Structured feedback is an important component of learning and assessment and is highly valued by candidates. Unfortunately, item specific feedback is generally not feasible for high stakes professional assessments due to the high cost of item development and the need to maintain stable assessment performance characteristics. In a high stakes assessment of fetal surveillance knowledge, we sought to use graphical item mapping to allow informative candidate feedback without compromising the item bank. Methods: We developed Graphical Item Maps (GIMs) to display individual candidate performance in the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP) multiple-choice question assessment. GIMs use item and person parameter estimates from Item Response Theory (IRT) models to map the interaction between a test taker and assessment tasks of varying difficulty. Results: It is both feasible and relatively simple to provide GIMs for individual candidate feedback. Operational examples are presented from the RANZCOG FSEP assessment. This paper demonstrates how test takers and educators might use GIMs as a form of assessment feedback. Conclusions: Graphical Item Maps are a useful and insightful assessment feedback tool for clinical practitioners partaking in a high stakes professional education and assessment program. They might be usefully employed in similar healthcare professional assessments to inform directed learning.

Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program.

Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M.

BMC Med Educ ; 13: 35, 2013 Mar 04.

Article in English | MEDLINE | ID: mdl-23453056

ABSTRACT

BACKGROUND: Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. METHODS: The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. RESULTS: Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. CONCLUSIONS: The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information.

Subject(s)

Education, Medical/standards , Educational Measurement/methods , Fetal Diseases/diagnosis , Educational Measurement/standards , Humans

Should candidate scores be adjusted for interviewer stringency or leniency in the multiple mini-interview?

Roberts, Chris; Rothnie, Imogene; Zoanetti, Nathan; Crossley, Jim.

Med Educ ; 44(7): 690-8, 2010 Jul.

Article in English | MEDLINE | ID: mdl-20636588

ABSTRACT

CONTEXT: There are significant levels of variation in candidate multiple mini-interview (MMI) scores caused by interviewer-related factors. Multi-facet Rasch modelling (MFRM) has the capability to both identify these sources of error and partially adjust for them within a measurement model that may be fairer to the candidate. METHODS: Using facets software, a variance components analysis estimated sources of measurement error that were comparable with those produced by generalisability theory. Fair average scores for the effects of the stringency/leniency of interviewers and question difficulty were calculated and adjusted rankings of candidates were modelled. RESULTS: The decisions of 207 interviewers had an acceptable fit to the MFRM model. For one candidate assessed by one interviewer on one MMI question, 19.1% of the variance reflected candidate ability, 8.9% reflected interviewer stringency/leniency, 5.1% reflected interviewer question-specific stringency/leniency and 2.6% reflected question difficulty. If adjustments were made to candidates' raw scores for interviewer stringency/leniency and question difficulty, 11.5% of candidates would see a significant change in their ranking for selection into the programme. Greater interviewer leniency was associated with the number of candidates interviewed. CONCLUSIONS: Interviewers differ in their degree of stringency/leniency and this appears to be a stable characteristic. The MFRM provides a recommendable way of giving a candidate score which adjusts for the stringency/leniency of whichever interviewers the candidate sees and the difficulty of the questions the candidate is asked.

Subject(s)

Educational Measurement/methods , Interviews as Topic , School Admission Criteria , Clinical Competence , Communication , Educational Measurement/standards , Faculty, Medical , Humans , Observer Variation , Psychometrics/methods

Rasch scaling procedures for informing development of a valid Fetal Surveillance Education Program multiple-choice assessment.

Zoanetti, Nathan; Griffin, Patrick; Beaves, Mark; Wallace, Euan M.

BMC Med Educ ; 9: 20, 2009 Apr 29.

Article in English | MEDLINE | ID: mdl-19402898

ABSTRACT

BACKGROUND: It is widely recognised that deficiencies in fetal surveillance practice continue to contribute significantly to the burden of adverse outcomes. This has prompted the development of evidence-based clinical practice guidelines by the Royal Australian and New Zealand College of Obstetricians and Gynaecologists and an associated Fetal Surveillance Education Program to deliver the associated learning. This article describes initial steps in the validation of a corresponding multiple-choice assessment of the relevant educational outcomes through a combination of item response modelling and expert judgement. METHODS: The Rasch item response model was employed for item and test analysis and to empirically derive the substantive interpretation of the assessment variable. This interpretation was then compared to the hierarchy of competencies specified a priori by a team of eight subject-matter experts. Classical Test Theory analyses were also conducted. RESULTS: A high level of agreement between the hypothesised and derived variable provided evidence of construct validity. Item and test indices from Rasch analysis and Classical Test Theory analysis suggested that the current test form was of moderate quality. However, the analyses made clear the required steps for establishing a valid assessment of sufficient psychometric quality. These steps included: increasing the number of items from 40 to 50 in the first instance, reviewing ineffective items, targeting new items to specific content and difficulty gaps, and formalising the assessment blueprint in light of empirical information relating item structure to item difficulty. CONCLUSION: The application of the Rasch model for criterion-referenced assessment validation with an expert stakeholder group is herein described. Recommendations for subsequent item and test construction are also outlined in this article.

Subject(s)

Curriculum , Fetal Development , Obstetrics/education , Psychometrics , Surveys and Questionnaires , Australia , Female , Humans , Models, Psychological

Validating a multiple mini-interview question bank assessing entry-level reasoning skills in candidates for graduate-entry medicine and dentistry programmes.

Roberts, Chris; Zoanetti, Nathan; Rothnie, Imogene.

Med Educ ; 43(4): 350-9, 2009 Apr.

Article in English | MEDLINE | ID: mdl-19335577

ABSTRACT

CONTEXT: The multiple mini-interview (MMI) was initially designed to test non-cognitive characteristics related to professionalism in entry-level students. However, it may be testing cognitive reasoning skills. Candidates to medical and dental schools come from diverse backgrounds and it is important for the validity and fairness of the MMI that these background factors do not impact on their scores. METHODS: A suite of advanced psychometric techniques drawn from item response theory (IRT) was used to validate an MMI question bank in order to establish the conceptual equivalence of the questions. Bias against candidate subgroups of equal ability was investigated using differential item functioning (DIF) analysis. RESULTS: All 39 questions had a good fit to the IRT model. Of the 195 checklist items, none were found to have significant DIF after visual inspection of expected score curves, consideration of the number of applicants per category, and evaluation of the magnitude of the DIF parameter estimates. CONCLUSIONS: The question bank contains items that have been studied carefully in terms of model fit and DIF. Questions appear to measure a cognitive unidimensional construct, 'entry-level reasoning skills in professionalism', as suggested by goodness-of-fit statistics. The lack of items exhibiting DIF is encouraging in a contemporary high-stakes admission setting where candidates of diverse personal, cultural and academic backgrounds are assessed by common means. This IRT approach has potential to provide assessment designers with a quality control procedure that extends to the level of checklist items.

Subject(s)

Databases, Factual , Education, Dental/methods , Education, Medical, Undergraduate/methods , Educational Measurement/methods , School Admission Criteria , Schools, Medical/standards , Decision Making , Humans , Interviews as Topic , New South Wales , Statistics as Topic

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL