Search | VHL Regional Portal

1.

A suggestive approach for assessing item quality, usability and validity of Automatic Item Generation.

Falcão, Filipe; Pereira, Daniela Marques; Gonçalves, Nuno; De Champlain, Andre; Costa, Patrício; Pêgo, José Miguel.

Adv Health Sci Educ Theory Pract ; 28(5): 1441-1465, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37097483

ABSTRACT

Automatic Item Generation (AIG) refers to the process of using cognitive models to generate test items using computer modules. It is a new but rapidly evolving research area where cognitive and psychometric theory are combined into digital framework. However, assessment of the item quality, usability and validity of AIG relative to traditional item development methods lacks clarification. This paper takes a top-down strong theory approach to evaluate AIG in medical education. Two studies were conducted: Study I-participants with different levels of clinical knowledge and item writing experience developed medical test items both manually and through AIG. Both item types were compared in terms of quality and usability (efficiency and learnability); Study II-Automatically generated items were included in a summative exam in the content area of surgery. A psychometric analysis based on Item Response Theory inspected the validity and quality of the AIG-items. Items generated by AIG presented quality, evidences of validity and were adequate for testing student's knowledge. The time spent developing the contents for item generation (cognitive models) and the number of items generated did not vary considering the participants' item writing experience or clinical knowledge. AIG produces numerous high-quality items in a fast, economical and easy to learn process, even for inexperienced and without clinical training item writers. Medical schools may benefit from a substantial improvement in cost-efficiency in developing test items by using AIG. Item writing flaws can be significantly reduced thanks to the application of AIG's models, thus generating test items capable of accurately gauging students' knowledge.

Subject(s)

Education, Medical, Undergraduate , Education, Medical , Humans , Educational Measurement/methods , Education, Medical, Undergraduate/methods , Psychometrics , Students

2.

Plus ça change, plus c'est pareil: Making a continued case for the use of MCQs in medical education.

Pugh, Debra; De Champlain, André; Touchie, Claire.

Med Teach ; 41(5): 569-577, 2019 05.

Article in English | MEDLINE | ID: mdl-30299196

ABSTRACT

Despite the increased emphasis on the use of workplace-based assessment in competency-based education models, there is still an important role for the use of multiple choice questions (MCQs) in the assessment of health professionals. The challenge, however, is to ensure that MCQs are developed in a way to allow educators to derive meaningful information about examinees' abilities. As educators' needs for high-quality test items have evolved so has our approach to developing MCQs. This evolution has been reflected in a number of ways including: the use of different stimulus formats; the creation of novel response formats; the development of new approaches to problem conceptualization; and the incorporation of technology. The purpose of this narrative review is to provide the reader with an overview of how our understanding of the use of MCQs in the assessment of health professionals has evolved to better measure clinical reasoning and to improve both efficiency and item quality.

Subject(s)

Education, Medical, Undergraduate , Educational Measurement/methods , Cognition , Competency-Based Education , Computer-Assisted Instruction/methods , Humans

3.

A Call to Investigate the Relationship Between Education and Health Outcomes Using Big Data.

Chahine, Saad; Kulasegaram, Kulamakan Mahan; Wright, Sarah; Monteiro, Sandra; Grierson, Lawrence E M; Barber, Cassandra; Sebok-Syer, Stefanie S; McConnell, Meghan; Yen, Wendy; De Champlain, Andre; Touchie, Claire.

Acad Med ; 93(6): 829-832, 2018 06.

Article in English | MEDLINE | ID: mdl-29538109

ABSTRACT

There exists an assumption that improving medical education will improve patient care. While seemingly logical, this premise has rarely been investigated. In this Invited Commentary, the authors propose the use of big data to test this assumption. The authors present a few example research studies linking education and patient care outcomes and argue that using big data may more easily facilitate the process needed to investigate this assumption. The authors also propose that collaboration is needed to link educational and health care data. They then introduce a grassroots initiative, inclusive of universities in one Canadian province and national licensing organizations that are working together to collect, organize, link, and analyze big data to study the relationship between pedagogical approaches to medical training and patient care outcomes. While the authors acknowledge the possible challenges and issues associated with harnessing big data, they believe that the benefits supersede these. There is a need for medical education research to go beyond the outcomes of training to study practice and clinical outcomes as well. Without a coordinated effort to harness big data, policy makers, regulators, medical educators, and researchers are left with sometimes costly guesses and assumptions about what works and what does not. As the social, time, and financial investments in medical education continue to increase, it is imperative to understand the relationship between education and health outcomes.

Subject(s)

Big Data , Education, Medical/statistics & numerical data , Needs Assessment , Outcome Assessment, Health Care/statistics & numerical data , Humans

4.

Cheating in OSCEs: The Impact of Simulated Security Breaches on OSCE Performance.

Gotzmann, Andrea; De Champlain, André; Homayra, Fahmida; Fotheringham, Alexa; de Vries, Ingrid; Forgie, Melissa; Pugh, Debra.

Teach Learn Med ; 29(1): 52-58, 2017.

Article in English | MEDLINE | ID: mdl-27603790

ABSTRACT

Construct: Valid score interpretation is important for constructs in performance assessments such as objective structured clinical examinations (OSCEs). An OSCE is a type of performance assessment in which a series of standardized patients interact with the student or candidate who is scored by either the standardized patient or a physician examiner. BACKGROUND: In high-stakes examinations, test security is an important issue. Students accessing unauthorized test materials can create an unfair advantage and lead to examination scores that do not reflect students' true ability level. The purpose of this study was to assess the impact of various simulated security breaches on OSCE scores. APPROACH: Seventy-six 3rd-year medical students participated in an 8-station OSCE and were randomized to either a control group or to 1 of 2 experimental conditions simulating test security breaches: station topic (i.e., providing a list of station topics prior to the examination) or egregious security breach (i.e., providing detailed content information prior to the examination). Overall total scores were compared for the 3 groups using both a one-way between-subjects analysis of variance and a repeated measure analysis of variance to compare the checklist, rating scales, and oral question subscores across the three conditions. RESULTS: Overall total scores were highest for the egregious security breach condition (81.8%), followed by the station topic condition (73.6%), and they were lowest for the control group (67.4%). This trend was also found with checklist subscores only (79.1%, 64.9%, and 60.3%, respectively for the security breach, station topic, and control conditions). Rating scale subscores were higher for both the station topic and egregious security breach conditions compared to the control group (82.6%, 83.1%, and 77.6%, respectively). Oral question subscores were significantly higher for the egregious security breach condition (88.8%) followed by the station topic condition (64.3%), and they were the lowest for the control group (48.6%). CONCLUSIONS: This simulation of different OSCE security breaches demonstrated that student performance is greatly advantaged by having prior access to test materials. This has important implications for medical educators as they develop policies and procedures regarding the safeguarding and reuse of test content.

Subject(s)

Clinical Competence/standards , Deception , Educational Measurement , Female , Humans , Male , Students, Medical

5.

Assessing the Reliability of Performance Assessment Scores: Some Considerations in Selecting an Appropriate Framework.

De Champlain, André F; Gotzmann, Andrea; Qin, Sirius.

J Grad Med Educ ; 8(4): 504-506, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27777658

Subject(s)

Clinical Competence/standards , Education, Medical, Graduate/organization & administration , Internship and Residency/organization & administration , Observer Variation , Humans , Patient Care/standards , Reproducibility of Results

6.

Using cognitive models to develop quality multiple-choice questions.

Pugh, Debra; De Champlain, Andre; Gierl, Mark; Lai, Hollis; Touchie, Claire.

Med Teach ; 38(8): 838-43, 2016 Aug.

Article in English | MEDLINE | ID: mdl-26998566

ABSTRACT

With the recent interest in competency-based education, educators are being challenged to develop more assessment opportunities. As such, there is increased demand for exam content development, which can be a very labor-intense process. An innovative solution to this challenge has been the use of automatic item generation (AIG) to develop multiple-choice questions (MCQs). In AIG, computer technology is used to generate test items from cognitive models (i.e. representations of the knowledge and skills that are required to solve a problem). The main advantage yielded by AIG is the efficiency in generating items. Although technology for AIG relies on a linear programming approach, the same principles can also be used to improve traditional committee-based processes used in the development of MCQs. Using this approach, content experts deconstruct their clinical reasoning process to develop a cognitive model which, in turn, is used to create MCQs. This approach is appealing because it: (1) is efficient; (2) has been shown to produce items with psychometric properties comparable to those generated using a traditional approach; and (3) can be used to assess higher order skills (i.e. application of knowledge). The purpose of this article is to provide a novel framework for the development of high-quality MCQs using cognitive models.

Subject(s)

Education, Medical, Undergraduate , Educational Measurement/methods , Educational Measurement/standards , Models, Psychological , Competency-Based Education , Humans

7.

Using Automatic Item Generation to Improve the Quality of MCQ Distractors.

Lai, Hollis; Gierl, Mark J; Touchie, Claire; Pugh, Debra; Boulais, André-Philippe; De Champlain, André.

Teach Learn Med ; 28(2): 166-73, 2016.

Article in English | MEDLINE | ID: mdl-26849247

ABSTRACT

UNLABELLED: CONSTRUCT: Automatic item generation (AIG) is an alternative method for producing large numbers of test items that integrate cognitive modeling with computer technology to systematically generate multiple-choice questions (MCQs). The purpose of our study is to describe and validate a method of generating plausible but incorrect distractors. Initial applications of AIG demonstrated its effectiveness in producing test items. However, expert review of the initial items identified a key limitation where the generation of implausible incorrect options, or distractors, might limit the applicability of items in real testing situations. BACKGROUND: Medical educators require development of test items in large quantities to facilitate the continual assessment of student knowledge. Traditional item development processes are time-consuming and resource intensive. Studies have validated the quality of generated items through content expert review. However, no study has yet documented how generated items perform in a test administration. Moreover, no study has yet to validate AIG through student responses to generated test items. APPROACH: To validate our refined AIG method in generating plausible distractors, we collected psychometric evidence from a field test of the generated test items. A three-step process was used to generate test items in the area of jaundice. At least 455 Canadian and international medical graduates responded to each of the 13 generated items embedded in a high-stake exam administration. Item difficulty, discrimination, and index of discrimination estimates were calculated for the correct option as well as each distractor. RESULTS: Item analysis results for the correct options suggest that the generated items measured candidate performances across a range of ability levels while providing a consistent level of discrimination for each item. Results for the distractors reveal that the generated items differentiated the low- from the high-performing candidates. CONCLUSIONS: Previous research on AIG highlighted how this item development method can be used to produce high-quality stems and correct options for MCQ exams. The purpose of the current study was to describe, illustrate, and evaluate a method for modeling plausible but incorrect options. Evidence provided in this study demonstrates that AIG can produce psychometrically sound test items. More important, by adapting the distractors to match the unique features presented in the stem and correct option, the generation of MCQs using automated procedure has the potential to produce plausible distractors and yield large numbers of high-quality items for medical education.

Subject(s)

Computer-Assisted Instruction/methods , Education, Medical, Undergraduate/methods , Educational Measurement/methods , Quality Improvement , Automation , Humans , Jaundice/diagnosis , Jaundice/therapy , Models, Educational , Psychometrics

8.

Calibrating the Medical Council of Canada's Qualifying Examination Part I using an integrated item response theory framework: a comparison of models and designs.

De Champlain, Andre F; Boulais, Andre-Philippe; Dallas, Andrew.

J Educ Eval Health Prof ; 13: 6, 2016.

Article in English | MEDLINE | ID: mdl-26883811

ABSTRACT

PURPOSE: The aim of this research was to compare different methods of calibrating multiple choice question (MCQ) and clinical decision making (CDM) components for the Medical Council of Canada's Qualifying Examination Part I (MCCQEI) based on item response theory. METHODS: Our data consisted of test results from 8,213 first time applicants to MCCQEI in spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed item format (dichotomous MCQ responses and polytomous CDM case scores) calibrations were conducted using PARSCALE 4. RESULTS: The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499 or 0.02). In all 3 polytomous models, whether the MCQs were either anchored or concurrently run with the CDM cases, results suggest very poor fit. All IRT abilities estimated from dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods, but also with regard to the actual reported decision to candidates. The largest difference noted in pass rates was 4.78%, which occurred between the mixed format concurrent 2-PL graded response model (pass rate= 80.43%) and the dichotomous anchored 1-PL calibrations (pass rate= 85.21%). CONCLUSION: Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit of the item response matrix than more complex, polytomous calibrations.

Subject(s)

Educational Measurement/standards , Licensure, Medical/standards , Calibration , Canada , Choice Behavior , Humans , Models, Theoretical

9.

Using Automated Scoring to Evaluate Written Responses in English and French on a High-Stakes Clinical Competency Examination.

Latifi, Syed; Gierl, Mark J; Boulais, André-Philippe; De Champlain, André F.

Eval Health Prof ; 39(1): 100-13, 2016 Mar.

Article in English | MEDLINE | ID: mdl-26377072

ABSTRACT

We present a framework for technology-enhanced scoring of bilingual clinical decision-making (CDM) questions using an open-source scoring technology and evaluate the strength of the proposed framework using operational data from the Medical Council of Canada Qualifying Examination. Candidates' responses from six write-in CDM questions were used to develop a three-stage-automated scoring framework. In Stage 1, the linguistic features from CDM responses were extracted. In Stage 2, supervised machine learning techniques were employed for developing the scoring models. In Stage 3, responses to six English and French CDM questions were scored using the scoring models from Stage 2. Of the 8,007 English and French CDM responses, 7,643 were accurately scored with an agreement rate of 95.4% between human and computer scoring. This result serves as an improvement of 5.4% when compared with the human inter-rater reliability. Our framework yielded scores similar to those of expert physician markers and could be used for clinical competency assessment.

Subject(s)

Clinical Competence , Educational Measurement/methods , Educational Measurement/standards , Electronic Data Processing/standards , Translating , Canada , Clinical Decision-Making , Humans , Licensure, Medical , Reproducibility of Results

10.

Examiners and content and site: Oh My! A national organization's investigation of score variation in large-scale performance assessments.

Sebok, Stefanie S; Roy, Marguerite; Klinger, Don A; De Champlain, André F.

Adv Health Sci Educ Theory Pract ; 20(3): 581-94, 2015 Aug.

Article in English | MEDLINE | ID: mdl-25164266

ABSTRACT

Examiner effects and content specificity are two well known sources of construct irrelevant variance that present great challenges in performance-based assessments. National medical organizations that are responsible for large-scale performance based assessments experience an additional challenge as they are responsible for administering qualification examinations to physician candidates at several locations and institutions. This study explores the impact of site location as a source of score variation in a large-scale national assessment used to measure the readiness of internationally educated physician candidates for residency programs. Data from the Medical Council of Canada's National Assessment Collaboration were analyzed using Hierarchical Linear Modeling and Rasch Analyses. Consistent with previous research, problematic variance due to examiner effects and content specificity was found. Additionally, site location was also identified as a potential source of construct irrelevant variance in examination scores.

Subject(s)

Bias , Clinical Competence , Educational Measurement/standards , Physicians , Clinical Competence/statistics & numerical data , Female , Humans , Male , Models, Statistical

11.

Automated essay scoring and the future of educational assessment in medical education.

Gierl, Mark J; Latifi, Syed; Lai, Hollis; Boulais, André-Philippe; De Champlain, André.

Med Educ ; 48(10): 950-62, 2014 Oct.

Article in English | MEDLINE | ID: mdl-25200016

ABSTRACT

CONTEXT: Constructed-response tasks, which range from short-answer tests to essay questions, are included in assessments of medical knowledge because they allow educators to measure students' ability to think, reason, solve complex problems, communicate and collaborate through their use of writing. However, constructed-response tasks are also costly to administer and challenging to score because they rely on human raters. One alternative to the manual scoring process is to integrate computer technology with writing assessment. The process of scoring written responses using computer programs is known as 'automated essay scoring' (AES). METHODS: An AES system uses a computer program that builds a scoring model by extracting linguistic features from a constructed-response prompt that has been pre-scored by human raters and then, using machine learning algorithms, maps the linguistic features to the human scores so that the computer can be used to classify (i.e. score or grade) the responses of a new group of students. The accuracy of the score classification can be evaluated using different measures of agreement. RESULTS: Automated essay scoring provides a method for scoring constructed-response tests that complements the current use of selected-response testing in medical education. The method can serve medical educators by providing the summative scores required for high-stakes testing. It can also serve medical students by providing them with detailed feedback as part of a formative assessment process. CONCLUSIONS: Automated essay scoring systems yield scores that consistently agree with those of human raters at a level as high, if not higher, as the level of agreement among human raters themselves. The system offers medical educators many benefits for scoring constructed-response tasks, such as improving the consistency of scoring, reducing the time required for scoring and reporting, minimising the costs of scoring, and providing students with immediate feedback on constructed-response tasks.

Subject(s)

Computer-Assisted Instruction/trends , Education, Medical/methods , Education, Medical/trends , Educational Measurement/methods , Software , Clinical Competence , Humans , Writing

12.

Supervising incoming first-year residents: faculty expectations versus residents' experiences.

Touchie, Claire; De Champlain, André; Pugh, Debra; Downing, Steven; Bordage, Georges.

Med Educ ; 48(9): 921-9, 2014 Sep.

Article in English | MEDLINE | ID: mdl-25113118

ABSTRACT

CONTEXT: First-year residents begin clinical practice in settings in which attending staff and senior residents are available to supervise their work. There is an expectation that, while being supervised and as they become more experienced, residents will gradually take on more responsibilities and function independently. OBJECTIVES: This study was conducted to define 'entrustable professional activities' (EPAs) and determine the extent of agreement between the level of supervision expected by clinical supervisors (CSs) and the level of supervision reported by first-year residents. METHODS: Using a nominal group technique, subject matter experts (SMEs) from multiple specialties defined EPAs for incoming residents; these represented a set of activities to be performed independently by residents by the end of the first year of residency, regardless of specialty. We then surveyed CSs and first-year residents from one institution in order to compare the levels of supervision expected and received during the day and night for each EPA. RESULTS: The SMEs defined 10 EPAs (e.g. completing admission orders, obtaining informed consent) that were ratified by a national panel. A total of 113 CSs and 48 residents completed the survey. Clinical supervisors had the same expectations regardless of time of day. For three EPAs (managing i.v. fluids, obtaining informed consent, obtaining advanced directives) the level of supervision reported by first-year residents was lower than that expected by CSs (p < 0.001) regardless of time of day (i.e. day or night). For four more EPAs (initiating the management of a critically ill patient, handing over the care of a patient to colleagues, writing a discharge prescription, coordinating a patient discharge) differences applied only to night-time work (p ≤ 0.001). CONCLUSIONS: First-year residents reported performing EPAs with less supervision than expected by CSs, especially during the night. Using EPAs to guide the content of the undergraduate curriculum and during examinations could help better align CSs' and residents' expectations about early residency supervision.

Subject(s)

Clinical Competence/standards , Internship and Residency/standards , Attitude of Health Personnel , Faculty, Medical , Humans , Male , Ontario , Professional Practice/standards

13.

Weighting checklist items and station components on a large-scale OSCE: is it worth the effort?

Sandilands, Debra Dallie; Gotzmann, Andrea; Roy, Marguerite; Zumbo, Bruno D; De Champlain, André.

Med Teach ; 36(7): 585-90, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24787530

ABSTRACT

BACKGROUND: Past research suggests that the use of externally-applied scoring weights may not appreciably impact measurement qualities such as reliability or validity. Nonetheless, some credentialing boards and academic institutions apply differential scoring weights based on expert opinion about the relative importance of individual items or test components of Observed Structured Clinical Examinations (OSCEs). AIMS: To investigate the impact of simplified scoring models that make little to no use of differential weighting on the reliability of scores and decisions on a high stakes OSCE required for medical licensure in Canada. METHOD: We applied four different weighting models of various complexities to data from three administrations of the OSCE. We compared score reliability, pass/fail rates, correlations between the scores and classification decision accuracy and consistency across the models and administrations. RESULTS: Less complex weighting models yielded similar reliability and pass rates as the more complex weighting model. Minimal changes in candidates' pass/fail status were observed and there were strong and statistically significant correlations between the scores for all scoring models and administrations. Classification decision accuracy and consistency were very high and similar across the four scoring models. CONCLUSIONS: Adopting a simplified weighting scheme for this OSCE did not diminish its measurement qualities. Instead of developing complex weighting schemes, experts' time and effort could be better spent on other critical test development and assembly tasks with little to no compromise in the quality of scores and decisions on this high-stakes OSCE.

Subject(s)

Clinical Competence/standards , Educational Measurement/standards , Licensure, Medical/standards , Canada , Checklist , Educational Measurement/methods , Educational Measurement/statistics & numerical data , Humans , Models, Educational , Reproducibility of Results

14.

Multiple tutorial-based assessments: a generalizability study.

St-Onge, Christina; Frenette, Eric; Côté, Daniel J; De Champlain, André.

BMC Med Educ ; 14: 30, 2014 Feb 15.

Article in English | MEDLINE | ID: mdl-24528493

ABSTRACT

BACKGROUND: Tutorial-based assessment commonly used in problem-based learning (PBL) is thought to provide information about students which is different from that gathered with traditional assessment strategies such as multiple-choice questions or short-answer questions. Although multiple-observations within units in an undergraduate medical education curriculum foster more reliable scores, that evaluation design is not always practically feasible. Thus, this study investigated the overall reliability of a tutorial-based program of assessment, namely the Tutotest-Lite. METHODS: More specifically, scores from multiple units were used to profile clinical domains for the first two years of a system-based PBL curriculum. RESULTS: G-Study analysis revealed an acceptable level of generalizability, with g-coefficients of 0.84 and 0.83 for Years 1 and 2, respectively. Interestingly, D-Studies suggested that as few as five observations over one year would yield sufficiently reliable scores. CONCLUSIONS: Overall, the results from this study support the use of the Tutotest-Lite to judge clinical domains over different PBL units.

Subject(s)

Educational Measurement/methods , Problem-Based Learning , Reproducibility of Results

15.

Relationship between standardized patient checklist item accuracy and performing arts experience.

Langenau, Erik E; Dyer, Caitlin; Roberts, William L; De Champlain, André F; Montrey, Donald P; Sandella, Jeanne M.

Simul Healthc ; 6(3): 150-4, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21646984

ABSTRACT

INTRODUCTION: : It is not known whether a Standardized Patient's (SP's) performing arts background could affect his or her accuracy in recording candidate performance on a high-stakes clinical skills examination, such as the Comprehensive Osteopathic Medical Licensing Examination Level 2 Performance Evaluation. The purpose of this study is to investigate the differences in recording accuracy of history and physical checklist items between SPs who identify themselves as performing artists and SPs with no performance arts experience. METHODS: : Forty SPs identified themselves as being performing artists or nonperforming artists. A sample of SP live examination ratings were compared with a second set of ratings obtained after video review (N = 1972 SP encounters) over 40 cases from the 2008-2009 testing cycle. Differences in SP checklist recording accuracy were tested as a function of performing arts experience. RESULTS: : Mean overall agreement rates, both uncorrected and corrected for chance agreement, were very high (0.94 and 0.79, respectively, at the overall examination level). There was no statistically significant difference between the two groups with respect to any of the mean accuracy measures: history taking (z = -0.422, P = 0.678), physical examination (z = -1.453, P = 0.072), and overall data gathering (z = -0.812, P = 0.417) checklist items. CONCLUSION: : Results suggest that SPs with or without a performing arts background complete history taking and physical examination checklist items with high levels of precision. Therefore, SPs with and without performing arts experience can be recruited for high-stakes SP-based clinical skills examinations without sacrificing examination integrity or scoring accuracy.

Subject(s)

Art , Checklist , Medical History Taking , Patient Simulation , Physical Examination , Adult , Aged , Female , Humans , Male , Middle Aged

16.

Collecting evidence of content validity for the international foundations of medicine examination: an expert-based judgmental approach.

De Champlain, André F; Grabovsky, Irina; Scoles, Peter V; Pannizzo, Lorean; Winward, Marcia; Dermine, Annick; Himpens, Bernard.

Teach Learn Med ; 23(2): 144-7, 2011 Apr.

Article in English | MEDLINE | ID: mdl-21516601

Subject(s)

Education, Medical/standards , Educational Measurement/standards , International Cooperation , Belgium , Educational Measurement/methods , Humans

17.

Progress testing in clinical science education: results of a pilot project between the National Board of Medical Examiners and a US Medical School.

De Champlain, Andre F; Cuddy, Monica M; Scoles, Peter V; Brown, Marie; Swanson, David B; Holtzman, Kathleen; Butler, Aggie.

Med Teach ; 32(6): 503-8, 2010.

Article in English | MEDLINE | ID: mdl-20515382

ABSTRACT

BACKGROUND: Though progress tests have been used for several decades in various medical education settings, a few studies have offered analytic frameworks that could be used by practitioners to model growth of knowledge as a function of curricular and other variables of interest. AIM: To explore the use of one form of progress testing in clinical education by modeling growth of knowledge in various disciplines as well as by assessing the impact of recent training (core rotation order) on performance using hierarchical linear modeling (HLM) and analysis of variance (ANOVA) frameworks. METHODS: This study included performances across four test administrations occurring between July 2006 and July 2007 for 130 students from a US medical school who graduated in 2008. Measures-nested-in-examinees HLM growth curve analyses were run to estimate clinical science knowledge growth over time and repeated measures ANOVAs were run to assess the effect of recent training on performance. RESULTS: Core rotation order was related to growth rates for total and pediatrics scores only. Additionally, scores were higher in a given discipline if training had occurred immediately prior to the test administration. CONCLUSIONS: This study provides a useful progress testing framework for assessing medical students' growth of knowledge across their clinical science education and the related impact of training.

Subject(s)

Clinical Medicine/education , Educational Measurement/methods , Schools, Medical , Clinical Clerkship , Pilot Projects , United States

18.

Setting and maintaining standards in multiple-choice examinations: guide supplement 37.2 - viewpoint.

De Champlain, Andre F.

Med Teach ; 32(5): 436-7, 2010.

Article in English | MEDLINE | ID: mdl-20423267

Subject(s)

Choice Behavior , Educational Measurement/standards , Education, Medical, Undergraduate , Guidelines as Topic , Humans

19.

A primer on classical test theory and item response theory for assessments in medical education.

De Champlain, André F.

Med Educ ; 44(1): 109-17, 2010 Jan.

Article in English | MEDLINE | ID: mdl-20078762

ABSTRACT

CONTEXT: A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. OBJECTIVES: The objective of this paper is to provide an overview of both CTT and IRT to the practitioner involved in the development and scoring of medical education assessments. METHODS: The tenets of CCT and IRT are initially described. Then, main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. DISCUSSION: Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.

Subject(s)

Education, Medical/methods , Educational Measurement/methods , Models, Educational , Computer-Assisted Instruction/methods , Humans , Models, Statistical , Psychometrics

20.

Gathering evidence of external validity for the Foundations of Medicine examination: a collaboration between the National Board of Medical Examiners and the University of Minho.

Winward, Marcia L; De Champlain, André F; Grabovsky, Irina; Scoles, Peter V; Swanson, David B; Holtzman, Kathleen Z; Pannizzo, Lorena; Sousa, Nuno; Costa, Manuel J.

Acad Med ; 84(10 Suppl): S116-9, 2009 Oct.

Article in English | MEDLINE | ID: mdl-19907371

ABSTRACT

BACKGROUND: To gather evidence of external validity for the Foundations of Medicine (FOM) examination by assessing the relationship between its subscores and local grades for a sample of Portuguese medical students. METHOD: Correlations were computed between six FOM subscores and nine Minho University grades for a sample of 90 medical students. A canonical correlation analysis was run between FOM and Minho measures. RESULTS: Moderate correlations were noted between FOM subscores and Minho grades, ranging from -0.02 to 0.53. One canonical correlation was statistically significant. The FOM variate accounted for 44% of variance in FOM subscores and 22% of variance in Minho end-of-year grades. The Minho canonical variate accounted for 34% of variance in Minho grades and 17% of the FOM subscore variances. CONCLUSIONS: The FOM examination seems to supplement local assessments by targeting constructs not currently measured. Therefore, it may contribute to a more comprehensive assessment of basic and clinical sciences knowledge.

Subject(s)

Education, Medical , Educational Measurement , Portugal , Reproducibility of Results , Universities

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

Subject(s)

ABSTRACT

Subject(s)

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL