1.
Article in English | MEDLINE | ID: mdl-37665413

ABSTRACT

Recent advances in automated scoring technology have made it practical to replace multiple-choice questions (MCQs) with short-answer questions (SAQs) in large-scale, high-stakes assessments. However, most previous research comparing these formats has used small examinee samples tested under low-stakes conditions, and previous studies have not reported the time required to respond to the two item types. This study compares the difficulty, discrimination, and time requirements of the two formats when examinees responded as part of a large-scale, high-stakes assessment. Seventy-one MCQs were converted to SAQs, and the matched items were randomly assigned to examinees completing a high-stakes assessment of internal medicine; no examinee saw the same item in both formats. Items administered in the SAQ format were generally more difficult than the same items in the MCQ format. The discrimination index for SAQs was modestly higher than that for MCQs, and response times were substantially longer for SAQs. These results support the interchangeability of MCQs and SAQs; when it is important that the examinee generate the response rather than select it, SAQs may be preferred. The difficulty and discrimination results reported in this paper are consistent with those of previous studies. The results on relative time requirements suggest that fewer SAQs can be administered in a fixed testing time, and this limitation more than offsets the higher discrimination that has been reported for SAQs. We additionally examine the extent to which increased difficulty may directly affect the discrimination of SAQs.
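To make these statistics concrete, the sketch below (Python, with simulated data rather than the study's) computes the classical item statistics the abstract refers to, proportion-correct difficulty, point-biserial discrimination against total score, and median response time, for matched MCQ and SAQ versions of an item.

```python
# Minimal sketch, not the authors' analysis: classical item statistics for a
# matched MCQ/SAQ item pair, computed from simulated examinee data.
import numpy as np

def item_stats(scores, total_scores, times_sec):
    """scores: 0/1 item scores; total_scores: examinee total test scores;
    times_sec: per-examinee response times in seconds."""
    scores = np.asarray(scores, dtype=float)
    return {
        "difficulty": scores.mean(),                                 # proportion correct
        "discrimination": np.corrcoef(scores, total_scores)[0, 1],   # point-biserial
        "median_time_sec": float(np.median(times_sec)),
    }

# Simulated examinees: ability drives both total score and item success
rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)
totals = 60 + 10 * ability + rng.normal(0, 3, n)
mcq = (rng.random(n) < 1 / (1 + np.exp(-(ability + 1.2)))).astype(int)  # easier as MCQ
saq = (rng.random(n) < 1 / (1 + np.exp(-(ability + 0.3)))).astype(int)  # harder as SAQ
print("MCQ:", item_stats(mcq, totals, rng.normal(45, 10, n)))
print("SAQ:", item_stats(saq, totals, rng.normal(75, 20, n)))
```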

2.
Adv Health Sci Educ Theory Pract ; 27(5): 1401-1422, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35511357

ABSTRACT

Understanding the response process used by test takers when responding to multiple-choice questions (MCQs) is particularly important in evaluating the validity of score interpretations. Previous authors have recommended eye-tracking technology as a useful approach for collecting data on the processes test takers use to respond to test questions. This study proposes a new method for evaluating alternative score interpretations by combining eye-tracking data with machine learning. We collect eye-tracking data from 26 students responding to clinical MCQs. Analysis is performed by providing 119 eye-tracking features as input to a machine-learning model that classifies correct and incorrect responses. The predictive power of various combinations of features within the model is evaluated to understand how different feature interactions contribute to the predictions. The emerging eye-movement patterns indicate that incorrect responses are associated with working from the options to the stem. By contrast, correct responses are associated with working from the stem to the options, with more time spent reading the problem carefully, and with a more decisive selection of a response option. These results suggest that the behaviours associated with correct responses are aligned with the real-world model used for score interpretation, while those associated with incorrect responses are not. To the best of our knowledge, this is the first study to perform data-driven, machine-learning experiments with eye-tracking data for the purpose of evaluating score interpretation validity.
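As a rough illustration of the analysis pattern described, the hedged sketch below feeds a handful of hypothetical eye-tracking features to a classifier and compares the predictive power of feature subsets; the feature names, the random-forest model, and the simulated data are assumptions for illustration, not the authors' actual pipeline.

```python
# Rough sketch: predict correct vs. incorrect responses from eye-tracking
# features and compare the predictive power of feature subsets.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials = 300

# Hypothetical per-response features (the study used 119 such features)
features = {
    "stem_dwell_time":     rng.normal(12.0, 4.0, n_trials),   # seconds on the stem
    "option_dwell_time":   rng.normal(8.0, 3.0, n_trials),    # seconds on the options
    "stem_to_option_sacc": rng.poisson(5, n_trials).astype(float),
    "option_to_stem_sacc": rng.poisson(5, n_trials).astype(float),
    "answer_changes":      rng.poisson(1, n_trials).astype(float),
}
# Simulated outcome: careful stem reading and fewer option-to-stem jumps make
# a correct response more likely (purely for illustration)
y = (features["stem_dwell_time"] - features["option_to_stem_sacc"]
     + rng.normal(0, 4, n_trials) > 6).astype(int)

def subset_auc(names):
    """Cross-validated AUC for a classifier trained on a subset of features."""
    X = np.column_stack([features[f] for f in names])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

print("reading features:   ", subset_auc(["stem_dwell_time", "option_dwell_time"]))
print("transition features:", subset_auc(["stem_to_option_sacc", "option_to_stem_sacc"]))
```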


Subject(s)
Eye Movements; Eye-Tracking Technology; Humans; Machine Learning; Students
3.
Eval Health Prof ; 43(3): 149-158, 2020 Sep.
Article in English | MEDLINE | ID: mdl-31462073

ABSTRACT

Learners and educators in the health professions have called for more fine-grained information (subscores) from assessments, beyond a single overall test score. However, because of concerns about reliability, subscores have seen limited use in practice. Recent advances in latent class analysis have improved subscore reporting through diagnostic classification models (DCMs), which allow reliable classification of examinees into fine-grained proficiency levels (subscore profiles). This study examines an innovative, practical application of the DCM framework to health professions educational assessments, using retrospective large-scale assessment data from the basic and clinical sciences: National Board of Medical Examiners Subject Examinations in pathology (n = 2,006) and medicine (n = 2,351). DCMs were fit and analyzed to generate subscores and subscore profiles for examinees. Model fit indices, classification consistency (reliability), and parameter estimates indicated that the DCMs had good psychometric properties, including consistent classification of examinees into subscore profiles. Results provided a range of useful information, including subscore distributions at varying proficiency levels. Classification consistency was high, demonstrating reliable results at fine-grained subscore levels and allowing targeted, specific feedback to learners. The DCM framework is therefore a promising approach for reporting subscores in health professions education.
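The toy sketch below illustrates the core DCM idea of classifying examinees into attribute-mastery (subscore) profiles, using a DINA-type model with a hypothetical Q-matrix, fixed slip/guess parameters, and a uniform prior; it only stands in for the full model estimation the study performed.

```python
# Toy illustration (not the study's estimation code) of how a diagnostic
# classification model assigns an examinee to fine-grained attribute-mastery
# profiles. A DINA-type model is used with fixed, hypothetical slip/guess
# parameters and a uniform prior; the real analysis estimates these from data.
import itertools
import numpy as np

Q = np.array([[1, 0],   # item 1 requires attribute A only
              [0, 1],   # item 2 requires attribute B only
              [1, 1]])  # item 3 requires both A and B
slip = np.array([0.10, 0.10, 0.15])
guess = np.array([0.20, 0.20, 0.10])

# All 2^K latent attribute profiles (K = 2 attributes here)
profiles = np.array(list(itertools.product([0, 1], repeat=Q.shape[1])))
# eta[c, j] = 1 if profile c masters every attribute that item j requires
eta = (profiles @ Q.T == Q.sum(axis=1)).astype(int)
# P(correct | profile) under DINA: 1 - slip if all attributes mastered, else guess
p_correct = eta * (1 - slip) + (1 - eta) * guess

def profile_posterior(responses):
    """Posterior over attribute profiles for one 0/1 response vector."""
    likelihood = np.prod(np.where(responses, p_correct, 1 - p_correct), axis=1)
    return likelihood / likelihood.sum()

# Examinee who answered item 1 correctly but missed items 2 and 3
for prof, p in zip(profiles, profile_posterior(np.array([1, 0, 0]))):
    print(f"mastery profile {prof}: posterior {p:.2f}")
```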


Subject(s)
Education, Medical/organization & administration; Educational Measurement/methods; Education, Medical/standards; Humans; Latent Class Analysis; Psychometrics; Reproducibility of Results; Retrospective Studies
4.
Med Teach ; 40(8): 838-841, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30096987

ABSTRACT

PURPOSE: Adaptive learning requires frequent and valid assessments for learners to track progress against their goals. This study determined whether multiple-choice questions (MCQs) "crowdsourced" from medical learners could meet the standards of many large-scale testing programs. METHODS: Users of a medical education app (Osmosis.org, Baltimore, MD) volunteered to submit case-based MCQs. Eleven volunteers were selected to submit MCQs targeted to second-year medical students. Two hundred MCQs were subjected to duplicate review by a panel of internal medicine faculty, who rated each item for relevance, content accuracy, and quality of the response-option explanations. A sample of 121 items was pretested on clinical subject examinations completed by a national sample of U.S. medical students. RESULTS: Seventy-eight percent of the 200 MCQs met faculty reviewer standards for relevance, accuracy, and quality of explanations. Of the 121 pretested MCQs, 50% met acceptable statistical criteria; the most common reasons for exclusion were that an item was too easy or had a low discrimination index. CONCLUSIONS: Crowdsourcing can efficiently yield high-quality assessment items that meet rigorous judgmental and statistical criteria. Similar models may be adopted by students and educators to augment the item pools that support adaptive learning.
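The statistical screening step can be pictured as a simple flagging rule over pretest item statistics, as in the sketch below; the items and cut-offs are hypothetical, since the abstract does not give the program's actual thresholds.

```python
# Sketch of the kind of statistical screen described: exclude pretested items
# that are too easy or poorly discriminating. The item statistics and cut-offs
# are illustrative; the program's actual criteria are not given in the abstract.
import pandas as pd

pretest = pd.DataFrame({
    "item":           ["q01", "q02", "q03", "q04"],
    "prop_correct":   [0.96, 0.71, 0.88, 0.52],   # difficulty (proportion correct)
    "discrimination": [0.05, 0.24, 0.31, 0.12],   # point-biserial with total score
})

MAX_P, MIN_RPB = 0.90, 0.15                       # hypothetical exclusion thresholds
pretest["too_easy"] = pretest["prop_correct"] > MAX_P
pretest["low_discrimination"] = pretest["discrimination"] < MIN_RPB
pretest["retain"] = ~(pretest["too_easy"] | pretest["low_discrimination"])
print(pretest)
```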


Subject(s)
Education, Medical, Undergraduate/methods; Educational Measurement/methods; Formative Feedback; Crowdsourcing; Educational Measurement/standards; Humans; Learning; Mobile Applications; Students, Medical
5.
Anat Sci Educ ; 8(1): 12-20, 2015.
Article in English | MEDLINE | ID: mdl-24678042

ABSTRACT

Anatomical education is a dynamic field in which developments in the implementation of constructive, situated learning show promise for improving student achievement. The purpose of this study was to examine the effectiveness of an individualized, technology-heavy project in promoting student performance in a combined anatomy and physiology laboratory course. Mixed-methods research was used to compare two cohorts of anatomy laboratories, defined as preceding (PRE) and following (POST) the adoption of a new laboratory atlas project, the Anatomical Teaching and Learning Assessment Study (ATLAS). The ATLAS project required the creation of a student-generated photographic atlas: specimen images were acquired throughout the semester with tablet technology and digital microscope cameras, transferred to laptops, digitally labeled and photo-edited weekly, and compiled into a digital book using Internet publishing freeware for final project submission. An analysis of covariance confirmed that student final examination scores improved (P < 0.05) following implementation of the laboratory atlas project (PRE, n = 75; POST, n = 90; means ± SE: 74.9 ± 0.9 versus 78.1 ± 0.8, respectively) after controlling for cumulative student grade point average. Analysis of questionnaires collected from the POST group (n = 68) suggested that students identified with the atlas objectives and appreciated both its comprehensive value in preparing for the final examination and the constructionism involved, but they recommended alterations to the assignment logistics and the format of the final version. Constructionist, comprehensive term projects utilizing student-preferred technologies could be used to improve performance toward student learning outcomes.
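The reported analysis of covariance can be sketched in a few lines with statsmodels; the data below are simulated, and the model formula is an assumption that mirrors the described design (final examination score by cohort, controlling for cumulative GPA).

```python
# Sketch (with simulated data, not the study's) of the analysis of covariance
# described: final examination score by cohort (PRE vs. POST), controlling for
# cumulative grade point average as the covariate.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(3)
n_pre, n_post = 75, 90
df = pd.DataFrame({
    "cohort": ["PRE"] * n_pre + ["POST"] * n_post,
    "gpa": np.clip(rng.normal(3.2, 0.4, n_pre + n_post), 2.0, 4.0),
})
# Simulated exam scores: a GPA effect plus a small bump for the POST cohort
df["final_exam"] = (55 + 6 * df["gpa"]
                    + np.where(df["cohort"] == "POST", 3.2, 0.0)
                    + rng.normal(0, 7, len(df)))

model = ols("final_exam ~ C(cohort) + gpa", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # ANCOVA table: cohort effect adjusted for GPA
print(model.params)                      # adjusted POST-vs-PRE difference and GPA slope
```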


Subject(s)
Anatomy/education; Computer-Assisted Instruction/methods; Adolescent; Computer Graphics; Computer-Assisted Instruction/instrumentation; Computers, Handheld; Curriculum; Educational Measurement; Humans; Image Processing, Computer-Assisted; Learning; Microscopy; Prospective Studies; Surveys and Questionnaires; Young Adult