ABSTRACT
The passing rate of the Medical Licensing Examination has varied from year to year, which probably stems from differences in item difficulty and/or in the ability level of the examinees. We tried to explain the origin of these differences using test equating based on item response theory. There were 500 items and 3,647 examinees in 2003, and 550 items and 3,879 examinees in 2004. A common-item nonequivalent groups design was used with 30 common items. Item and ability parameters were estimated with the three-parameter logistic model using ICL. Scale transformation and true-score equating were carried out with ST and PIE. The mean difficulty parameter was -0.957 (SD 2.628) for 2003 and -1.456 (SD 3.399) for 2004 after equating. The mean discrimination parameter was 0.487 (SD 0.242) for 2003 and 0.363 (SD 0.193) for 2004. The mean ability parameter was 0.00617 (SD 0.96605) for 2003 and 0.94636 (SD 1.32960) for 2004. The difference between the equated true scores at the same ability level was largest in the score range of 200-350. The difference in passing rates between the two consecutive years therefore arose because the 2004 examination was easier and the 2004 examinees were more able. In addition, the pass/fail outcomes of examinees scoring 270-294 in 2003 and 322-343 in 2004 were affected by the examination year.
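The equating pipeline described above (3PL calibration with ICL, then true-score equating with ST and PIE) rests on the three-parameter logistic item response function, with the true score defined as the expected number-correct at a given ability. A minimal Python sketch follows; the function names and the parameter values in `items` are illustrative assumptions, not the examination's actual calibration.

```python
import math

D = 1.7  # scaling constant conventionally used with the logistic model


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta,
    given discrimination a, difficulty b, and pseudo-guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """True score: the expected number-correct, i.e. the sum of the
    item response probabilities over all items on the form."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)


# Hypothetical (a, b, c) triples for a tiny three-item form.
items = [(0.487, -0.957, 0.20), (0.60, -1.50, 0.25), (0.40, 0.50, 0.20)]
```

With c = 0 and theta = b, the probability reduces to 0.5 plus the guessing floor, which is a quick sanity check on the implementation.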
Subject(s)
Medical Education, Licensure, Logistic Models
ABSTRACT
PURPOSE: To determine possible differences in the ability of students from two consecutive medical school class years, test equating based on item response theory was performed on the results of a lecture examination. METHODS: The dataset comprised the results of a medical school lecture examination: 75 students tested on 60 items in 2002, and 82 students tested on 50 items in 2003. We used a common-item nonequivalent groups design and tested the assumptions of unidimensionality and model-data fit. Item parameters were estimated with the Rasch model, and the scale transformation was performed using the mean/sigma moment method. Equating was applied to both true and observed scores, and the results were analyzed. RESULTS: The slope and intercept for the scale transformation of the 2003 exam onto the 2002 exam were 1.2796 and 0.9630, respectively. The mean and standard deviation of the ability parameter of the students taking the 2003 exam were 1.69 and 0.66, respectively; after scale transformation, the mean increased to 3.13 and the standard deviation to 0.85. For the 2002 exam, the mean was 0.70 and the standard deviation was 0.66. The correlation coefficient between the true and observed scores was 0.9994 (P=0.00). CONCLUSION: The difference in the ability parameter of students between the two years increased after scale transformation. The true and observed scores could be used interchangeably after equating. Thus, test equating based on item response theory may be applicable to medical school lecture examinations.
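The slope and intercept reported above come from the mean/sigma moment method: the common items' difficulty estimates on the two forms determine a linear transformation b* = A·b + B that places the new form's scale onto the old form's scale. A minimal sketch, assuming only the two lists of common-item difficulties (the function name is my own):

```python
import statistics


def mean_sigma(b_new, b_old):
    """Mean/sigma moment method for linear scale transformation.

    b_new : common-item difficulty estimates on the new form's scale
    b_old : the same items' difficulty estimates on the old form's scale
    Returns (A, B) such that A * b_new + B matches the old scale
    in mean and standard deviation.
    """
    A = statistics.pstdev(b_old) / statistics.pstdev(b_new)
    B = statistics.fmean(b_old) - A * statistics.fmean(b_new)
    return A, B
```

The same (A, B) pair also rescales ability estimates (theta* = A·theta + B), which is how the 2003 ability distribution was moved onto the 2002 scale.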
Subject(s)
Humans, Dataset, Medical Education, Medical Schools
ABSTRACT
In order to compare one group of examinees with another, the tests taken by the two groups must be equivalent, and the first step in equating tests is to construct anchor items. In medical schools, however, students prepare for examinations by thoroughly reviewing the previous year's test items, so reusing the same items is said to be undesirable. The purpose of this study was to examine variations in responses to the same items across two consecutive classes when items were reused. The senior class of a medical school was sampled, and the items of its graduation examination were analyzed. On the basis of item difficulty (the item's p-value) and the discrimination index, we selected 35 items. The following year, we reused those items in the same examination for that year's senior class and analyzed the results. Of the 35 items, 14 were slightly modified. The average item difficulty and discrimination index on the previous examination were 0.49 and 0.20, both within desirable ranges, but the following year these values worsened to 0.84 and 0.10, respectively. The trend was no different for the slightly modified items, and there were no significant differences among item groups classified by level of knowledge. We conclude that when a previously used item is administered again, its difficulty index will increase (the item becomes easier) and its discrimination index will decrease, even if minor modifications are made.
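The two classical indices used to select the 35 items are the item's p-value (proportion of examinees answering correctly) and an upper-lower discrimination index (difference in p-value between high- and low-scoring groups). A minimal sketch over a 0/1 response matrix follows; the function names, the 27% group cutoff, and the sample matrix are illustrative assumptions, not details from the study.

```python
def difficulty(item_responses):
    """Item p-value: proportion of examinees who answered correctly.
    Note that a HIGHER value means an EASIER item."""
    return sum(item_responses) / len(item_responses)


def discrimination(matrix, item):
    """Upper-lower discrimination index for one item: p-value in the
    top 27% of examinees (by total score) minus p-value in the bottom 27%."""
    scored = sorted(((sum(row), row[item]) for row in matrix), reverse=True)
    k = max(1, round(0.27 * len(matrix)))
    upper = sum(resp for _, resp in scored[:k]) / k
    lower = sum(resp for _, resp in scored[-k:]) / k
    return upper - lower


# Hypothetical matrix: 4 examinees (rows) x 2 items (columns).
matrix = [[1, 1], [1, 0], [0, 0], [0, 0]]
```

On this toy data, item 0 has a p-value of 0.5 and discriminates perfectly between the top and bottom examinees; a reused item drifting toward p = 0.84 and discrimination 0.10, as reported above, would show up directly in these two numbers.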