2.
Article in English | MEDLINE | ID: mdl-26883811

ABSTRACT

PURPOSE: The aim of this research was to compare different methods of calibrating multiple-choice question (MCQ) and clinical decision-making (CDM) components for the Medical Council of Canada's Qualifying Examination Part I (MCCQEI) based on item response theory. METHODS: Our data consisted of test results from 8,213 first-time applicants to the MCCQEI in the spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple-choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All three mixed-format (dichotomous MCQ responses and polytomous CDM case scores) calibrations were conducted using PARSCALE 4. RESULTS: The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499 or 0.02). In all three polytomous models, whether the MCQs were anchored or concurrently run with the CDM cases, results suggest very poor fit. All IRT abilities estimated from dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods, but also with regard to the actual reported decision to candidates. The largest difference noted in pass rates was 4.78%, which occurred between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibrations (pass rate = 85.21%). CONCLUSION: Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit of the item response matrix than more complex, polytomous calibrations.
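
For reference, the two families of models being compared have standard forms (textbook notation, not reproduced from the article). For a dichotomous MCQ item j, the 2-PL model gives the probability of a correct response as a function of candidate proficiency; the 1-PL model is the special case with a common discrimination. For a polytomous CDM case scored in ordered categories, Samejima's graded response model works with cumulative category probabilities:

\[
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp[-a_j(\theta_i - b_j)]}
\]
\[
P(X_{ij} \ge k \mid \theta_i) = \frac{1}{1 + \exp[-a_j(\theta_i - b_{jk})]}, \qquad
P(X_{ij} = k \mid \theta_i) = P(X_{ij} \ge k \mid \theta_i) - P(X_{ij} \ge k + 1 \mid \theta_i),
\]

where \(\theta_i\) is the candidate proficiency, \(a_j\) the item discrimination, and \(b_j\) (or the ordered thresholds \(b_{jk}\)) the difficulty parameter(s).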


Subjects
Educational Measurement/standards, Licensure, Medical/standards, Calibration, Canada, Choice Behavior, Humans, Models, Theoretical
3.
Eval Health Prof ; 39(1): 100-13, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26377072

ABSTRACT

We present a framework for technology-enhanced scoring of bilingual clinical decision-making (CDM) questions using an open-source scoring technology and evaluate the strength of the proposed framework using operational data from the Medical Council of Canada Qualifying Examination. Candidates' responses from six write-in CDM questions were used to develop a three-stage-automated scoring framework. In Stage 1, the linguistic features from CDM responses were extracted. In Stage 2, supervised machine learning techniques were employed for developing the scoring models. In Stage 3, responses to six English and French CDM questions were scored using the scoring models from Stage 2. Of the 8,007 English and French CDM responses, 7,643 were accurately scored with an agreement rate of 95.4% between human and computer scoring. This result serves as an improvement of 5.4% when compared with the human inter-rater reliability. Our framework yielded scores similar to those of expert physician markers and could be used for clinical competency assessment.
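
The abstract does not disclose the authors' feature set or learning algorithm. As a purely illustrative sketch of the three-stage idea (extract text features, train a supervised model on human-marked responses, score new responses), a bag-of-words pipeline in scikit-learn might look like the following; the example responses and marks are hypothetical, not taken from the examination.

# Illustrative three-stage scoring sketch (not the operational MCC system).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stages 1-2: learn linguistic features and a scoring model from human-marked write-ins.
train_texts = ["order a chest x-ray and blood cultures",
               "reassure the patient and discharge",
               "start empirical antibiotics",
               "order a chest x-ray"]
train_marks = [1, 0, 1, 1]  # hypothetical human marks (1 = credit, 0 = no credit)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_texts, train_marks)

# Stage 3: score unseen responses and compare machine with human marks.
new_texts = ["chest x-ray and blood cultures", "discharge home"]
human_marks = [1, 0]
machine_marks = model.predict(new_texts)
agreement = (machine_marks == human_marks).mean()
print(f"machine marks: {machine_marks}, raw human-machine agreement: {agreement:.0%}")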


Subjects
Clinical Competence, Educational Measurement/methods, Educational Measurement/standards, Electronic Data Processing/standards, Translating, Canada, Clinical Decision-Making, Humans, Licensure, Medical, Reproducibility of Results
4.
Adv Health Sci Educ Theory Pract ; 20(3): 581-94, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25164266

ABSTRACT

Examiner effects and content specificity are two well-known sources of construct-irrelevant variance that present great challenges in performance-based assessments. National medical organizations that are responsible for large-scale performance-based assessments face an additional challenge, as they must administer qualification examinations to physician candidates at several locations and institutions. This study explores the impact of site location as a source of score variation in a large-scale national assessment used to measure the readiness of internationally educated physician candidates for residency programs. Data from the Medical Council of Canada's National Assessment Collaboration were analyzed using hierarchical linear modeling and Rasch analyses. Consistent with previous research, problematic variance due to examiner effects and content specificity was found. Additionally, site location was identified as a potential source of construct-irrelevant variance in examination scores.
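
The study's own HLM specification is not given in the abstract. As a hedged illustration of how site-related score variance can be quantified, a null multilevel model with candidates nested in sites yields an intraclass correlation; the data below are simulated and the variable names are assumptions.

# Sketch: share of score variance attributable to test site (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
sites = np.repeat(np.arange(12), 40)             # 12 hypothetical sites, 40 candidates each
site_effect = rng.normal(0, 2, size=12)[sites]   # between-site variation
df = pd.DataFrame({"site": sites,
                   "score": 70 + site_effect + rng.normal(0, 8, size=sites.size)})

null_model = smf.mixedlm("score ~ 1", data=df, groups=df["site"]).fit()
between_site = null_model.cov_re.iloc[0, 0]      # variance of site random intercepts
residual = null_model.scale                      # within-site (residual) variance
icc = between_site / (between_site + residual)
print(f"proportion of score variance attributable to site: {icc:.3f}")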


Assuntos
Viés , Competência Clínica , Avaliação Educacional/normas , Médicos , Competência Clínica/estatística & dados numéricos , Feminino , Humanos , Masculino , Modelos Estatísticos
5.
Simul Healthc ; 6(3): 150-4, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21646984

ABSTRACT

INTRODUCTION: It is not known whether a standardized patient's (SP's) performing arts background could affect his or her accuracy in recording candidate performance on a high-stakes clinical skills examination, such as the Comprehensive Osteopathic Medical Licensing Examination Level 2 Performance Evaluation. The purpose of this study is to investigate differences in the recording accuracy of history and physical checklist items between SPs who identify themselves as performing artists and SPs with no performing arts experience. METHODS: Forty SPs identified themselves as being performing artists or nonperforming artists. A sample of SP live examination ratings was compared with a second set of ratings obtained after video review (N = 1,972 SP encounters) over 40 cases from the 2008-2009 testing cycle. Differences in SP checklist recording accuracy were tested as a function of performing arts experience. RESULTS: Mean overall agreement rates, both uncorrected and corrected for chance agreement, were very high (0.94 and 0.79, respectively, at the overall examination level). There was no statistically significant difference between the two groups with respect to any of the mean accuracy measures: history taking (z = -0.422, P = 0.678), physical examination (z = -1.453, P = 0.072), and overall data gathering (z = -0.812, P = 0.417) checklist items. CONCLUSION: Results suggest that SPs with or without a performing arts background complete history taking and physical examination checklist items with high levels of precision. Therefore, SPs with and without performing arts experience can be recruited for high-stakes SP-based clinical skills examinations without sacrificing examination integrity or scoring accuracy.
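
The "corrected for chance" agreement reported above is conventionally a kappa-type statistic. A minimal sketch of how raw and chance-corrected agreement between live and video-review checklist ratings could be computed (the rating arrays below are hypothetical, not the study data):

# Sketch: raw vs. chance-corrected agreement for binary checklist items.
import numpy as np
from sklearn.metrics import cohen_kappa_score

live_ratings  = np.array([1, 1, 0, 1, 0, 1, 1, 0])  # item recorded as "done" during the live encounter
video_ratings = np.array([1, 1, 0, 1, 1, 1, 1, 0])  # same encounters re-rated from video

raw_agreement = (live_ratings == video_ratings).mean()
kappa = cohen_kappa_score(live_ratings, video_ratings)
print(f"raw agreement = {raw_agreement:.2f}, chance-corrected (kappa) = {kappa:.2f}")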


Assuntos
Arte , Lista de Checagem , Anamnese , Simulação de Paciente , Exame Físico , Adulto , Idoso , Feminino , Humanos , Masculino , Pessoa de Meia-Idade
7.
Med Teach ; 32(6): 503-8, 2010.
Article in English | MEDLINE | ID: mdl-20515382

ABSTRACT

BACKGROUND: Though progress tests have been used for several decades in various medical education settings, few studies have offered analytic frameworks that practitioners could use to model growth of knowledge as a function of curricular and other variables of interest. AIM: To explore the use of one form of progress testing in clinical education by modeling growth of knowledge in various disciplines and by assessing the impact of recent training (core rotation order) on performance, using hierarchical linear modeling (HLM) and analysis of variance (ANOVA) frameworks. METHODS: This study included performances across four test administrations occurring between July 2006 and July 2007 for 130 students from a US medical school who graduated in 2008. Measures-nested-in-examinees HLM growth curve analyses were run to estimate clinical science knowledge growth over time, and repeated measures ANOVAs were run to assess the effect of recent training on performance. RESULTS: Core rotation order was related to growth rates for total and pediatrics scores only. Additionally, scores were higher in a given discipline if training had occurred immediately prior to the test administration. CONCLUSIONS: This study provides a useful progress testing framework for assessing medical students' growth of knowledge across their clinical science education and the related impact of training.
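
A measures-nested-in-examinees growth model of the kind described can be sketched as a random-intercept, random-slope mixed model. The data and variable names below are simulated assumptions, not the study's data set.

# Sketch: knowledge growth across repeated progress-test administrations,
# with occasions nested within students (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
students = np.repeat(np.arange(130), 4)             # 130 students x 4 administrations
occasion = np.tile(np.arange(4), 130)               # test occasion (0-3)
slope = rng.normal(2.0, 0.5, size=130)[students]    # student-specific growth per occasion
long = pd.DataFrame({"student": students, "occasion": occasion,
                     "score": 55 + slope * occasion + rng.normal(0, 3, size=students.size)})

growth = smf.mixedlm("score ~ occasion", data=long,
                     groups=long["student"], re_formula="~occasion").fit()
print(growth.summary())  # fixed 'occasion' coefficient = average growth per administration;
                         # its random-slope variance = between-student differences in growth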


Assuntos
Medicina Clínica/educação , Avaliação Educacional/métodos , Faculdades de Medicina , Estágio Clínico , Projetos Piloto , Estados Unidos
9.
Med Educ ; 44(1): 109-17, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20078762

ABSTRACT

CONTEXT: A test score is a number that purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. OBJECTIVES: The objective of this paper is to provide an overview of both CTT and IRT for the practitioner involved in the development and scoring of medical education assessments. METHODS: The tenets of CTT and IRT are initially described. Then, the main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. DISCUSSION: Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
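
The contrast drawn in the paper can be summarized with the standard equations of each framework (general textbook forms, not taken from the article). CTT decomposes an observed score into a true score plus error and defines reliability as the true-score share of observed-score variance; IRT models the probability of a correct response to item j as a function of a latent proficiency:

\[
X = T + E, \qquad \rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
\]
\[
P(X_j = 1 \mid \theta) = c_j + (1 - c_j)\,\frac{1}{1 + \exp[-a_j(\theta - b_j)]}
\]

(the 3-PL form; setting \(c_j = 0\) gives the 2-PL, and additionally fixing \(a_j\) to a constant gives the 1-PL/Rasch model).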


Assuntos
Educação Médica/métodos , Avaliação Educacional/métodos , Modelos Educacionais , Instrução por Computador/métodos , Humanos , Modelos Estatísticos , Psicometria
10.
Acad Med ; 84(10 Suppl): S116-9, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19907371

ABSTRACT

BACKGROUND: The aim of this study was to gather evidence of external validity for the Foundations of Medicine (FOM) examination by assessing the relationship between its subscores and local grades for a sample of Portuguese medical students. METHOD: Correlations were computed between six FOM subscores and nine Minho University grades for a sample of 90 medical students. A canonical correlation analysis was run between the FOM and Minho measures. RESULTS: Moderate correlations were noted between FOM subscores and Minho grades, ranging from -0.02 to 0.53. One canonical correlation was statistically significant. The FOM variate accounted for 44% of the variance in FOM subscores and 22% of the variance in Minho end-of-year grades. The Minho canonical variate accounted for 34% of the variance in Minho grades and 17% of the variance in FOM subscores. CONCLUSIONS: The FOM examination seems to supplement local assessments by targeting constructs not currently measured. Therefore, it may contribute to a more comprehensive assessment of basic and clinical sciences knowledge.
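
As a hedged sketch of the canonical correlation analysis described (the actual FOM subscores and Minho grades are not reproduced here; the data below are simulated), scikit-learn's CCA can extract the canonical variates and their correlation:

# Sketch: first canonical correlation between two blocks of scores.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 90
exam_subscores = rng.normal(size=(n, 6))   # simulated stand-in for six examination subscores
local_grades = exam_subscores[:, :3] @ rng.normal(size=(3, 9)) + rng.normal(size=(n, 9))  # nine correlated grades

cca = CCA(n_components=1)
u, v = cca.fit_transform(exam_subscores, local_grades)
first_canonical_r = np.corrcoef(u[:, 0], v[:, 0])[0, 1]
print(f"first canonical correlation: {first_canonical_r:.2f}")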


Assuntos
Educação Médica , Avaliação Educacional , Portugal , Reprodutibilidade dos Testes , Universidades
11.
Teach Learn Med ; 17(1): 14-20, 2005.
Article in English | MEDLINE | ID: mdl-15691809

ABSTRACT

BACKGROUND: The Ministry of Health of the Republic of Panama is currently developing a national examination system that will be used to license graduates to practice medicine in that country, as well as to undertake postgraduate medical training. As part of these efforts, a preliminary project was undertaken between the National Board of Medical Examiners (NBME) and the Faculty of Medicine of the University of Panama to develop a Residency Selection Process Examination (RSPE). PURPOSE: The purpose of this study was to assess the reliability and validity of RSPE scores for a sample of candidates seeking a residency slot in Panama. METHODS: The RSPE, composed of 200 basic and clinical sciences multiple-choice items, was administered to 261 residency applicants at the University of Panama. RESULTS: The reliability estimate was comparable with those reported for other high-stakes examinations (Cronbach's alpha = 0.89). Also, a Rasch plot of examinee proficiency against item difficulty showed that the RSPE was well targeted to the proficiency levels of candidates. Finally, a moderate correlation was noted between local grade point averages and RSPE scores for University of Panama students (r = 0.38). CONCLUSIONS: Findings suggest that it is possible to translate and adapt test materials for use in other contexts.
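
Cronbach's alpha, the reliability estimate reported above, can be computed directly from the candidates-by-items score matrix. A minimal sketch follows; the 0/1 responses are simulated from a Rasch-type model, not taken from the RSPE.

# Sketch: Cronbach's alpha for a 261 x 200 matrix of dichotomous item scores.
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=(261, 1))                          # simulated candidate proficiencies
difficulty = rng.normal(size=(1, 200))                     # simulated item difficulties
prob = 1 / (1 + np.exp(-(theta - difficulty)))             # Rasch-type response probabilities
item_scores = (rng.random((261, 200)) < prob).astype(int)  # simulated 0/1 item scores

def cronbach_alpha(x: np.ndarray) -> float:
    k = x.shape[1]
    return (k / (k - 1)) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

print(f"Cronbach's alpha on the simulated matrix: {cronbach_alpha(item_scores):.2f}")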


Assuntos
Internato e Residência , Critérios de Admissão Escolar , Faculdades de Medicina/organização & administração , Licenciamento , Panamá
12.
Acad Med ; 79(10 Suppl): S12-4, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15383377

ABSTRACT

PURPOSE: The purpose of this study was to assess whether the interaction of examinee and standardized patient (SP) ethnicity has an impact on data gathering and written communication scores in a large-scale clinical skills assessment used for certification purposes. METHOD: The sample for the present investigation was selected from the population of 9,551 international medical graduates (IMGs) who completed the Educational Commission for Foreign Medical Graduates' Clinical Skills Assessment between May 1, 2002, and May 31, 2003. Analyses of covariance were undertaken separately for four cases, adjusting for initial mean differences between candidate groups and controlling for the stringency levels of SPs. Over 2,800 SP-IMG encounters were analyzed, ranging from 597 (Case 2) to 915 (Case 3). RESULTS: None of the SP ethnicity/examinee ethnicity interactions were statistically significant. CONCLUSIONS: Findings suggest that there is little advantage to be gained by encountering an SP of similar ethnic makeup. These results are discussed in light of past research undertaken to assess fairness issues with clinical skills examinations.
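
A hedged sketch of the kind of analysis of covariance described, testing the examinee-by-SP ethnicity interaction while adjusting for SP stringency; the data, group labels, and column names below are simulated assumptions, not the ECFMG data.

# Sketch: ANCOVA with an examinee-ethnicity x SP-ethnicity interaction term.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 800
enc = pd.DataFrame({
    "examinee_eth": rng.choice(["A", "B"], size=n),  # hypothetical examinee ethnicity groups
    "sp_eth": rng.choice(["A", "B"], size=n),        # hypothetical SP ethnicity groups
    "sp_stringency": rng.normal(0, 1, size=n),       # covariate: SP rating stringency
})
enc["score"] = 75 - 2 * enc["sp_stringency"] + rng.normal(0, 5, size=n)

model = smf.ols("score ~ sp_stringency + C(examinee_eth) * C(sp_eth)", data=enc).fit()
print(sm.stats.anova_lm(model, typ=2))  # the C(examinee_eth):C(sp_eth) row tests the ethnic-match effect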


Assuntos
Competência Clínica , Avaliação Educacional/métodos , Etnicidade , Relações Médico-Paciente , População Negra , Certificação/normas , Competência Clínica/normas , Comunicação , Coleta de Dados , Avaliação Educacional/normas , Médicos Graduados Estrangeiros/normas , Hispânico ou Latino , Humanos , População Branca , Redação
13.
J Vet Med Educ ; 31(1): 61-5, 2004.
Article in English | MEDLINE | ID: mdl-15962251

ABSTRACT

Determining whether or not an examinee has met an adequate standard of performance constitutes a central task for licensure and certification bodies. Consequently, standard setting is a key activity for all certification and licensing testing programs. The purpose of this article is to provide an overview of methods that have been proposed for setting a passing standard on an examination. First, the distinction between norm-referenced and criterion-referenced methods for setting a standard will be outlined. Then, both test-centered and examinee-centered methods for setting a passing standard will be explicated. The importance of factoring in the consequences of adopting a standard will also be illustrated via the Hofstee method. In the concluding section, important issues pertaining to the selection of panelists as well as the validation of the standard will be addressed briefly.
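
The Hofstee (compromise) method mentioned above combines panelists' acceptable cut-score range with their acceptable fail-rate range, taking the point where the line joining those bounds crosses the observed cumulative fail-rate curve. A minimal numerical sketch on simulated scores follows; the panel values are hypothetical.

# Sketch of the Hofstee compromise: intersect the line from (min cut, max fail rate)
# to (max cut, min fail rate) with the observed cumulative fail-rate curve.
import numpy as np

rng = np.random.default_rng(2)
scores = rng.normal(70, 10, size=2000)   # simulated examination scores (percent)

cut_min, cut_max = 55.0, 70.0            # hypothetical panel judgments: acceptable cut-score range
fail_min, fail_max = 0.05, 0.30          # hypothetical panel judgments: acceptable fail-rate range

cuts = np.linspace(cut_min, cut_max, 301)
observed_fail = np.array([(scores < c).mean() for c in cuts])
hofstee_line = fail_max + (fail_min - fail_max) * (cuts - cut_min) / (cut_max - cut_min)

best = np.argmin(np.abs(observed_fail - hofstee_line))
print(f"Hofstee cut score = {cuts[best]:.1f}, implied fail rate = {observed_fail[best]:.1%}")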


Assuntos
Educação em Veterinária , Avaliação Educacional/normas , Animais , Humanos
14.
Ann Med Interne (Paris) ; 154(3): 148-56, 2003 May.
Article in French | MEDLINE | ID: mdl-12910041

ABSTRACT

Medical training is undergoing extensive revision in France. A nationwide comprehensive clinical competency examination will be administered for the first time in 2004, relying exclusively on essay questions. Unfortunately, these questions have psychometric shortcomings, particularly their typically low reliability. High score reliability is mandatory in a high-stakes context. Multiple-choice questions (MCQs) of the kind designed by the National Board of Medical Examiners are well suited to assessing clinical competency with high score reliability. The purpose of this study was to test the hypothesis that French medical students could take an American-designed, French-adapted comprehensive clinical knowledge examination in this MCQ format. Two hundred and eighty-five French students, from four medical schools across France, took an examination composed of 200 MCQs under standardized conditions. Their scores were compared with those of American students. The examination was found to assess French students' clinical knowledge with a high level of reliability. French students' scores were slightly lower than those of American students, mostly owing to a lack of familiarity with this particular item format and a lower motivational level. Another study is being designed, with a larger group, to address some of the shortcomings of the initial study. If these preliminary results are replicated, the MCQ format might be a more defensible and sensible alternative to the proposed essay questions.


Assuntos
Competência Clínica/normas , Educação Médica/normas , Avaliação Educacional , Licenciamento em Medicina/normas , Adulto , Feminino , França , Humanos , Masculino , Projetos Piloto , Reprodutibilidade dos Testes , Estados Unidos
15.
Med Teach ; 25(3): 245-9, 2003 May.
Article in English | MEDLINE | ID: mdl-12881044

ABSTRACT

Recently, standardized patient assessments and objective structured clinical examinations have been used for high-stakes certification and licensure decisions. In these testing situations, it is important that the assessments are standardized, the scores are accurate and reliable, and the resulting decisions regarding competence are equitable and defensible. For the decisions to be valid, justifiable standards, or cut-scores, must be set. Unfortunately, unlike the body of research specifically dedicated to multiple-choice examinations, relatively little research has been conducted on standard-setting methods appropriate for use with performance-based assessments. The purpose of this article is to provide the reader with some guidance on how to set defensible standards on performance assessments, especially those that utilize standardized patients in simulated medical encounters. Various methods are discussed and contrasted, highlighting their relevant strengths and weaknesses. In addition, based on the prevailing literature and research, ideas for future studies and potential augmentations to current performance-based standard-setting protocols are advanced.


Assuntos
Competência Clínica/normas , Exame Físico/normas , Avaliação de Processos em Cuidados de Saúde/métodos , Padrões de Referência , Certificação , Avaliação Educacional , Pesquisa sobre Serviços de Saúde , Humanos , Licenciamento em Medicina , Simulação de Paciente , Estados Unidos
16.
Acad Med ; 78(5): 509-17, 2003 May.
Article in English | MEDLINE | ID: mdl-12742789

ABSTRACT

PURPOSE: The French government, as part of medical education reforms, has affirmed that an examination program for national residency selection will be implemented by 2004. The purpose of this study was to develop a French multiple-choice (MC) examination using the National Board of Medical Examiners' (NBME) expertise and materials. METHOD: The Evaluation Standardisée du Second Cycle (ESSC), a four-hour clinical sciences examination, was administered in January 2002 to 285 medical students at four university test sites in France. The ESSC comprised 200 translated and adapted MC items selected from the Comprehensive Clinical Sciences Examination (CCSE), an NBME subject test. RESULTS: Less than 10% of the ESSC items were rejected as inappropriate to French practice. Also, the distributions of ESSC item characteristics were similar to those reported for the CCSE. The ESSC also appeared to be very well targeted to examinees' proficiencies and yielded a reliability coefficient of 0.91. However, because of a higher word count, the ESSC did show evidence of speededness. Regarding overall performance, the mean proficiency estimate for French examinees was about 0.4 SD below that of a CCSE population. CONCLUSIONS: This study provides strong evidence for the usefulness of the model adopted in this first collaborative effort between the NBME and a consortium of French medical schools. Overall, the performance of French students was comparable to that of CCSE examinees, which was encouraging given the differences in motivation and the speeded nature of the French test. A second phase with the participation of larger numbers of French medical schools and students is being planned.


Assuntos
Medicina Clínica/educação , Avaliação Educacional , Faculdades de Medicina , Estudantes de Medicina , Feminino , França , Humanos , Masculino