Results 1 - 20 of 26
1.
Pak J Med Sci ; 40(6): 1241-1246, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38952493

ABSTRACT

Objective: To explore content experts' experiences with item vetting during item bank development at a public-sector medical university in Rawalpindi, Pakistan. Methods: An exploratory study was carried out from December 2022 to February 2023 at a public-sector medical college in Rawalpindi. A purposive sampling technique was employed to collect data from all content experts at the study institute who participated in item vetting during pre-exam moderation at the university. A pilot-tested semi-structured interview guide was used; interviews were audio-recorded and later transcribed. Participants' anonymity was ensured, and various quality assurance strategies were employed to ensure the trustworthiness of the findings. Thematic analysis was performed on the transcribed data, and themes were finalized by consensus among all authors. Results: Six overarching themes encompassing fourteen subthemes emerged from the data. Participants expressed a profound sense of satisfaction and valued the experience of refining their expertise in constructing multiple-choice questions (MCQs). It was widely acknowledged that such activities not only enhance item development skills but also improve the quality of items. Conclusions: The consistent implementation of item vetting routines, together with diligent adherence to item-writing protocols, contributes to quality assurance in assessment. Item bank development for fair and transparent assessment supports the production of competent healthcare professionals while filtering out incompetent ones, thereby improving healthcare services in the community.

2.
Radiol Bras ; 57: e20230083, 2024.
Article in English | MEDLINE | ID: mdl-38993961

ABSTRACT

Objective: To test the performance of ChatGPT on radiology questions formulated by the Colégio Brasileiro de Radiologia (CBR, Brazilian College of Radiology), evaluating its failures and successes. Materials and Methods: 165 questions from the CBR annual resident assessment (2018, 2019, and 2022) were presented to ChatGPT. For statistical analysis, the questions were divided by the type of cognitive skills assessed (lower or higher order), by topic (physics or clinical), by subspecialty, by style (description of a clinical finding or sign, clinical management of a case, application of a concept, calculation/classification of findings, correlations between diseases, or anatomy), and by target academic year (all, second/third year, or third year only). Results: ChatGPT answered 88 (53.3%) of the questions correctly. It performed significantly better on the questions assessing lower-order cognitive skills than on those assessing higher-order cognitive skills, providing the correct answer on 38 (64.4%) of 59 questions and on only 50 (47.2%) of 106 questions, respectively (p = 0.01). The accuracy rate was significantly higher for physics questions than for clinical questions, correct answers being provided for 18 (90.0%) of 20 physics questions and for 70 (48.3%) of 145 clinical questions (p = 0.02). There was no significant difference in performance among the subspecialties or among the academic years (p > 0.05). Conclusion: Even without dedicated training in this field, ChatGPT demonstrates reasonable performance, albeit still insufficient for approval, on radiology questions formulated by the CBR.
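The abstract's headline comparison (38/59 correct on lower-order questions vs. 50/106 on higher-order questions) can be sanity-checked from the reported counts. The abstract does not state which test was actually used, so the exact p-value may differ from the reported p = 0.01; this is a minimal stdlib-only sketch using a two-proportion z-test, with the function name `two_proportion_z` being ours:

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """Two-sided two-proportion z-test (pooled variance, no continuity correction)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via the error function.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Reported counts: lower-order 38/59 correct, higher-order 50/106 correct.
z, p = two_proportion_z(38, 59, 50, 106)
print(f"lower-order: {38/59:.1%}, higher-order: {50/106:.1%}, z = {z:.2f}, p = {p:.3f}")
```

The proportions reproduce the abstract's 64.4% and 47.2%, and the difference is significant at the 5% level under this test, consistent with the paper's conclusion.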



3.
Am J Pharm Educ ; 88(4): 100684, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38479646

ABSTRACT

OBJECTIVE: To describe an evaluation of a generative language model tool for writing examination questions for a new elective course on the interpretation of common clinical laboratory results, being developed for students in a Bachelor of Science in Pharmaceutical Sciences program. METHODS: A total of 100 multiple-choice questions were generated using a publicly available large language model for a course dealing with common laboratory values. Two independent evaluators with extensive training and experience in writing multiple-choice questions assessed each question for appropriate formatting, clarity, correctness, relevancy, and difficulty. For each question, each reviewer assigned a final dichotomous judgment: usable as written or not usable as written. RESULTS: The major finding of this study was that a generative language model (ChatGPT 3.5) could generate multiple-choice questions for assessing common laboratory value information, but only about half of the questions (50% and 57% for the 2 evaluators) were deemed usable without modification. General agreement between evaluator comments was common (62% of comments), with more than 1 correct answer being the most common reason cited for lack of usability (N = 27). CONCLUSION: The generally positive findings of this study suggest that the use of a generative language model tool for developing examination questions deserves further investigation.


Subjects
Education, Pharmacy; Humans; Judgment; Laboratories; Language; Writing
4.
Eur J Investig Health Psychol Educ ; 14(3): 657-668, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38534904

ABSTRACT

(1) Background: As the field of artificial intelligence (AI) evolves, tools like ChatGPT are increasingly integrated into various domains of medicine, including medical education and research. Given the critical nature of medicine, it is of paramount importance that AI tools offer a high degree of reliability in the information they provide. (2) Methods: A total of n = 450 medical examination questions were each entered manually into ChatGPT three times, for both ChatGPT 3.5 and ChatGPT 4. The responses were collected, and their accuracy and consistency across the series of entries were statistically analyzed. (3) Results: ChatGPT 4 displayed significantly higher accuracy, at 85.7% compared with 57.7% for ChatGPT 3.5 (p < 0.001). ChatGPT 4 was also more consistent, answering 77.8% of questions correctly across all rounds, a significant increase from the 44.9% observed for ChatGPT 3.5 (p < 0.001). (4) Conclusions: The findings underscore the increased accuracy and dependability of ChatGPT 4 in the context of medical education and potential clinical decision making. Nonetheless, the research emphasizes the indispensable nature of human-delivered healthcare and the vital role of continuous assessment in leveraging AI in medicine.

5.
Int Dent J ; 74(3): 616-621, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38242810

ABSTRACT

OBJECTIVES: Generative artificial intelligence (GenAI), including large language models (LLMs), has vast potential applications in health care and education. However, it is unclear how proficient LLMs are at interpreting written input and providing accurate answers in dentistry. This study aims to investigate the accuracy of GenAI in answering questions from dental licensing examinations. METHODS: A total of 1461 multiple-choice questions from question books for the US and UK dental licensing examinations were input into two versions of ChatGPT, 3.5 and 4.0. The passing scores for the US and UK dental examinations were 75.0% and 50.0%, respectively. The performance of the two GenAI versions on the individual examinations and across dental subjects was analysed and compared. RESULTS: ChatGPT 3.5 correctly answered 68.3% (n = 509) and 43.3% (n = 296) of questions from the US and UK dental licensing examinations, respectively. The scores for ChatGPT 4.0 were 80.7% (n = 601) and 62.7% (n = 429), respectively. ChatGPT 4.0 passed both written dental licensing examinations, whilst ChatGPT 3.5 failed. Compared with ChatGPT 3.5, ChatGPT 4.0 answered 327 additional questions correctly and 102 incorrectly. CONCLUSIONS: The newer version of GenAI has shown good proficiency in answering multiple-choice questions from dental licensing examinations. Whilst the more recent version generally performed better, this observation may not hold in all scenarios, and further improvements are necessary. The use of GenAI in dentistry will have significant implications for dentist-patient communication and the training of dental professionals.


Subjects
Artificial Intelligence; Educational Measurement; Licensure, Dental; Humans; Educational Measurement/methods; United States; United Kingdom
6.
Radiol. bras ; 57: e20230083, 2024. tab, graf
Article in English | LILACS-Express | LILACS | ID: biblio-1558821

ABSTRACT

Objective: To test the performance of ChatGPT on radiology questions formulated by the Colégio Brasileiro de Radiologia (CBR, Brazilian College of Radiology), evaluating its failures and successes. Materials and Methods: 165 questions from the CBR annual resident assessment (2018, 2019, and 2022) were presented to ChatGPT. For statistical analysis, the questions were divided by the type of cognitive skills assessed (lower or higher order), by topic (physics or clinical), by subspecialty, by style (description of a clinical finding or sign, clinical management of a case, application of a concept, calculation/classification of findings, correlations between diseases, or anatomy), and by target academic year (all, second/third year, or third year only). Results: ChatGPT answered 88 (53.3%) of the questions correctly. It performed significantly better on the questions assessing lower-order cognitive skills than on those assessing higher-order cognitive skills, providing the correct answer on 38 (64.4%) of 59 questions and on only 50 (47.2%) of 106 questions, respectively (p = 0.01). The accuracy rate was significantly higher for physics questions than for clinical questions, correct answers being provided for 18 (90.0%) of 20 physics questions and for 70 (48.3%) of 145 clinical questions (p = 0.02). There was no significant difference in performance among the subspecialties or among the academic years (p > 0.05). Conclusion: Even without dedicated training in this field, ChatGPT demonstrates reasonable performance, albeit still insufficient for approval, on radiology questions formulated by the CBR.



7.
Cureus ; 15(6): e40977, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37519497

ABSTRACT

Background Artificial intelligence (AI) is evolving within the medical education system. ChatGPT, Google Bard, and Microsoft Bing are AI-based models that can solve problems in medical education. However, the applicability of AI to creating reasoning-based multiple-choice questions (MCQs) in medical physiology is yet to be explored. Objective We aimed to assess and compare the applicability of ChatGPT, Bard, and Bing in generating reasoning-based MCQs on physiology for MBBS (Bachelor of Medicine, Bachelor of Surgery) undergraduate students. Methods The National Medical Commission of India has developed an 11-module physiology curriculum with various competencies. Two physiologists independently chose a competency from each module. A third physiologist prompted all three AIs to generate five MCQs for each chosen competency. The two physiologists who provided the competencies rated the AI-generated MCQs on a scale of 0-3 for validity, difficulty, and the reasoning ability required to answer them. We analyzed the average of the two scores using the Kruskal-Wallis test to compare distributions across the total and module-wise responses, followed by a post-hoc test for pairwise comparisons. We used Cohen's kappa (K) to assess the agreement between the two raters' scores. Data are expressed as medians with interquartile ranges, and statistical significance was set at p < 0.05. Results ChatGPT and Bard each generated 110 MCQs for the chosen competencies; Bing provided only 100, as it failed to generate MCQs for two competencies. The validity of the MCQs was rated 3 (3-3) for ChatGPT, 3 (1.5-3) for Bard, and 3 (1.5-3) for Bing, a significant difference among the models (p < 0.001). Difficulty was rated 1 (0-1) for ChatGPT, 1 (1-2) for Bard, and 1 (1-2) for Bing, also a significant difference (p = 0.006). The reasoning ability required to answer the MCQs was rated 1 (1-2) for all three models, with no significant difference (p = 0.235). K was ≥ 0.8 for all three parameters across all three AI models. Conclusion AI still needs to evolve to generate reasoning-based MCQs in medical physiology. ChatGPT, Bard, and Bing showed certain limitations: Bing generated the least valid MCQs and ChatGPT the least difficult ones, with both differences reaching significance.
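Cohen's kappa, which the study used for inter-rater agreement, corrects raw percent agreement for the agreement expected by chance. A minimal sketch for two raters on the study's 0-3 scale; the ratings below are invented illustrative values, not the study's data, and the function name `cohens_kappa` is ours:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    assert len(r1) == len(r2)
    n = len(r1)
    # Observed proportion of exact agreement between the raters.
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal category frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)

# Illustrative 0-3 ratings from two hypothetical raters.
rater_a = [3, 3, 2, 1, 3, 0, 2, 3, 1, 3]
rater_b = [3, 3, 2, 1, 2, 0, 2, 3, 1, 3]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

With these made-up ratings, kappa comes out around 0.86, i.e. in the "almost perfect" agreement range the abstract reports (K ≥ 0.8).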

8.
Int. j. morphol ; 41(2): 355-361, Apr. 2023. ilus, tab
Article in English | LILACS | ID: biblio-1440322

ABSTRACT

SUMMARY: Numerous students perceive neuroanatomy as a particularly difficult subject owing to its overwhelming complexity. Therefore, a neuroanatomy book that concentrates on easy-to-read stories with schematics rather than exhaustive details has been published. This study evaluates the effect of a trial of the new neuroanatomy book on student learning. A printout on the brainstem and cranial nerves was extracted from the book. Medical students read the printout and were subsequently examined on their knowledge of, and interest in, neuroanatomy. Students who read the extract answered the examination questions relatively well and were more interested in neuroanatomy. The printout appeared to enhance the students' knowledge and concentration. After grasping the fundamental information in the book, students are expected to be able to learn advanced concepts comfortably and confidently. In addition, the book, with its concise and simple content, is suitable not only for short-duration neuroanatomy courses but also for self-learning.




Subjects
Humans; Male; Female; Students, Medical; Books, Illustrated; Learning; Neuroanatomy/education; Surveys and Questionnaires
9.
J Ayub Med Coll Abbottabad ; 34(1): 178-182, 2022.
Article in English | MEDLINE | ID: mdl-35466649

ABSTRACT

BACKGROUND: To explore barriers and facilitators to writing good-quality items for undergraduate dental assessments. METHODS: A qualitative case study was conducted from February to April 2021. Semi-structured interviews were conducted with a purposive sample of eighteen item writers from a public-sector dental institute in Rawalpindi, Pakistan. The data were transcribed verbatim and thematically analyzed to extract themes regarding barriers and facilitators to writing good-quality items. All quality assurance procedures of qualitative research were ensured during the research process. RESULTS: Five themes related to barriers and three themes related to facilitators emerged from the data. The participants reported barriers such as a lack of frequent training and a lack of peer review and feedback. Other barriers were demotivation due to a lack of acknowledgement or monetary incentives, lack of content and construct expertise, clinical workload, and contextual barriers such as lack of an internet facility, an outdated library, and lack of time and space allocated for item construction. Facilitators were the availability of peer review, feedback from post-hoc analysis, motivation due to senior designation, clinical experience, and ample time for basic sciences faculty. CONCLUSIONS: Frequent item-writing training, a strong peer review process, pre-exam item vetting by the dental education department, and institutional improvements such as recruiting content experts, allocating time and space for item construction, providing internet access, updating the library, and distributing workload equally among faculty could enhance the quality of items. Moreover, motivating item writers through appreciation or monetary incentives could improve the quality of undergraduate assessments.


Subjects
Motivation; Writing; Feedback; Humans; Qualitative Research; Research Design
10.
Rev. bras. educ. méd ; 46(supl.1): e157, 2022. tab
Article in Portuguese | LILACS-Express | LILACS | ID: biblio-1407395

ABSTRACT



Introduction: Assessment drives learning and should follow a competence-based approach. The Progress Test (PT) has been used on a large scale for summative and, mainly, formative purposes to assess knowledge and the ability to use it in a professional context. Objective: To check the adequacy and quality of the items that make up the progress tests sat by students. Method: Descriptive, retrospective, exploratory study that analyzed all the items of six PT exams applied to medical students from the first to the sixth year of the Faculty of Medicine of Ribeirão Preto/USP, from 2013 to 2018. The seven indicators of good practice were: 1. addresses a relevant topic in medical training; 2. statement longer than the key answer and distractors; 3. assesses application of knowledge; 4. clear lead-in question defined in the statement; 5. only one domain of knowledge assessed per item; 6. plausible and homogeneous key answer and distractors; 7. absence of flaws that add unnecessary difficulty or give clues to the correct answer. Two independent evaluators analyzed the items and, when necessary, jointly reviewed any disagreement. Result: The analysis showed good technical quality for most items in the six PT exams. Non-adherence was somewhat more frequent for indicators 4 and 5, which can compromise both the validity and the interpretation of the test results in terms of students' knowledge gaps. Conclusion: In general, the quality of the items was very good, but there are opportunities for improvement in the item-writing process, which can inform faculty development within the institution.

11.
Movimento (Porto Alegre) ; 27: e27076, 2021. graf
Article in Portuguese | LILACS | ID: biblio-1365171

ABSTRACT



The article analyzes the Novo Enem questions that draw on Physical Education content, examining the relationships established by and between areas of knowledge as mediated by competences and abilities. This qualitative study uses critical documentary analysis as its methodology. The sources comprise 49 questions related to Physical Education in the exam (2009 to 2017), the National Guidelines for Secondary Education, the Basic Document of Enem, and the Reference Matrix of Enem. The uses and appropriations of Physical Education content, owing to its multidisciplinary nature, favor the requirement that exam questions be written with a view to interdisciplinarity by and between areas of knowledge, exploring the subjects' everyday relationships. The interaction between the objectives of the Languages, Codes and their Technologies area, promoted by the exam, allows us to understand Physical Education within a broad context of education, beyond domain-specific knowledge.


Subjects
Humans; Male; Female; Adolescent; Physical Education and Training; Knowledge; Examination Questions; Education, Primary and Secondary; Qualitative Research
12.
Salud Publica Mex ; 61(5): 637-647, 2019.
Article in English | MEDLINE | ID: mdl-31661741

ABSTRACT

OBJECTIVE: This study aimed to compare performance in the National Assessment for Applicants for Medical Residency (ENARM in Spanish) across private versus public medical schools, geographic regions, and socioeconomic levels by using three different statistical methods (summary measurements, the rate of change, and the area under the receiver operating characteristic curve [AUROC]). These methods have not previously been used for the ENARM; however, some variations of the summary measurements have been reported in some US assessments of medical school graduates. MATERIALS AND METHODS: Cross-sectional study based on historical data (2001-2017). We used summary measures and a colour-filled map. The statistical analysis included the Mann-Whitney U test, the Kruskal-Wallis test, the Spearman correlation coefficient (Rs), and linear regression. RESULTS: A total of 113 medical schools were included in our analysis; 60 were public and 53 private. We found a difference in the median total scores by type of school, MD = 54.07 vs. MD = 57.36, p = 0.011. There were also significant differences among geographic and socioeconomic regions (p < 0.05). CONCLUSIONS: Differences exist in the total scores and the percentage of selected test-takers between types of school and between geographic and socioeconomic regions. Higher scores prevail in the Northeast and Northwest regions. Additional research is required to identify the factors that contribute to these differences. Unsuspected differences in examination scores can be unveiled using summary measures.
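One of the summary statistics named above, the AUROC, can be computed directly from two groups of scores as the probability that a randomly drawn score from one group exceeds one from the other (equivalently, U/(n1·n2) from the Mann-Whitney U statistic). A sketch with invented illustrative scores rather than actual ENARM data; the function name `auroc` is ours:

```python
def auroc(group_pos, group_neg):
    """Pairwise-comparison AUROC; ties count as half a win."""
    wins = sum((a > b) + 0.5 * (a == b) for a in group_pos for b in group_neg)
    return wins / (len(group_pos) * len(group_neg))

# Illustrative median-score samples for two hypothetical school groups.
private = [57.4, 58.1, 55.0, 60.2, 56.3]
public = [54.1, 53.8, 55.0, 52.9, 56.0]
print(f"AUROC = {auroc(private, public):.2f}")
```

An AUROC of 0.5 would mean the two groups' score distributions are indistinguishable by this criterion; values near 1.0 mean one group's scores almost always exceed the other's.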




Subjects
Educational Measurement/statistics & numerical data; Internship and Residency/statistics & numerical data; Private Sector/statistics & numerical data; Public Sector/statistics & numerical data; Schools, Medical/statistics & numerical data; Area Under Curve; Humans; Mexico; ROC Curve; Schools, Medical/supply & distribution; Socioeconomic Factors; Statistics, Nonparametric
13.
Salud pública Méx ; 61(5): 637-647, Sep.-Oct. 2019. tab, graf
Article in English | LILACS | ID: biblio-1127327

ABSTRACT

Objectives: This study aimed to compare performance in the National Assessment for Applicants for Medical Residency (ENARM in Spanish) across private versus public medical schools, geographic regions and socioeconomic levels by using three different statistical methods (summary measurements, the rate of change and the area under the receiver operating characteristic curve [AUROC]). These methods have not previously been used for the ENARM; however, some variations of the summary measurements have been reported in some US assessments of medical school graduates. Materials and methods: Cross-sectional study based on historical data (2001-2017). We used summary measures and a colour-filled map. The statistical analysis included the Mann-Whitney U test, the Kruskal-Wallis test, the Spearman correlation coefficient (Rs), and linear regression. Results: A total of 113 medical schools were included in our analysis; 60 were public and 53 private. We found a difference in the median total scores by type of school, MD = 54.07 vs. MD = 57.36, p = 0.011. There were also significant differences among geographic and socioeconomic regions (p < 0.05). Conclusions: Differences exist in the total scores and the percentage of selected test-takers between types of school and between geographic and socioeconomic regions. Higher scores prevail in the Northeast and Northwest regions. Additional research is required to identify the factors that contribute to these differences. Unsuspected differences in examination scores can be unveiled using summary measures.




Assuntos
Humanos , Faculdades de Medicina/estatística & dados numéricos , Setor Público/estatística & dados numéricos , Setor Privado/estatística & dados numéricos , Avaliação Educacional/estatística & dados numéricos , Internato e Residência/estatística & dados numéricos , Faculdades de Medicina/provisão & distribuição , Fatores Socioeconômicos , Curva ROC , Estatísticas não Paramétricas , Área Sob a Curva , México
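The Mann-Whitney U comparison of score distributions used in the study above can be sketched in pure Python. The helper `rank_data` and the score lists are purely illustrative toy values, not the ENARM data (the study compared medians across 113 schools):

```python
from itertools import chain

def rank_data(values):
    """Assign 1-based ranks, averaging ties (midranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j across the run of tied values
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """U statistic for sample a against sample b."""
    ranks = rank_data(list(chain(a, b)))
    rank_sum_a = sum(ranks[: len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2

# Illustrative score lists only -- not the study's data
public_scores = [52.1, 54.0, 53.2, 55.8, 50.9]
private_scores = [56.3, 58.1, 57.4, 55.0, 59.2]
print(mann_whitney_u(public_scores, private_scores))
```

A p-value would normally be obtained from exact tables or a normal approximation of U; in practice `scipy.stats.mannwhitneyu` handles both, including tie corrections.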
14.
Rev. bras. educ. méd ; 42(4): 74-85, out.-dez. 2018. tab, graf
Artigo em Português | LILACS | ID: biblio-977547

RESUMO



ABSTRACT Objective items, or multiple-choice questions with just one correct answer, are among the most widely used methods for assessing cognitive skills, especially in exams designed for summative purposes. Assessments of the cognitive domain using multiple-choice questions are mostly used in high-stakes exams, i.e. those where failing has serious consequences for the candidates. The widespread use of objective items for assessing learning in the cognitive domain may be explained by the fact that this exam modality fulfills both validity and reliability requirements, with the additional advantage of being practical for exams with large numbers of candidates. Nevertheless, the validity and reliability requirements, in particular, will only be properly fulfilled when the process of writing multiple-choice questions follows the rules of good practice for constructing exams and writing tests. This manuscript describes some of the rules for developing high-quality multiple-choice tests, based on both national and international published sources, as well as on the authors' experience. These rules relate to the content, language and presentation of the questions. This paper also addresses the importance of following appropriate rules for blueprint construction, in order to show the alignment between assessment and curriculum and thereby contribute to meeting the validity requirements. It also briefly describes and discusses a successful experience of teamwork in constructing items and organizing exams. This experience exemplifies the combination of an organized process for constructing high-quality questions for a well-balanced examination with an institutional strategy for faculty development in the field of learning assessment.

15.
GMS J Med Educ ; 35(2): Doc25, 2018.
Artigo em Inglês | MEDLINE | ID: mdl-29963615

RESUMO

Illustrated multiple-choice questions (iMCQs) form an integral part of written tests in anatomy. In iMCQs, the written question refers to various types of figures, e.g. X-ray images, micrographs of histological sections, or drawings of anatomical structures. Since the inclusion of images in MCQs might affect item performance, we compared the characteristics of anatomical items tested with iMCQs and non-iMCQs in seven tests of anatomy courses and in two written parts of the first section of the German Medical Licensing Examination (M1). In total, we compared 25 iMCQs and 163 non-iMCQs from anatomy courses, and 27 iMCQs and 130 non-iMCQs from the written part of the M1, using a nonparametric test for unpaired samples. There were no significant differences in difficulty and discrimination levels between iMCQs and non-iMCQs; the same applied to an analysis stratified by MCQ format. We conclude that the illustrated item format by itself does not seem to affect item difficulty. The present results are consistent with previous retrospective studies, which showed no significant differences in test or item characteristics between iMCQs and non-iMCQs.


Assuntos
Avaliação Educacional , Licenciamento , Comportamento de Escolha , Estudos Retrospectivos , Redação
16.
Rev. bras. educ. méd ; 42(2): 26-33, Apr.-June 2018. tab, graf
Artigo em Inglês | LILACS | ID: biblio-958595

RESUMO

ABSTRACT Introduction: Residency admission exams, although not intended to evaluate medical training, do so indirectly. Evaluating the quality of medical residency tests makes it possible, among other things, to re-evaluate the training process itself and the skills expected of candidates. Objective: To evaluate first-phase exams of different medical residency programs in the largest Brazilian urban centers. Method: We evaluated 500 questions from residency admission exams in the states of São Paulo, Rio de Janeiro and Minas Gerais. The items were evaluated in terms of their origin, geographical location, area of knowledge, contextualization, context scenarios and complexity according to Bloom's taxonomy. Results: Most of the questions were contextualized (64.4%, n = 322), with the predominant scenarios being high-complexity, hospital-based settings. The predominant taxonomic category was recognition (41.60%, n = 208); the second most frequent was judgment, in 26% of the questions (n = 130), followed by synthesis (15%, n = 75), analysis (7.60%, n = 38), comprehension (6%, n = 30) and application (3.8%, n = 19). Considering the dichotomization between questions of theoretical and clinical reasoning, we found a balance between the two (clinical reasoning: 48.6%, n = 243; theoretical reasoning: 51.4%, n = 257). The association between contextualization and clinical reasoning was strong, with a relative risk of 26.31 (CI 11.06-62.59) that an item requires clinical reasoning in the presence of contextualization. Final considerations: The scenario outlined by the present research demonstrates that the different selection processes for medical residency in Brazil differ greatly in their selection profiles, with a hospital-centered focus that favors high-complexity hospital settings.
Although much has been done and discussed to promote changes in medical education in Brazil, the selection process for medical residency still fails to reflect the changes advocated since the end of the last century and consolidated in the public policies of the beginning of this century. If we consider that the selected professionals are likely to remain at the institution after completing their residency, we can gain some understanding of the feedback cycle created within the programs.


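The relative risk of 26.31 (CI 11.06-62.59) reported above can be reproduced from a 2x2 contingency table. The cell counts below are a reconstruction consistent with the reported margins (322 contextualized items, 243 clinical-reasoning items) and the published RR; they are illustrative, not taken from the paper's tables:

```python
import math

def relative_risk(a, b, c, d, z=1.96):
    """Relative risk for a 2x2 table:
         exposed:   a events, b non-events
         unexposed: c events, d non-events
    Returns (RR, lower, upper) with a 95% log-normal interval."""
    p_exposed = a / (a + b)
    p_unexposed = c / (c + d)
    rr = p_exposed / p_unexposed
    # standard error of log(RR)
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Reconstructed (hypothetical) cell counts: 238 of 322 contextualized
# items and 5 of 178 non-contextualized items require clinical reasoning
rr, lower, upper = relative_risk(a=238, b=84, c=5, d=173)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

With these assumed cells the sketch recovers the published point estimate and interval, which illustrates how sensitive the RR is to the small number of clinical-reasoning items among non-contextualized questions.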

17.
Sultan Qaboos Univ Med J ; 18(1): e68-e74, 2018 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-29666684

RESUMO

OBJECTIVES: The current study aimed to carry out a post-validation item analysis of multiple choice questions (MCQs) in medical examinations, in order to evaluate correlations between item difficulty, item discrimination and distractor effectiveness and thereby determine whether questions should be included, modified or discarded. In addition, the optimal number of options per MCQ was analysed. METHODS: This cross-sectional study was performed in the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain. A total of 800 MCQs and 4,000 distractors were analysed between November 2013 and June 2016. RESULTS: The mean difficulty index ranged from 36.70-73.14%. The mean discrimination index ranged from 0.20-0.34. The mean distractor efficiency ranged from 66.50-90.00%. Of the items, 48.4%, 35.3%, 11.4%, 3.9% and 1.1% had zero, one, two, three and four nonfunctional distractors (NFDs), respectively. Using three or four rather than five options in each MCQ resulted in 95% or 83.6% of items having zero NFDs, respectively. The distractor efficiency was 91.87%, 85.83% and 64.13% for difficult, acceptable and easy items, respectively (P <0.005). Distractor efficiency was 83.33%, 83.24% and 77.56% for items with excellent, acceptable and poor discrimination, respectively (P <0.005). The average Kuder-Richardson formula 20 reliability coefficient was 0.76. CONCLUSION: A considerable number of the MCQ items were within acceptable ranges. However, some items needed to be discarded or revised. Using three or four rather than five options in MCQs is recommended to reduce the number of NFDs and improve the overall quality of the examination.


Assuntos
Avaliação Educacional/normas , Pediatria/métodos , Psicometria/normas , Habilidades para Realização de Testes/normas , Barein , Comportamento de Escolha , Estudos Transversais , Educação de Graduação em Medicina/métodos , Educação de Graduação em Medicina/normas , Avaliação Educacional/métodos , Humanos , Pediatria/organização & administração , Psicometria/instrumentação , Psicometria/métodos , Reprodutibilidade dos Testes , Habilidades para Realização de Testes/métodos , Universidades/organização & administração
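The item statistics reported above (difficulty index, discrimination index, KR-20 reliability) can be computed directly from a 0/1 response matrix. A minimal sketch with toy data; the upper/lower 27% split is a common convention, and nothing here is taken from the study itself:

```python
def item_analysis(responses):
    """Post-validation item analysis for dichotomously scored MCQs.

    responses: list of per-examinee lists of 0/1 item scores.
    Returns (difficulty, discrimination, kr20), where discrimination
    compares the conventional upper and lower 27% score groups.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    order = sorted(range(n_students), key=lambda i: totals[i], reverse=True)
    k = max(1, round(0.27 * n_students))
    upper, lower = order[:k], order[-k:]

    difficulty, discrimination = [], []
    for j in range(n_items):
        difficulty.append(sum(r[j] for r in responses) / n_students)
        p_upper = sum(responses[i][j] for i in upper) / k
        p_lower = sum(responses[i][j] for i in lower) / k
        discrimination.append(p_upper - p_lower)

    # KR-20 = (K/(K-1)) * (1 - sum(p*q) / variance of total scores)
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    pq = sum(p * (1 - p) for p in difficulty)
    kr20 = (n_items / (n_items - 1)) * (1 - pq / var_total)
    return difficulty, discrimination, kr20

# Toy response matrix: 5 examinees x 4 items (illustrative only)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulty, discrimination, kr20 = item_analysis(responses)
print(difficulty, discrimination, round(kr20, 2))
```

Distractor efficiency, the study's other metric, would additionally require the per-option response counts, which a 0/1 scored matrix does not retain.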
18.
Rev. méd. Chile ; 146(1): 46-52, ene. 2018. tab
Artigo em Espanhol | LILACS | ID: biblio-902621

RESUMO

Background: Learning assessment has a great impact on students' achievement. However, it is one of the least intervened-upon and least researched areas in higher education institutions worldwide. Aim: To compare the written tests applied to students of three health science undergraduate programs (Speech Therapy, Medical Technology and Nursing) with the written tests of three programs from other areas (Business and Administration, Psychology and Bioengineering). Material and Methods: Comparisons were made using the Authentic Assessment Model's indicators. The magnitude of change in these variables was also evaluated in the two groups of undergraduate programs after the teachers participated in a training program based on this model. A quantitative, repeated-measurements design was used. Nineteen teachers participated (nine from medical sciences and ten from other areas), who drafted 88 written tests (comprising 1,318 items) before the intervention and 93 written tests (comprising 1,051 items) after it. Items were analyzed using a hierarchical linear model (HLM), controlling for test and teacher effects. Results: Both groups of undergraduate programs used multiple-choice items most frequently, although there were differences in the other item types. The HLM analysis also showed that the programs differed in their changes after the intervention: health science programs improved less in the kinds of items used, but improved more on the Authentic Assessment indicators. Conclusions: Written tests improved after an intervention aimed at improving teachers' skills in preparing such tests.


Assuntos
Humanos , Pessoal de Saúde/educação , Avaliação Educacional/métodos , Estudantes , Estudantes de Ciências da Saúde , Universidades , Redação , Chile , Estudos Longitudinais , Estudos de Avaliação como Assunto
19.
Artigo em Coreano | WPRIM (Pacífico Ocidental) | ID: wpr-740777

RESUMO

PURPOSE: This descriptive study investigated the relevance of biological nursing science subjects (structure and function of the human body [SFHB], mechanism and effects of drugs [MED], and clinical microbiology) to examination workbook items for the Registered Nurse Licensure Examination (RNLE) in the Republic of Korea (ROK) and the United States of America (USA). METHODS: Eight RNLE workbooks published by the Korean Nurses Association were used for the analysis of the Korean RNLE. Saunders comprehensive review for the NCLEX-RN® examination was used for the analysis of the US RNLE. The relevance between items in the standard syllabuses of the biological nursing science subjects (SFHB, MED, clinical microbiology) and the RNLE items of these workbooks in ROK and the USA was analyzed. RESULTS: The relevance rates for ROK and the USA were 3.6% vs 0.4% in SFHB, 8.9% vs 23.0% in MED, and 4.5% vs 5.8% in clinical microbiology. CONCLUSION: In SFHB, the relevance of the RNLE in ROK was higher than in the USA. In MED, however, the relevance of the RNLE in the USA was higher than in ROK. Since administering medications is one of the major tasks of nurses, it is necessary to increase the number of related items in the RNLE in ROK.


Assuntos
Humanos , América , Corpo Humano , Licenciamento , Enfermeiras e Enfermeiros , Enfermagem , República da Coreia , Estados Unidos
20.
Asian Pac J Cancer Prev ; 18(6): 1663-1670, 2017 06 25.
Artigo em Inglês | MEDLINE | ID: mdl-28670886

RESUMO

Background: In Korea, the national cancer database was constructed after the initiation of the national cancer registration project in 1980, and the annual national cancer registration report has been published every year since 2005. Consequently, data management must begin at the stage of data collection in order to ensure quality. Objectives: To determine the suitability of cancer registries' inquiry tools through an analysis of inquiries to the Korea Central Cancer Registry (KCCR), and to identify needs for improving the quality of cancer registration. Methods: The results of 721 inquiries to the KCCR from 2000 to 2014 were analyzed by inquiry year, question type, and medical institution characteristics. Using Stata version 14.1, descriptive analysis was performed to identify general participant characteristics, and chi-square analysis was applied to investigate significant differences in distribution characteristics by factors affecting the quality of cancer registration data. Results: The number of inquiries increased in 2005-2009. During this period, there were various changes, including the addition of cancer registration items such as brain tumors and updates to the guidelines. Of the inquirers, 65.3% worked at hospitals in metropolitan cities, and 60.89% of the hospitals had 601-1,000 beds. Tertiary hospitals submitted the highest number of inquiries (64.91%), and the most frequent question types concerned histological codes (353; 48.96%), primary sites (92; 12.76%), and whether cases were reportable (76; 10.54%). Conclusions: A cancer registration inquiry system is an effective resource when registrars are uncertain about codes during cancer registration, or when they encounter cancer cases for which prior clinical knowledge or information in the cancer registration guidelines is insufficient.
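The chi-square analysis mentioned above reduces to Pearson's statistic over an observed contingency table. A minimal pure-Python sketch; the counts are made up for illustration and are not the KCCR data:

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Made-up counts (e.g. question type x institution type), not KCCR data
table = [[30, 10],
         [20, 40]]
print(round(chi_square_stat(table), 2))
```

The statistic is then compared against a chi-square distribution with (r-1)(c-1) degrees of freedom; `scipy.stats.chi2_contingency` performs the full test, including the expected-frequency table.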
