Results 1 - 20 of 4,056
1.
South Med J ; 117(6): 342-344, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38830589

ABSTRACT

OBJECTIVES: This study assessed the content of US Medical Licensing Examination question banks with regard to out-of-hospital births and whether the questions aligned with current evidence. METHODS: Three question banks were searched for key words regarding out-of-hospital births. A thematic analysis was then used to analyze the results. RESULTS: Forty-seven questions were identified; of these, 55% indicated inadequate, limited, or irregular prenatal care in the question stem. CONCLUSIONS: Systematic studies comparing prenatal care in out-of-hospital births versus hospital births are nonexistent, leading to the potential for bias and adverse outcomes. Adjustments to question stems so that they accurately portray current evidence are recommended.


Subject(s)
Licensure, Medical , Humans , United States , Licensure, Medical/standards , Female , Pregnancy , Prenatal Care/standards , Educational Measurement/methods , Education, Medical/methods , Education, Medical/standards
2.
BMC Med Educ ; 24(1): 504, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714975

ABSTRACT

BACKGROUND: Evaluation of students' learning strategies can enhance academic support. Few studies have investigated differences in learning strategies between male and female students or their impact on United States Medical Licensing Examination (USMLE) Step 1 and preclinical performance. METHODS: The Learning and Study Strategies Inventory (LASSI) was administered to the classes of 2019-2024 (350 female and 262 male students). Students' performance in preclinical first-year (M1) courses, preclinical second-year (M2) courses, and on USMLE Step 1 was recorded. An independent t-test evaluated differences between females and males on each LASSI scale. Pearson product-moment correlations determined which LASSI scales correlated with preclinical performance and USMLE Step 1 scores. RESULTS: Of the 10 LASSI scales, Anxiety, Attention, Information Processing, Selecting Main Ideas, Test Strategies, and Using Academic Resources showed significant differences between genders. Females reported higher levels of Anxiety (p < 0.001), which significantly influenced their performance. While males and females scored similarly in Concentration, Motivation, and Time Management, these scales were significant predictors of performance variation in females. Test Strategies was the largest contributor to performance variation for all students, regardless of gender. CONCLUSION: Gender differences in learning influence performance on Step 1. Consideration of this study's results will allow for targeted interventions for academic success.
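
A minimal sketch of the two core analyses named above: an independent t-test for a gender difference on one LASSI scale, and a Pearson product-moment correlation between a scale and Step 1 scores. All data and variable names here are simulated placeholders, not the study's data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    anxiety_f = rng.normal(55, 10, 350)  # hypothetical female Anxiety scale scores
    anxiety_m = rng.normal(60, 10, 262)  # hypothetical male Anxiety scale scores
    step1 = rng.normal(230, 18, 612)     # hypothetical Step 1 scores, same 612 students

    # Independent t-test: do females and males differ on this LASSI scale?
    t, p = stats.ttest_ind(anxiety_f, anxiety_m)

    # Pearson correlation: does the scale track Step 1 performance?
    r, p_r = stats.pearsonr(np.concatenate([anxiety_f, anxiety_m]), step1)
    print(f"t = {t:.2f} (p = {p:.3g}); r = {r:.2f} (p = {p_r:.3g})")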


Subject(s)
Education, Medical, Undergraduate , Educational Measurement , Licensure, Medical , Students, Medical , Humans , Female , Male , Educational Measurement/methods , Education, Medical, Undergraduate/standards , Sex Factors , Licensure, Medical/standards , Learning , United States , Academic Performance , Young Adult
4.
Harefuah ; 163(5): 323-326, 2024 May.
Article in Hebrew | MEDLINE | ID: mdl-38734948

ABSTRACT

INTRODUCTION: Two Jewish medical students who were forced to discontinue their studies after the rise of the Nazi regime immigrated to Palestine and completed their internships there. A third student, although faced with many procedural limitations, was able to complete most of his studies in Berlin, including passing the MD examination. The first two students returned to Berlin some years later to sit the doctoral examination, which enabled them to obtain a permanent medical license in Palestine. We describe the different backgrounds that enabled the three students to take the examination at Berlin's medical faculty during the Nazi regime. Follow-up of the three revealed distinguished medical careers during the British Mandate and the first years of the new State of Israel. Their dissertations were signed and supported by three leading professors of the Berlin faculty, two of whom were found to have had National Socialist backgrounds.


Subject(s)
Jews , National Socialism , Students, Medical , Humans , Arabs , Berlin , Education, Medical/history , Education, Medical/organization & administration , Internship and Residency , Israel , Licensure, Medical/history , National Socialism/history , History, 20th Century
6.
West J Emerg Med ; 25(2): 209-212, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38596920

ABSTRACT

Introduction: Learners frequently benefit from modalities such as small-group, case-based teaching and interactive didactic experiences rather than passive learning methods. These contemporary techniques are features of Foundations of Emergency Medicine (FoEM) curricula, and particularly the Foundations I (F1) course, which targets first-year resident (PGY-1) learners. The American Board of Emergency Medicine administers the in-training exam (ITE), which provides an annual assessment of EM-specific medical knowledge. We sought to assess the effect of F1 implementation on ITE scores. Methods: We retrospectively analyzed data from interns at four EM residency programs accredited by the Accreditation Council for Graduate Medical Education. We collected data in 2021. Participating sites were geographically diverse and included three- and four-year training formats. We collected data from interns two years before (control group) and two years after (intervention group) implementation of F1 at each site. Year of F1 implementation ranged from 2015-2018 at participating sites. We abstracted data using a standard form including program, ITE raw score, year of ITE administration, US Medical Licensing Exam Step 1 score, Step 2 Clinical Knowledge (CK) score, and gender. We performed univariable and multivariable linear regression to explore differences between intervention and control groups. Results: We collected data for 180 PGY-1s. Step 1 and Step 2 CK scores were significant predictors of ITE in univariable analyses (both with P < 0.001). After accounting for Step 1 and Step 2 CK scores, we did not find F1 implementation to be a significant predictor of ITE score (P = 0.83). Conclusion: Implementation of the F1 curriculum was not associated with significant changes in ITE performance after controlling for important variables.


Subject(s)
Emergency Medicine , Internship and Residency , Humans , United States , Educational Measurement/methods , Retrospective Studies , Clinical Competence , Curriculum , Emergency Medicine/education , Licensure, Medical
7.
Sci Rep ; 14(1): 9330, 2024 04 23.
Article in English | MEDLINE | ID: mdl-38654011

ABSTRACT

While there are data assessing the test performance of artificial intelligence (AI) chatbots, including the Generative Pre-trained Transformer 4.0 (GPT 4) chatbot (ChatGPT 4.0), there are scarce data on their diagnostic accuracy in clinical cases. We assessed the large language model (LLM) ChatGPT 4.0 on its ability to answer questions from the United States Medical Licensing Exam (USMLE) Step 2, as well as its ability to generate a differential diagnosis based on corresponding clinical vignettes from published case reports. A total of 109 Step 2 Clinical Knowledge (CK) practice questions were inputted into both ChatGPT 3.5 and ChatGPT 4.0, asking ChatGPT to pick the correct answer. Compared with its previous version, ChatGPT 3.5, accuracy improved with ChatGPT 4.0, from 47.7% to 87.2% (p = 0.035). Using the topics tested on the Step 2 CK questions, we additionally identified 63 corresponding published case report vignettes and asked ChatGPT 4.0 to produce its top three differential diagnoses. ChatGPT 4.0 accurately included the diagnosis in its shortlist of differential diagnoses in 47 of the 63 case reports (74.6%). We analyzed ChatGPT 4.0's confidence in its diagnosis by asking it to rank its top three differentials from most to least likely. Of the 47 correct diagnoses, 33 were first (70.2%) on the differential diagnosis list, 11 were second (23.4%), and three were third (6.4%). Our study shows the continued iterative improvement in ChatGPT's ability to answer standardized USMLE questions accurately and provides insights into ChatGPT's clinical diagnostic accuracy.
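
The abstract does not name the test behind p = 0.035; because both models answered the same 109 questions, a McNemar test on paired correctness is one plausible analysis. In this sketch the discordant-pair counts are invented so that only the marginal accuracies (52/109 = 47.7% and 95/109 = 87.2%) match the abstract; the resulting p-value is illustrative only.

    import numpy as np
    from statsmodels.stats.contingency_tables import mcnemar

    # Rows: ChatGPT 3.5 correct/incorrect; columns: ChatGPT 4.0 correct/incorrect.
    table = np.array([[50, 2],    # both correct / only 3.5 correct
                      [45, 12]])  # only 4.0 correct / both incorrect
    result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
    print(f"McNemar p = {result.pvalue:.3g}")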


Subject(s)
Artificial Intelligence , Humans , United States , Diagnosis, Differential , Licensure, Medical , Clinical Competence , Educational Measurement/methods
8.
PLoS One ; 19(4): e0302217, 2024.
Article in English | MEDLINE | ID: mdl-38687696

ABSTRACT

Efforts are being made to improve the time effectiveness of healthcare providers. Artificial intelligence tools can help transcribe and summarize physician-patient encounters and produce medical notes and medical recommendations. However, in addition to medical information, discussion between healthcare providers and patients includes small talk and other information irrelevant to medical concerns. As large language models (LLMs) are predictive models that build their responses from the words in the prompt, there is a risk that small talk and irrelevant information may alter the response and the suggestion given. This study therefore investigated the impact of medical data mixed with small talk on the accuracy of medical advice provided by ChatGPT. USMLE Step 3 questions, both multiple-choice and open-ended, were used as a model for relevant medical data. First, we gathered small talk sentences from human participants using the Mechanical Turk platform. Second, both sets of USMLE questions were arranged in a pattern where each sentence from the original question was followed by a small talk sentence. ChatGPT 3.5 and 4 were asked to answer both sets of questions with and without the small talk sentences. Finally, a board-certified physician analyzed ChatGPT's answers and compared them to the formal correct answers. The results demonstrate that the ability of ChatGPT-3.5 to answer correctly was impaired when small talk was added to medical data (66.8% vs. 56.6%; p = 0.025); the drop was seen for multiple-choice questions (72.1% vs. 68.9%; p = 0.67) and open-ended questions (61.5% vs. 44.3%; p = 0.01), though it reached significance only for the latter. In contrast, small talk phrases did not impair ChatGPT-4's ability on either question type (83.6% and 66.2%, respectively). According to these results, ChatGPT-4 appears more accurate than the earlier 3.5 version, and small talk does not appear to impair its capability to provide medical recommendations. Our results are an important first step in understanding the potential and limitations of utilizing ChatGPT and other LLMs for physician-patient interactions that include casual conversations.
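
A minimal sketch of the question-construction pattern described above, in which each sentence of the original item is followed by one small-talk sentence. The naive sentence splitter and sample text are simplifications, not the study's materials.

    import itertools
    import re

    def interleave(question: str, small_talk: list[str]) -> str:
        # Naive sentence split on terminal punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.?!])\s+", question.strip())
        # Pair every question sentence with a (cycled) small-talk sentence.
        pairs = zip(sentences, itertools.cycle(small_talk))
        return " ".join(itertools.chain.from_iterable(pairs))

    question = "A 62-year-old man presents with chest pain. His BP is 90/60 mm Hg."
    chatter = ["My daughter just started college, by the way.",
               "Traffic was terrible on the way here."]
    print(interleave(question, chatter))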


Subject(s)
Physician-Patient Relations , Humans , Female , Male , Adult , Communication , Health Personnel , Licensure, Medical/standards , Artificial Intelligence , Counseling , Middle Aged
9.
JMIR Med Educ ; 10: e55048, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38686550

ABSTRACT

Background: The deployment of OpenAI's ChatGPT-3.5 and its subsequent versions, ChatGPT-4 and ChatGPT-4 With Vision (4V; also known as "GPT-4 Turbo With Vision"), has notably influenced the medical field. Having demonstrated remarkable performance in medical examinations globally, these models show potential for educational applications. However, their effectiveness in non-English contexts, particularly in Chile's medical licensing examinations, a critical step for medical practitioners in Chile, is less explored. This gap highlights the need to evaluate ChatGPT's adaptability to diverse linguistic and cultural contexts. Objective: This study aims to evaluate the performance of ChatGPT versions 3.5, 4, and 4V in the EUNACOM (Examen Único Nacional de Conocimientos de Medicina), a major medical examination in Chile. Methods: Three official practice drills (540 questions) from the University of Chile, mirroring the EUNACOM's structure and difficulty, were used to test ChatGPT versions 3.5, 4, and 4V. Each version was given 3 attempts at each drill. Responses during each attempt were systematically categorized and analyzed to assess accuracy rates. Results: All versions of ChatGPT passed the EUNACOM drills. Specifically, versions 4 and 4V outperformed version 3.5, achieving average accuracy rates of 79.32% and 78.83%, respectively, compared to 57.53% for version 3.5 (P<.001). Version 4V, however, did not outperform version 4 (P=.73), despite its additional visual capabilities. We also evaluated ChatGPT's performance in different medical areas of the EUNACOM and found that versions 4 and 4V consistently outperformed version 3.5. Across the different medical areas, version 3.5 displayed the highest accuracy in psychiatry (69.84%), while versions 4 and 4V achieved the highest accuracy in surgery (90.00% and 86.11%, respectively). Versions 3.5 and 4 had the lowest performance in internal medicine (52.74% and 75.62%, respectively), while version 4V had the lowest performance in public health (74.07%). Conclusions: This study reveals ChatGPT's ability to pass the EUNACOM, with distinct proficiencies across versions 3.5, 4, and 4V. Notably, advances in artificial intelligence (AI) have not led to significant gains in performance on image-based questions. The variations in proficiency across medical fields suggest the need for more nuanced AI training. Additionally, the study underscores the importance of exploring innovative approaches to using AI to augment human cognition and enhance the learning process. Such advancements have the potential to significantly influence medical education, fostering not only knowledge acquisition but also the development of critical thinking and problem-solving skills among health care professionals.
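
A sketch of the per-version, per-area accuracy tabulation implied above; the handful of records and area labels are hypothetical placeholders, not EUNACOM data.

    import pandas as pd

    records = pd.DataFrame({
        "version": ["3.5", "3.5", "4", "4", "4V", "4V"],
        "area":    ["psychiatry", "surgery"] * 3,
        "correct": [1, 0, 1, 1, 0, 1],  # 1 = answered correctly
    })
    accuracy = (records.groupby(["version", "area"])["correct"]
                       .mean().mul(100).round(2))  # accuracy rate in %
    print(accuracy)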


Subject(s)
Educational Measurement , Licensure, Medical , Female , Humans , Male , Chile , Clinical Competence/standards , Educational Measurement/methods , Educational Measurement/standards
10.
J Osteopath Med ; 124(6): 257-265, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38498662

ABSTRACT

CONTEXT: The National Board of Osteopathic Medical Examiners (NBOME) administers the Comprehensive Osteopathic Medical Licensing Examination of the United States (COMLEX-USA), a three-level examination designed for licensure for the practice of osteopathic medicine. The examination design for COMLEX-USA Level 3 (L3) was changed in September 2018 to a two-day computer-based examination with two components: a multiple-choice question (MCQ) component with single best answer and a clinical decision-making (CDM) case component with extended multiple-choice (EMC) and short answer (SA) questions. Continued validation of the L3 examination, especially with the new design, is essential for the appropriate interpretation and use of the test scores. OBJECTIVES: The purpose of this study is to gather evidence to support the validity of the L3 examination scores under the new design utilizing sources of evidence based on Kane's validity framework. METHODS: Kane's validity framework contains four components of evidence to support the validity argument: Scoring, Generalization, Extrapolation, and Implication/Decision. In this study, we gathered data from various sources and conducted analyses to provide evidence that the L3 examination is validly measuring what it is supposed to measure. These include reviewing content coverage of the L3 examination, documenting scoring and reporting processes, estimating the reliability and decision accuracy/consistency of the scores, quantifying associations between the scores from the MCQ and CDM components and between scores from different competency domains of the L3 examination, exploring the relationships between L3 scores and scores from a performance-based assessment that measures related constructs, performing subgroup comparisons, and describing and justifying the criterion-referenced standard setting process. The analysis data contains first-attempt test scores for 8,366 candidates who took the L3 examination between September 2018 and December 2019. The performance-based assessment utilized as a criterion measure in this study is COMLEX-USA Level 2 Performance Evaluation (L2-PE). RESULTS: All assessment forms were built through the automated test assembly (ATA) procedure to maximize parallelism in terms of content coverage and statistical properties across the forms. Scoring and reporting follows industry-standard quality-control procedures. The inter-rater reliability of SA rating, decision accuracy, and decision consistency for pass/fail classifications are all very high. There is a statistically significant positive association between the MCQ and the CDM components of the L3 examination. The patterns of associations, both within the L3 subscores and with L2-PE domain scores, fit with what is being measured. The subgroup comparisons by gender, race, and first language showed expected small differences in mean scores between the subgroups within each category and yielded findings that are consistent with those described in the literature. The L3 pass/fail standard was established through implementation of a defensible criterion-referenced procedure. CONCLUSIONS: This study provides some additional validity evidence for the L3 examination based on Kane's validity framework. The validity of any measurement must be established through ongoing evaluation of the related evidence. The NBOME will continue to collect evidence to support validity arguments for the COMLEX-USA examination series.
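
The abstract reports very high inter-rater reliability for short-answer (SA) ratings without naming the statistic; Cohen's kappa, sketched here on hypothetical credit/no-credit ratings from two raters, is one standard choice for such data.

    from sklearn.metrics import cohen_kappa_score

    rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical SA credit decisions
    rater_b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
    print(f"kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")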


Subject(s)
Educational Measurement , Licensure, Medical , Osteopathic Medicine , United States , Humans , Educational Measurement/methods , Educational Measurement/standards , Licensure, Medical/standards , Osteopathic Medicine/education , Osteopathic Medicine/standards , Reproducibility of Results , Clinical Competence/standards
11.
Orthopadie (Heidelb) ; 53(5): 311-316, 2024 May.
Article in German | MEDLINE | ID: mdl-38546842

ABSTRACT

BACKGROUND: The amendment to the German medical licensing regulations (ÄApprO) was adopted at the federal level in the version of the "Master Plan for Medical Studies 2020" passed in 2017. In addition to the organizational effort involved in redesigning curricular teaching, the expected costs of implementing the new licensing regulations, driven by the necessary additional time and therefore personnel expenditure, are of particular importance. Taking into account the different forms of study and the 20% leeway in study design granted to the individual faculties, transferring the teaching content into the new modules presents an enormous organizational challenge. SIGNIFICANCE OF O&U: Diseases of the musculoskeletal system are of particular medical, social, and economic importance. The training of future physicians must therefore take the field of orthopedics and traumatology into account, and the visibility of the field must not be lost with the introduction of the new medical licensing regulations (ÄApprO). IMPLEMENTATION: Implementing the new medical licensing regulations at German universities will be costly and will require additional staff. At the same time, there is a great opportunity to position orthopedics and traumatology as a "central player" in the modular, interdisciplinary, and interprofessional course landscape. It is therefore important to take concrete responsibility for the design of the new teaching programs and to contribute our specialist and interdisciplinary skills wherever sensible and possible.


Subject(s)
Licensure, Medical , Orthopedics , Humans , Curriculum/trends , Forecasting , Germany , Government Regulation , Licensure, Medical/legislation & jurisprudence , Orthopedics/education , Orthopedics/legislation & jurisprudence
13.
Postgrad Med J ; 100(1184): 382-390, 2024 May 18.
Article in English | MEDLINE | ID: mdl-38298001

ABSTRACT

PURPOSE: 'Low-value' clinical care and medical services are 'questionable' activities: they are more likely to cause harm than good, or their benefit is disproportionately low relative to their cost. This study examined the predictive ability of the QUestionable In Training Clinical Activities Index (QUIT-CAI) for general practice (GP) registrars' (trainees') performance in Australian GP Fellowship examinations (the licensure/certification examinations for independent general practice). METHODS: The study was nested in ReCEnT, an ongoing cohort study in which Australian GP registrars document their in-consultation clinical practice. Outcome factors in analyses were individual registrars' scores on the three Fellowship examinations (the AKT, KFP, and OSCE) and pass/fail rates during 2012-21. Analyses used univariable and multivariable regression (linear or logistic, as appropriate). The study factor in each analysis was the 'QUIT-CAI score percentage': the percentage of times a registrar performed a QUIT-CAI clinical activity when 'at risk' (i.e., when managing a problem where performing a QUIT-CAI activity was a plausible option). RESULTS: A total of 1265, 1145, and 553 registrars sat the Applied Knowledge Test (AKT), Key Features Problem (KFP), and Objective Structured Clinical Exam (OSCE), respectively. On multivariable analysis, higher QUIT-CAI score percentages (more questionable activities) were significantly associated with poorer AKT scores (P = .001), poorer KFP scores (P = .003), and poorer OSCE scores (P = .005). QUIT-CAI score percentages predicted Royal Australian College of General Practitioners exam failure (odds ratio 1.06 [95% CI 1.00, 1.12] per 1% increase in QUIT-CAI, P = .043). CONCLUSION: Performing questionable clinical activities predicted poorer performance in the summative Fellowship examinations, thereby validating these examinations as measures of actual clinical performance (by our measure of clinical performance, which is relevant for a licensure/certification examination).
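
A sketch of the failure model implied by the results: logistic regression of exam failure on QUIT-CAI score percentage, with the odds ratio per 1% increase recovered as exp(beta). The data are simulated around the reported OR of 1.06, not drawn from ReCEnT.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    quit_cai_pct = rng.uniform(0, 40, 500)      # % of at-risk encounters, simulated
    logit = -2.5 + np.log(1.06) * quit_cai_pct  # assume true OR of 1.06 per 1% increase
    fail = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    model = sm.Logit(fail, sm.add_constant(quit_cai_pct)).fit(disp=0)
    print(f"OR per 1% increase in QUIT-CAI = {np.exp(model.params[1]):.2f}")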


Subject(s)
Certification , Clinical Competence , Educational Measurement , General Practice , Humans , Australia , Clinical Competence/standards , Retrospective Studies , Educational Measurement/methods , General Practice/standards , General Practice/education , Female , Licensure, Medical , Male , Adult , Education, Medical, Graduate
14.
Acad Med ; 99(3): 325-330, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-37816217

ABSTRACT

PURPOSE: The United States Medical Licensing Examination (USMLE) comprises a series of assessments required for the licensure of U.S. MD-trained graduates as well as those who are trained internationally. Demonstration of a relationship between these examinations and outcomes of care is desirable for a process seeking to provide patients with safe and effective health care. METHOD: This was a retrospective cohort study of 196,881 hospitalizations in Pennsylvania over a 3-year period (January 1, 2017 to December 31, 2019) for 5 primary diagnoses: heart failure, acute myocardial infarction, stroke, pneumonia, or chronic obstructive pulmonary disease. The 1,765 attending physicians for these hospitalizations self-identified as family physicians or general internists. A converted score based on USMLE Step 1, Step 2 Clinical Knowledge, and Step 3 scores was available, and the outcome measures were in-hospital mortality and log length of stay (LOS). The research team controlled for characteristics of patients, hospitals, and physicians. RESULTS: For in-hospital mortality, the adjusted odds ratio was 0.94 (95% confidence interval [CI] = 0.90, 0.99; P < .02). Each standard deviation increase in the converted score was associated with a 5.51% reduction in the odds of in-hospital mortality. For log LOS, the adjusted estimate was 0.99 (95% CI = 0.98, 0.99; P < .001). Each standard deviation increase in the converted score was associated with a 1.34% reduction in log LOS. CONCLUSIONS: Better provider USMLE performance was associated with lower in-hospital mortality and shorter log LOS for patients, although the magnitude of the latter is unlikely to be of practical significance. These findings add to the body of evidence that examines the validity of the USMLE licensure program.
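
The arithmetic linking the abstract's estimates to its percentage effects is (1 - estimate) x 100. A quick check, assuming the published percentages come from unrounded estimates, shows the reported 5.51% and 1.34% reductions are consistent with the rounded 0.94 and 0.99.

    # Back out the unrounded estimates implied by the reported percent reductions.
    or_mortality = 1 - 0.0551  # 5.51% reduction in odds of in-hospital mortality
    est_log_los = 1 - 0.0134   # 1.34% reduction in log LOS
    print(round(or_mortality, 2), round(est_log_los, 2))  # -> 0.94 0.99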


Subject(s)
Educational Measurement , Internship and Residency , Humans , United States , Retrospective Studies , Licensure, Medical , Hospitalization , Pennsylvania , Physicians, Family
17.
JAMA Netw Open ; 6(11): e2343697, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37966842

ABSTRACT

This cross-sectional study compares the use of telemedicine in states where COVID-19 pandemic-related licensure waivers expired vs states where waivers continued.


Subject(s)
Licensure, Medical , Telemedicine , Telemedicine/legislation & jurisprudence
18.
BMC Med Educ ; 23(1): 788, 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37875929

ABSTRACT

Pass/fail (P/F) grading has emerged as an alternative to tiered clerkship grading. Systematically evaluating the existing literature and surveying program directors' (PDs') perspectives on these consequential changes can guide educators in addressing inequalities in academia and students aiming to improve their residency applications. In our survey, a total of 1578 unique PD responses (63.1%) were obtained across 29 medical specialties. With the changes to the United States Medical Licensing Examination (USMLE), responses showed an increased importance of core clerkships following the implementation of Step 2 CK score cutoffs. PDs believed core clerkship performance was a reliable representation of an applicant's preparedness for residency, particularly in the Accreditation Council for Graduate Medical Education (ACGME) competencies of Medical Knowledge and Patient Care and Procedural Skills. PDs disagreed with P/F core clerkships because they make it more difficult to objectively compare applicants. No statistically significant differences were found in PDs' preferential selection when comparing applicants from tiered and P/F core clerkship grading systems. If core clerkships adopted P/F scoring, PDs would place further emphasis on narrative assessments, sub-internship evaluations, reference letters, academic awards, professional development, and medical school prestige. In the meta-analysis of 6 studies comprising 2,118 participants, adjusted scaled scores (mean differences from an equal-variance model) showed that residents from tiered clerkship grading systems did not differ significantly from residents from P/F systems in overall performance, learning ability, work habits, personal evaluations, residency selection, or educational evaluation. Overall, our dual study suggests that while PDs do not favor P/F core clerkships, they do not have a selection preference and do not report a difference in performance between applicants from P/F versus tiered core clerkship grading systems, thus providing fertile ground for institutions to examine the feasibility of adopting P/F grading for core clerkships.


Subject(s)
Clinical Clerkship , Internship and Residency , Students, Medical , Humans , United States , Educational Measurement , Accreditation , Licensure, Medical
19.
R I Med J (2013) ; 106(8): 31-35, 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37643340

ABSTRACT

OBJECTIVE: This study aimed to examine the patterns of complaints filed against physicians in Rhode Island, investigate the factors associated with complaint rates and outcomes, and assess the impact of the implementation of a new Framework for Just Culture. METHODS: Complaint data from the Rhode Island Department of Health's complaint tracker and physician licensing database were analyzed for the period of 2018 to 2020. Descriptive and statistical process control analyses were conducted to assess complaint rates, investigation rates, and adverse outcomes. RESULTS: Over the three-year period, 1672 complaints were filed against Rhode Island physicians, with approximately 40% of complaints being opened for investigation. The implementation of the Framework for Just Culture coincided with a sustained decrease in the rate of complaints opened. Failure to meet the minimum standard of care was the most common allegation, and male physicians and those aged 40-50 were more likely to have complaints filed against them. CONCLUSIONS: The study highlights the importance of complaint investigations in upholding standards for medical licensure and clinical competence. The Framework for Just Culture may have influenced the investigation process, resulting in fewer investigations opened without compromising the identification of cases requiring disciplinary action. These findings provide insights into physician accountability and the need for ongoing monitoring and improvement in complaint handling systems.
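
A sketch of the statistical process control analysis named in the methods: a p-chart for the monthly proportion of complaints opened for investigation, with three-sigma control limits. The monthly counts below are hypothetical, not Rhode Island Department of Health data.

    import math

    filed  = [55, 50, 60, 52, 48, 45, 44, 43]  # complaints filed per month (hypothetical)
    opened = [25, 22, 28, 20, 15, 14, 13, 12]  # complaints opened for investigation

    p_bar = sum(opened) / sum(filed)           # center line: overall proportion opened
    for n, x in zip(filed, opened):
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl, ucl = max(0.0, p_bar - 3 * sigma), p_bar + 3 * sigma
        flag = "" if lcl <= x / n <= ucl else "  <- special-cause signal"
        print(f"n={n:2d}  p={x / n:.2f}  limits=({lcl:.2f}, {ucl:.2f}){flag}")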


Subject(s)
Licensure, Medical , Physicians , Humans , Male , Rhode Island/epidemiology , Clinical Competence , Databases, Factual
20.
BMC Med Educ ; 23(1): 543, 2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37525136

ABSTRACT

BACKGROUND: The purpose of this systematic review was to (1) determine the scope of the literature measuring USMLE Step 1 and Step 2 CK as predictors or indicators of quality resident performance across all medical specialties and (2) summarize the ability of Step 1 and Step 2 CK to predict quality resident performance, stratified by ACGME specialty, based on the available literature. METHODS: This systematic review was designed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [16]. The original search strategy surveyed MEDLINE and was adapted to survey the Cochrane Library and Embase. A study was deemed eligible if it provided all three of the following: (a) Step 1 or Step 2 CK as indicators for (b) resident outcomes in (c) any ACGME-accredited specialty training program. RESULTS: A total of 1803 articles were screened from the three databases. The 92 included studies were stratified by specialty, with Surgery (21.7% [20/92]), Emergency Medicine (13.0% [12/92]), Internal Medicine (10.9% [10/92]), and Orthopedic Surgery (8.7% [8/92]) being the most common. Common resident performance measures included ITE scores, board certification, ACGME milestone ratings, and program director evaluations. CONCLUSIONS: Further studies are needed to discern the utility of Step 1 and Step 2 CK as predictors of resident performance and as tools for resident recruitment and selection. The results of this systematic review suggest that a scored Step 1 dated prior to January 2022 can be useful as one tool in a holistic review of future resident performance, and that Step 2 CK score performance may be an effective tool in the holistic review process. Given its inherent complexity, multiple tools across many assessment modalities are necessary to assess resident performance comprehensively and effectively.


Subject(s)
Educational Measurement , Internship and Residency , Humans , United States , Educational Measurement/methods , Clinical Competence , Licensure, Medical , Internal Medicine/education