Search | VHL Regional Portal

1.

Reported Pediatrics Milestones (Mostly) Measure Program, Not Learner Performance.

Hu, Kimberly; Hicks, Patricia J; Margolis, Melissa; Carraccio, Carol; Osta, Amanda; Winward, Marcia L; Schwartz, Alan.

Acad Med ; 95(11S Association of American Medical Colleges Learn Serve Lead: Proceedings of the 59th Annual Research in Medical Education Presentations): S89-S94, 2020 11.

Article in English | MEDLINE | ID: mdl-32769468

ABSTRACT

PURPOSE: Semiannually, U.S. pediatrics residency programs report resident milestone levels to the Accreditation Council for Graduate Medical Education (ACGME). The Pediatrics Milestones Assessment Collaborative (PMAC, consisting of the National Board of Medical Examiners, American Board of Pediatrics, and Association of Pediatric Program Directors) developed workplace-based assessments of 2 inferences: readiness to serve as an intern with a supervisor present (D1) and readiness to care for patients with a supervisor nearby in the pediatric inpatient setting (D2). The authors compared learner and program variance in PMAC scores with ACGME milestones. METHOD: The authors examined sources of variance in PMAC scores and milestones between November 2015 and May 2017 of 181 interns at 8 U.S. pediatrics residency programs using random effects models with program, competency, learner, and program × competency components. RESULTS: Program-related milestone variance was substantial (54% D1, 68% D2), both in comparison to learner milestone variance (22% D1, 14% D2) and program variance in the PMAC scores (12% D1, 10% D2). In contrast, learner variance represented 44% (D1) or 26% (D2) of variance in PMAC scores. Within programs, PMAC scores were positively correlated with milestones for all but one competency. CONCLUSIONS: PMAC assessments provided scores with little program-specific variance and were more sensitive to differences in learners within programs compared with milestones. Milestones reflected greater differences by program than by learner. This may represent program-based differences in intern performance or in use of milestones as a reporting scale. Comparing individual learner milestones without adjusting for programs is problematic.

Subject(s)

Clinical Competence , Internship and Residency/standards , Pediatrics/education , Accreditation , Curriculum , United States

2.

Does Incorporating a Measure of Clinical Workload Improve Workplace-Based Assessment Scores? Insights for Measurement Precision and Longitudinal Score Growth From Ten Pediatrics Residency Programs.

Park, Yoon Soo; Hicks, Patricia J; Carraccio, Carol; Margolis, Melissa; Schwartz, Alan.

Acad Med ; 93(11S Association of American Medical Colleges Learn Serve Lead: Proceedings of the 57th Annual Research in Medical Education Sessions): S21-S29, 2018 11.

Article in English | MEDLINE | ID: mdl-30365426

ABSTRACT

PURPOSE: This study investigates the impact of incorporating observer-reported workload into workplace-based assessment (WBA) scores on (1) psychometric characteristics of WBA scores and (2) measuring changes in performance over time using workload-unadjusted versus workload-adjusted scores. METHOD: Structured clinical observations and multisource feedback instruments were used to collect WBA data from first-year pediatrics residents at 10 residency programs between July 2016 and June 2017. Observers completed items in 8 subcompetencies associated with Pediatrics Milestones. Faculty and resident observers assessed workload using a sliding scale ranging from low to high; all item scores were rescaled to a 1-5 scale to facilitate analysis and interpretation. Workload-adjusted WBA scores were calculated at the item level using three different approaches, and aggregated for analysis at the competency level. Mixed-effects regression models were used to estimate variance components. Longitudinal growth curve analyses examined patterns of developmental score change over time. RESULTS: On average, participating residents (n = 252) were assessed 5.32 times (standard deviation = 3.79) by different raters during the data collection period. Adjusting for workload yielded better discrimination of learner performance, and higher reliability, reducing measurement error by 28%. Projections in reliability indicated needing up to twice the number of raters when workload-unadjusted scores were used. Longitudinal analysis showed an increase in scores over time, with significant interaction between workload and time; workload also increased significantly over time. CONCLUSIONS: Incorporating a measure of observer-reported workload could improve the measurement properties and the ability to interpret WBA scores.

Subject(s)

Clinical Competence , Internship and Residency , Pediatrics/education , Workload , Educational Measurement , Humans , Psychometrics

3.

A novel workplace-based assessment for competency-based decisions and learner feedback.

Hicks, Patricia J; Margolis, Melissa J; Carraccio, Carol L; Clauser, Brian E; Donnelly, Kathleen; Fromme, H Barrett; Gifford, Kimberly A; Poynter, Sue E; Schumacher, Daniel J; Schwartz, Alan.

Med Teach ; 40(11): 1143-1150, 2018 11.

Article in English | MEDLINE | ID: mdl-29688108

ABSTRACT

BACKGROUND: Increased recognition of the importance of competency-based education and assessment has led to the need for practical and reliable methods to assess relevant skills in the workplace. METHODS: A novel milestone-based workplace assessment system was implemented in 15 pediatrics residency programs. The system provided: (1) web-based multisource feedback (MSF) and structured clinical observation (SCO) instruments that could be completed on any computer or mobile device; and (2) monthly feedback reports that included competency-level scores and recommendations for improvement. RESULTS: For the final instruments, an average of five MSF and 3.7 SCO assessment instruments were completed for each of 292 interns; instruments required an average of 4-8 min to complete. Generalizability coefficients >0.80 were attainable with six MSF observations. Users indicated that the new system added value to their existing assessment program; the need to complete the local assessments in addition to the new assessments was identified as a burden of the overall process. CONCLUSIONS: Outcomes - including high participation rates and high reliability compared to what has traditionally been found with workplace-based assessment - provide evidence for the validity of scores resulting from this novel competency-based assessment system. The development of this assessment model is generalizable to other specialties.

Subject(s)

Competency-Based Education/standards , Educational Measurement/methods , Formative Feedback , Internship and Residency/organization & administration , Workplace/standards , Clinical Competence/standards , Clinical Decision-Making , Educational Measurement/standards , Humans , Internet , Internship and Residency/standards , Pediatrics/education , Reproducibility of Results

4.

A multi-source feedback tool for measuring a subset of Pediatrics Milestones.

Schwartz, Alan; Margolis, Melissa J; Multerer, Sara; Haftel, Hilary M; Schumacher, Daniel J.

Med Teach ; 38(10): 995-1002, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27027428

ABSTRACT

BACKGROUND: The Pediatrics Milestones Assessment Pilot employed a new multisource feedback (MSF) instrument to assess nine Pediatrics Milestones among interns and subinterns in the inpatient context. OBJECTIVE: To report validity evidence for the MSF tool for informing milestone classification decisions. METHODS: We obtained MSF instruments by different raters per learner per rotation. We present evidence for validity based on the unified validity framework. RESULTS: One hundred and ninety two interns and 41 subinterns at 18 Pediatrics residency programs received a total of 1084 MSF forms from faculty (40%), senior residents (34%), nurses (22%), and other staff (4%). Variance in ratings was associated primarily with rater (32%) and learner (22%). The milestone factor structure fit data better than simpler structures. In domains except professionalism, ratings by nurses were significantly lower than those by faculty and ratings by other staff were significantly higher. Ratings were higher when the rater observed the learner for longer periods and had a positive global opinion of the learner. Ratings of interns and subinterns did not differ, except for ratings by senior residents. MSF-based scales correlated with summative milestone scores. CONCLUSION: We obtain moderately reliable MSF ratings of interns and subinterns in the inpatient context to inform some milestone assignments.

Subject(s)

Clinical Competence/standards , Educational Measurement/standards , Formative Feedback , Internship and Residency , Pediatrics/standards , Competency-Based Education , Educational Measurement/methods , Factor Analysis, Statistical , Faculty , Humans , Nurses , Pediatrics/education , Psychometrics , Societies, Medical

5.

The Pediatrics Milestones Assessment Pilot: Development of Workplace-Based Assessment Content, Instruments, and Processes.

Hicks, Patricia J; Margolis, Melissa; Poynter, Sue E; Chaffinch, Christa; Tenney-Soeiro, Rebecca; Turner, Teri L; Waggoner-Fountain, Linda; Lockridge, Robin; Clyman, Stephen G; Schwartz, Alan.

Acad Med ; 91(5): 701-9, 2016 05.

Article in English | MEDLINE | ID: mdl-26735520

ABSTRACT

PURPOSE: To report on the development of content and user feedback regarding the assessment process and utility of the workplace-based assessment instruments of the Pediatrics Milestones Assessment Pilot (PMAP). METHOD: One multisource feedback instrument and two structured clinical observation instruments were developed and refined by experts in pediatrics and assessment to provide evidence for nine competencies based on the Pediatrics Milestones (PMs) and chosen to inform residency program faculty decisions about learners' readiness to serve as pediatric interns in the inpatient setting. During the 2012-2013 PMAP study, 18 U.S. pediatric residency programs enrolled interns and subinterns. Faculty, residents, nurses, and other observers used the instruments to assess learner performance through direct observation during a one-month rotation. At the end of the rotation, data were aggregated for each learner, milestone levels were assigned using a milestone classification form, and feedback was provided to learners. Learners and site leads were surveyed and/or interviewed about their experience as participants. RESULTS: Across the sites, 2,338 instruments assessing 239 learners were completed by 630 unique observers. Regarding end-of-rotation feedback, 93% of learners (128/137) agreed the assessments and feedback "helped me understand how those with whom I work perceive my performance," and 85% (117/137) agreed they were "useful for constructing future goals or identifying a developmental path." Site leads identified several benefits and challenges to the assessment process. CONCLUSIONS: PM-based instruments used in workplace-based assessment provide a meaningful and acceptable approach to collecting evidence of learner competency development. Learners valued feedback provided by PM-based assessment.

Subject(s)

Clinical Competence/standards , Education, Medical, Graduate/standards , Internship and Residency/standards , Pediatrics/education , Education, Medical, Graduate/organization & administration , Feedback , Humans , Internship and Residency/organization & administration , Pediatrics/organization & administration , Pediatrics/standards , Pilot Projects , United States

6.

Validity considerations in the assessment of professionalism.

Clauser, Brian E; Margolis, Melissa J; Holtman, Matthew C; Katsufrakis, Peter J; Hawkins, Richard E.

Adv Health Sci Educ Theory Pract ; 17(2): 165-81, 2012 May.

Article in English | MEDLINE | ID: mdl-20094911

ABSTRACT

During the last decade, interest in assessing professionalism in medical education has increased exponentially and has led to the development of many new assessment tools. Efforts to validate the scores produced by tools designed to assess professionalism have lagged well behind the development of these tools. This paper provides a structured framework for collecting evidence to support the validity of assessments of professionalism. The paper begins with a short history of the concept of validity in the context of psychological assessment. It then describes Michael Kane's approach to validity as a structured argument. The majority of the paper then focuses on how Kane's framework can be applied to assessments of professionalism. Examples are provided from the literature, and recommendations for future investigation are made in areas where the literature is deficient.

Subject(s)

Education, Medical/methods , Mental Disorders/diagnosis , Professional Competence , Professional Role , Psychological Tests , Reproducibility of Results , Humans

7.

Validity evidence for USMLE examination cut scores: results of a large-scale survey.

Margolis, Melissa J; Clauser, Brian E; Winward, Marcia; Dillon, Gerard F.

Acad Med ; 85(10 Suppl): S93-7, 2010 Oct.

Article in English | MEDLINE | ID: mdl-20881714

ABSTRACT

PURPOSE: This research examined the credibility of the cut scores used to make pass/fail decisions on United States Medical Licensing Examination (USMLE) Step 1, Step 2 Clinical Knowledge, and Step 3. METHOD: Approximately 15,000 members of nine constituency groups were asked their opinions about (1) current initial and ultimate fail rates and (2) the highest acceptable, lowest acceptable, and optimal initial and ultimate fail rates. RESULTS: Initial fail rates were generally viewed as appropriate; more variability was associated with ultimate fail rates. Actual fail rates for each examination across recent years fell within the range that respondents considered acceptable. CONCLUSIONS: Results provide important evidence to support the appropriateness of the cut scores used to make classification decisions for USMLE examinations. This evidence is viewed as part of the overall validity argument for decisions based on USMLE scores.

Subject(s)

Clinical Medicine/education , Educational Measurement/statistics & numerical data , Licensure, Medical , Education, Medical, Undergraduate , Educational Status , Humans , Surveys and Questionnaires , United States

8.

Constructing a validity argument for the mini-Clinical Evaluation Exercise: a review of the research.

Hawkins, Richard E; Margolis, Melissa J; Durning, Steven J; Norcini, John J.

Acad Med ; 85(9): 1453-61, 2010 Sep.

Article in English | MEDLINE | ID: mdl-20736673

ABSTRACT

PURPOSE: The mini-Clinical Evaluation Exercise (mCEX) is increasingly being used to assess the clinical skills of medical trainees. Existing mCEX research has typically focused on isolated aspects of the instrument's reliability and validity. A more thorough validity analysis is necessary to inform use of the mCEX, particularly in light of increased interest in high-stakes applications of the methodology. METHOD: Kane's (2006) validity framework, in which a structured argument is developed to support the intended interpretation(s) of assessment results, was used to evaluate mCEX research published from 1995 to 2009. In this framework, evidence to support the argument is divided into four components (scoring, generalization, extrapolation, and interpretation/decision), each of which relates to different features of the assessment or resulting scores. The strength and limitations of the reviewed research were identified in relation to these components, and the findings were synthesized to highlight overall strengths and weaknesses of existing mCEX research. RESULTS: The scoring component yielded the most concerns relating to the validity of mCEX score interpretations. More research is needed to determine whether scoring-related issues, such as leniency error and high interitem correlations, limit the utility of the mCEX for providing feedback to trainees. Evidence within the generalization and extrapolation components is generally supportive of the validity of mCEX score interpretations. CONCLUSIONS: Careful evaluation of the circumstances of mCEX assessment will help to improve the quality of the resulting information. Future research should address issues of rater selection, training, and monitoring which can impact rating accuracy.

Subject(s)

Clinical Competence , Education, Medical, Graduate/methods , Educational Measurement/methods , Internal Medicine/education , Internship and Residency , Medical History Taking/standards , Physical Examination/standards , Humans , Psychometrics , Reproducibility of Results

9.

The generalizability of documentation scores from the USMLE Step 2 Clinical Skills examination.

Clauser, Brian E; Harik, Polina; Margolis, Melissa J; Mee, Janet; Swygert, Kimberly; Rebbecchi, Thomas.

Acad Med ; 83(10 Suppl): S41-4, 2008 Oct.

Article in English | MEDLINE | ID: mdl-18820498

ABSTRACT

BACKGROUND: This research examined various sources of measurement error in the documentation score component of the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills examination. METHOD: A generalizability theory framework was employed to examine the documentation ratings for 847 examinees who completed the USMLE Step 2 Clinical Skills examination during an eight-day period in 2006. Each patient note was scored by two different raters allowing for a persons-crossed-with-raters-nested-in-cases design. RESULTS: The results suggest that inconsistent performance on the part of raters makes a substantially greater contribution to measurement error than case specificity. Double scoring the notes significantly increases precision. CONCLUSIONS: The results provide guidance for improving operational scoring of the patient notes. Double scoring of the notes may produce an increase in the precision of measurement equivalent to that achieved by lengthening the test by more than 50%. The study also cautions researchers that when examining sources of measurement error, inappropriate data-collection designs may result in inaccurate inferences.

Subject(s)

Clinical Competence , Licensure, Medical , Cohort Studies , Communication , Generalization, Psychological , Humans , Observer Variation , Patient Simulation , Physical Examination , Physician-Patient Relations , Reproducibility of Results , Sensitivity and Specificity , United States

10.

A comparison of alternative item weighting strategies on the data gathering component of a clinical skills performance assessment.

Kahraman, Nilufer; Clauser, Brian E; Margolis, Melissa J.

Acad Med ; 83(10 Suppl): S72-5, 2008 Oct.

Article in English | MEDLINE | ID: mdl-18820506

ABSTRACT

BACKGROUND: Checklist scores used to produce the data gathering score on the Step 2 CS examination are currently weighted using an algorithm based on expert judgment about the importance of the item. The present research was designed to compare this approach with alternative weighting strategies. METHOD: Scores from 21,140 examinees who took the United States Medical Licensing Examination Step 2 between May 2006 and February 2007 were subjected to five weighting models: (1) a regression weights model, (2) a factor loading weights model, (3) a standardized response model, (4) an equal weights model, and (5) the operational expert-judgment weights model. RESULTS: Alternative weighting procedures may have a significant impact on the reliability and validity of checklist scores. CONCLUSIONS: The results suggest that the current weighting procedure is useful, and the regression-based model holds promise for practical application. The regression-based model produces scores that are more reliable than those produced by the current procedure and more strongly related to the external criteria.

Subject(s)

Algorithms , Clinical Competence/statistics & numerical data , Licensure, Medical , Models, Statistical , Cohort Studies , Factor Analysis, Statistical , Humans , Judgment , Psychometrics , Reproducibility of Results , Retrospective Studies , United States

11.

Collecting validity evidence for an assessment of professionalism: findings from think-aloud interviews.

Mazor, Kathleen M; Canavan, Colleen; Farrell, Margaret; Margolis, Melissa J; Clauser, Brian E.

Acad Med ; 83(10 Suppl): S9-12, 2008 Oct.

Article in English | MEDLINE | ID: mdl-18820511

ABSTRACT

BACKGROUND: This study investigated whether participants' subjective reports of how they assigned ratings on a multisource feedback instrument provide evidence to support interpreting the resulting scores as objective, accurate measures of professional behavior. METHOD: Twenty-six participants completed think-aloud interviews while rating students, residents, or faculty members they had worked with previously. The items rated included 15 behavioral items and one global item. RESULTS: Participants referred to generalized behaviors and global impressions six times as often as specific behaviors, rated observees in the absence of information necessary to do so, relied on indirect evidence about performance, and varied in how they interpreted items. CONCLUSIONS: Behavioral change becomes difficult to address if it is unclear what behaviors raters considered when providing feedback. These findings highlight the importance of explicitly stating and empirically investigating the assumptions that underlie the use of an observational assessment tool.

Subject(s)

Internship and Residency , Interviews as Topic , Pediatrics/education , Professional Competence , Social Behavior , Feedback, Psychological , Humans , Knowledge of Results, Psychological , Observer Variation , Qualitative Research , Reproducibility of Results

12.

Sequence effects in the United States Medical Licensing Examination (USMLE) step 2 clinical skills (cs) examination.

Ramineni, Chaitanya; Harik, Polina; Margolis, Melissa J; Clauser, Brian E; Swanson, David B; Dillon, Gerard F.

Acad Med ; 82(10 Suppl): S101-4, 2007 Oct.

Article in English | MEDLINE | ID: mdl-17895671

ABSTRACT

BACKGROUND: Systematic trends in examinee performance across the testing day (sequence effects) could indicate that artifacts of the testing situation have an impact on scores. This research investigated the presence of sequence effects for United States Medical Licensing Exam (USMLE) Step 2 clinical skills (CS) examination components. METHOD: Data from Step 2 CS examinees were analyzed using analysis of covariance and hierarchical linear modeling procedures. RESULTS: Sequence was significant for three of the components; communication and interpersonal skills, data gathering, and documentation. A significant gender x sequence interaction was found for two components. CONCLUSIONS: The presence of sequence effects suggests that scores on early cases are influenced by factors that are unrelated to the proficiencies of interest. More research is needed to fully understand these effects.

Subject(s)

Clinical Competence/standards , Educational Measurement/methods , Faculty, Medical , Licensure, Medical , Students, Medical , Communication , Female , Humans , Interpersonal Relations , Linear Models , Male , Sex Factors , United States

13.

Evaluation of missing data in an assessment of professional behaviors.

Mazor, Kathleen; Clauser, Brian E; Holtman, Matthew; Margolis, Melissa J.

Acad Med ; 82(10 Suppl): S44-7, 2007 Oct.

Article in English | MEDLINE | ID: mdl-17895689

ABSTRACT

BACKGROUND: The National Board of Medical Examiners is currently developing the Assessment of Professional Behaviors, a multisource feedback (MSF) tool intended for formative use with medical students and residents. This study investigated whether missing responses on this tool can be considered random; evidence that missing values are not random would suggest response bias, a significant threat to score validity. METHOD: Correlational analyses of pilot data (N = 2,149) investigated whether missing values were systematically related to global evaluations of observees. RESULTS: The percentage of missing items was correlated with global evaluations of observees; observers answered more items for preferred observees compared with nonpreferred observees. CONCLUSIONS: Missing responses on this MSF tool seem to be nonrandom and are instead systematically related to global perceptions of observees. Further research is needed to determine whether modifications to the items, the instructions, or other components of the assessment process can reduce this effect.

Subject(s)

Behavior , Clinical Competence/standards , Data Collection/statistics & numerical data , Education, Medical , Program Evaluation/statistics & numerical data , Students, Medical , Surveys and Questionnaires , Humans , Observer Variation , Pilot Projects , Retrospective Studies

14.

Relationships among subcomponents of the USMLE Step 2 Clinical Skills Examination, the Step 1, and the Step 2 Clinical Knowledge Examinations.

Harik, Polina; Clauser, Brian E; Grabovsky, Irina; Margolis, Melissa J; Dillon, Gerard F; Boulet, John R.

Acad Med ; 81(10 Suppl): S21-4, 2006 Oct.

Article in English | MEDLINE | ID: mdl-17001128

ABSTRACT

BACKGROUND: This research examined relationships between and among scores from the United States Medical Licensing Examination (USMLE) Step 1, Step 2 Clinical Knowledge (CK), and subcomponents of the Step 2 Clinical Skills (CS) examination. METHOD: Correlations and failure rates were produced for first-time takers who tested during the first year of Step 2 CS Examination administration (June 2004 to July 2005). RESULTS: True-score correlations were high between patient note (PN) and data gathering (DG), moderate between communication and interpersonal skills and DG, and low between the remaining score pairs. There was little overlap between examinees failing Step 2 CK and the different components of Step 2 CS. CONCLUSION: Results suggest that combining DG and PN scores into a single composite score is reasonable and that relatively little redundancy exists between Step 2 CK and CS scores.

Subject(s)

Clinical Competence , Interpersonal Relations , Language , Licensure, Medical , Communication , Foreign Medical Graduates , Humans , United States

15.

Use of the mini-clinical evaluation exercise to rate examinee performance on a multiple-station clinical skills examination: a validity study.

Margolis, Melissa J; Clauser, Brian E; Cuddy, Monica M; Ciccone, Andrea; Mee, Janet; Harik, Polina; Hawkins, Richard E.

Acad Med ; 81(10 Suppl): S56-60, 2006 Oct.

Article in English | MEDLINE | ID: mdl-17001137

ABSTRACT

BACKGROUND: Multivariate generalizability analysis was used to investigate the performance of a commonly used clinical evaluation tool. METHOD: Practicing physicians were trained to use the mini-Clinical Skills Examination (CEX) rating form to rate performances from the United States Medical Licensing Examination Step 2 Clinical Skills examination. RESULTS: Differences in rater stringency made the greatest contribution to measurement error; more raters rating each examinee, even on fewer occasions, could enhance score stability. Substantial correlated error across the competencies suggests that decisions about one scale unduly influence those on others. CONCLUSIONS: Given the appearance of a halo effect across competencies, score interpretations that assume assessment of distinct dimensions of clinical performance should be made with caution. If the intention is to produce a single composite score by combining results across competencies, the presence of these effects may be less critical.

Subject(s)

Clinical Competence/standards , Educational Measurement/methods , Physical Examination/methods , Software , Analysis of Variance , Humans , Interviews as Topic

16.

Assessing the validity of the USMLE step 2 clinical knowledge examination through an evaluation of its clinical relevance.

Cuddy, Monica M; Dillon, Gerard F; Clauser, Brian E; Holtzman, Kathleen Z; Margolis, Melissa J; McEllhenney, Suzanne M; Swanson, David B.

Acad Med ; 79(10 Suppl): S43-5, 2004 Oct.

Article in English | MEDLINE | ID: mdl-15383386

ABSTRACT

PURPOSE: To assess the validity of the USMLE Step 2 Clinical Knowledge (CK) examination by addressing the degree to which experts view item content as clinically relevant and appropriate for Step 2 CK. METHOD: Twenty-seven experts were asked to complete three survey questions related to the clinical relevance and appropriateness of 150 Step 2 CK multiple-choice questions. Percentages, reliability estimates, and correlation coefficients were calculated and ordinary least squares regression was used. RESULTS: Results showed that 92% of expert judgments indicated the item content was clinically relevant, 90% indicated the content was appropriate for Step 2 CK, and 85% indicated the content was used in clinical practice. The regression indicated that more difficult items and more frequently used items are considered more appropriate for Step 2 CK. CONCLUSIONS: Results suggest that the majority of item content is clinically relevant and appropriate, thus providing validation support for Step 2 CK.

Subject(s)

Clinical Competence , Education, Medical , Educational Measurement/standards , Licensure, Medical , Clinical Competence/standards , Education, Medical/standards , Educational Measurement/methods , Expert Testimony , Female , Humans , Judgment , Male , Reproducibility of Results , United States

17.

Scoring the computer-based case simulation component of USMLE Step 3: a comparison of preoperational and operational data.

Margolis, Melissa J; Clauser, Brian E; Harik, Polina.

Acad Med ; 79(10 Suppl): S62-4, 2004 Oct.

Article in English | MEDLINE | ID: mdl-15383392

ABSTRACT

PURPOSE: Operational USMLE(TM) computer-based case simulation results were examined to determine the extent to which rater reliability and regression model performance met expectations based on preoperational data. METHOD: Operational data resulted from Step 3 examinations given between 1999 and 2004. Plots were produced using reliability and multiple correlation coefficients. RESULTS: Operational testing reliabilities increased over the four years but were lower than the preoperational reliability. Multiple correlation coefficient results are somewhat superior to the results reported during the preoperational period and suggest that the operational scoring algorithms have been relatively consistent. CONCLUSIONS: Changes in the rater population, changes in the rating task, and enhancements to the training procedures are several factors that can explain the identified differences between preoperational and operational results. The present findings have important implications for test development and test validity.

Subject(s)

Clinical Competence , Computer Simulation , Education, Medical , Educational Measurement/methods , Licensure, Medical , Algorithms , Educational Measurement/statistics & numerical data , Humans , Observer Variation , Regression Analysis , Reproducibility of Results

18.

An examination of the relationship between clinical skills examination performance and performance on USMLE Step 2.

Muller, Eric S; Harik, Polina; Margolis, Melissa; Clauser, Brian; McKinley, Danette; Boulet, John R.

Acad Med ; 78(10 Suppl): S27-9, 2003 Oct.

Article in English | MEDLINE | ID: mdl-14557087

ABSTRACT

PURPOSE: To examine the relationship between performance on a large-scale clinical skills examination (CSE) and a high-stakes multiple-choice examination. METHOD: Two samples were used: (1) 6,372 first-taker international medical graduates (IMGs); and (2) 858 fourth-year U.S. medical students. Ninety-seven percent of IMGs and 70% of U.S. students had completed Step 2. Correlations were calculated, scatter plots produced, and regression lines estimated. RESULTS: Correlations between CSE and Step 2 ranged from .16 to .38. The observed relationship between scores confirms that CSE score information is not redundant with MCQ score information. This result was consistent across samples. CONCLUSIONS: Results suggest that the CSE assesses proficiencies distinct from those assessed by current USMLE components and therefore provides evidence justifying its inclusion in the medical licensure process.

Subject(s)

Clinical Competence/statistics & numerical data , Educational Measurement , Licensure, Medical/statistics & numerical data , Foreign Medical Graduates/statistics & numerical data , Humans , Regression Analysis , Students, Medical/statistics & numerical data , United States

19.

Analysis of the relationship between score components on a standardized patient clinical skills examination.

Margolis, Melissa J; Clauser, Brian E; Swanson, David B; Boulet, John R.

Acad Med ; 78(10 Suppl): S68-71, 2003 Oct.

Article in English | MEDLINE | ID: mdl-14557100

ABSTRACT

PURPOSE: This work investigated the reliability of and relationships between individual case and composite scores on a standardized patient clinical skills examination. METHOD: Four hundred ninety two fourth-year U.S. medical students received three scores [data gathering (DG), interpersonal skills (IPS), and written communication (WC)] for each of 10 standardized patient cases. mGENOVA software was used for all analyses. RESULTS: Estimated generalizability coefficients were 0.69, 0.80, and 0.70 for the DG, IPS, and WC scores, respectively. The universe-score correlation between DG and WC was high (.83); those for DG/IPS and IPS/WC were not as strong (0.51 and 0.37, respectively). Task difficulty appears to be modestly but positively related across the three scores. Correlations between the person-by-task effects for DG/IPS and DG/WC were positive yet modest. The estimated generalizability coefficient for a ten-case test using an equally weighted composite DG/WC score was 0.78. CONCLUSIONS: This work allows for interpretation of correlations between (1) proficiencies measured by multiple scores and (2) sources of error that affect those scores as well as for estimation of the reliability of composite scores. Results have important implications for test construction and test validity.

Subject(s)

Licensure, Medical/statistics & numerical data , Medical History Taking/statistics & numerical data , Physical Examination/statistics & numerical data , Physician-Patient Relations , Clinical Competence/statistics & numerical data , Humans , Multivariate Analysis , Students, Medical , United States

20.

Evaluation of an automated procedure for scoring patient notes as part of a clinical skills examination.

Swygert, Kimberly; Margolis, Melissa; King, Ann; Siftar, Time; Clyman, Stephen; Hawkins, Richard; Clauser, Brian.

Acad Med ; 78(10 Suppl): S75-7, 2003 Oct.

Article in English | MEDLINE | ID: mdl-14557102

ABSTRACT

PROBLEM STATEMENT AND BACKGROUND: The purpose of the present study was to examine the extent to which an automated scoring procedure that emulates expert ratings with latent semantic analysis could be used to score the written patient note component of the proposed clinical skills examination (CSE). METHOD: Human ratings for four CSE cases collected in 2002 were compared to automated holistic scores and to regression-based scores based on automated holistic and component scores. RESULTS AND CONCLUSIONS: Regression-based scores account for approximately half of the variance in the human ratings and are more highly correlated with the ratings than the scores produced from the automated algorithm. Implications of this study and suggestions for follow-up research are discussed.

Subject(s)

Educational Measurement/statistics & numerical data , Medical History Taking , Physical Examination , Software , Algorithms , Clinical Competence , Humans , Licensure, Medical , Linear Models , United States

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL