Search | BVS Regional Portal (test)

Judges' perception of candidates' organization and communication, in relation to oral certification examination ratings.

Houston, James E; Myford, Carol M.

Acad Med ; 84(11): 1603-9, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19858824

ABSTRACT

PURPOSE: To determine (1) whether judges differed in the levels of severity they exercised when rating candidates' performance in an oral certification exam, (2) to what extent candidates' clinical competence ratings were related to their organization/communication ratings, and (3) to what extent clinical competence ratings could predict organization/communication ratings. METHOD: Six hundred eighty-four physicians participated in a medical specialty board's 2002 oral examination. Ninety-nine senior members of the medical specialty served as judges, rating candidates' performances. Candidates' clinical competence ratings were analyzed using multifaceted Rasch measurement to investigate judge severity. A Pearson correlation was calculated to examine the relationship between ratings of clinical competence and organization/communication. Logistic regression was used to determine to what extent clinical competence ratings predicted organization/communication ratings. RESULTS: There were about three statistically distinct strata of judge severity; judges were not interchangeable. There was a moderately strong relationship between the two sets of candidate ratings. Higher clinical competence ratings were associated with an organization/communication rating of acceptable, whereas lower clinical competence ratings were associated with an organization/communication rating of unacceptable. The judges' clinical competence ratings correctly predicted 61.9% of the acceptable and 88.3% of the unacceptable organization/communication ratings. Overall, the clinical competence ratings correctly predicted 80% of the organization/communication ratings. CONCLUSIONS: The close association between the two sets of ratings was possibly due to a "halo" effect. Several explanations for this relationship were explored, and the authors considered the implications for their understanding of how judges carry out this complex rating task.
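The abstract reports "about three statistically distinct strata" of judge severity but does not say how that figure was derived. A minimal sketch, assuming the conventional Rasch relations between separation reliability R, the separation index G, and the strata index H (R = G^2 / (1 + G^2), H = (4G + 1) / 3); the function names and the input value are hypothetical and are not taken from the study:

```python
def separation_from_reliability(reliability: float) -> float:
    """Separation index G from separation reliability R, via R = G^2 / (1 + G^2)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return (reliability / (1.0 - reliability)) ** 0.5


def strata(separation: float) -> float:
    """Strata index H = (4G + 1) / 3: statistically distinct levels in a facet."""
    return (4.0 * separation + 1.0) / 3.0


# Hypothetical reliability, chosen only to illustrate the arithmetic.
g = separation_from_reliability(0.80)
print(round(g, 2), round(strata(g), 2))
```

Under these assumptions, a judge separation reliability of 0.80 corresponds to a separation of 2 and therefore to three strata, which is the order of magnitude the abstract describes.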

Subjects

Certification, Clinical Competence/standards, Communication, Decision Making, Social Perception, Specialty Boards/standards, Educational Measurement/standards, Humans, Illinois, Logistic Models, Psychometrics

Comparison of single- and double-assessor scoring designs for the assessment of accomplished teaching.

Engelhard, George; Myford, Carol M.

J Appl Meas ; 10(1): 52-69, 2009.

Article in English | MEDLINE | ID: mdl-19299885

ABSTRACT

This article is based on a more extensive research report (Engelhard, Myford and Cline, 2000) prepared for the National Board for Professional Teaching Standards (NBPTS) concerning the Early Childhood/Generalist and Middle Childhood/Generalist assessment systems. The report is available from the Educational Testing Service (ETS). An earlier version of the article was presented at the American Educational Research Association Conference in New Orleans in 2000. We would like to acknowledge the helpful advice of Mike Linacre regarding the use of the FACETS computer program and the assistance of Fred Cline in analyzing these data. The material contained in this article is based on work supported by the NBPTS. Any opinions, findings, conclusions, and recommendations expressed herein are those of the authors and do not necessarily reflect the views of the NBPTS, Emory University, ETS, or the University of Illinois at Chicago.

Subjects

Evaluation Studies as Topic, Professional Competence/standards, Teaching/standards, Adult, Aged, Female, Humans, Male, Middle Aged, Models, Statistical, Professional Competence/statistics & numerical data

Evaluating the effectiveness of rating instruments for a communication skills assessment of medical residents.

Iramaneerat, Cherdsak; Myford, Carol M; Yudkowsky, Rachel; Lowenstein, Tali.

Adv Health Sci Educ Theory Pract ; 14(4): 575-94, 2009 Oct.

Article in English | MEDLINE | ID: mdl-18985427

ABSTRACT

The investigators used evidence based on response processes to evaluate and improve the validity of scores on the Patient-Centered Communication and Interpersonal Skills (CIS) Scale for the assessment of residents' communication competence. The investigators retrospectively analyzed the communication skills ratings of 68 residents at the University of Illinois at Chicago (UIC). Each resident encountered six standardized patients (SPs) portraying six cases. SPs rated the performance of each resident using the CIS Scale--an 18-item rating instrument asking for level of agreement on a 5-category scale. A many-faceted Rasch measurement model was used to determine how effectively each item and scale on the rating instrument performed. The analyses revealed that items were too easy for the residents. The SPs underutilized the lowest rating category, making the scale function as a 4-category rating scale. Some SPs were inconsistent when assigning ratings in the middle categories. The investigators modified the rating instrument based on the findings, creating the Revised UIC Communication and Interpersonal Skills (RUCIS) Scale--a 13-item rating instrument that employs a 4-category behaviorally anchored rating scale for each item. The investigators implemented the RUCIS Scale in a subsequent communication skills OSCE for 85 residents. The analyses revealed that the RUCIS Scale functioned more effectively than the CIS Scale in several respects (e.g., a more uniform distribution of ratings across categories, and better fit of the items to the measurement model). However, SPs still rarely assigned ratings in the lowest rating category of each scale.

Subjects

Communication, Internship and Residency, Interpersonal Relations, Patient-Centered Care, Physician-Patient Relations, Female, Health Status Indicators, Humans, Male, Research, Retrospective Studies

Quality control of an OSCE using generalizability theory and many-faceted Rasch measurement.

Iramaneerat, Cherdsak; Yudkowsky, Rachel; Myford, Carol M; Downing, Steven M.

Adv Health Sci Educ Theory Pract ; 13(4): 479-93, 2008 Nov.

Article in English | MEDLINE | ID: mdl-17310306

ABSTRACT

An Objective Structured Clinical Examination (OSCE) is an effective method for evaluating competencies. However, scores obtained from an OSCE are vulnerable to many potential measurement errors that cases, items, or standardized patients (SPs) can introduce. Monitoring these sources of errors is an important quality control mechanism to ensure valid interpretations of the scores. We describe how one can use generalizability theory (GT) and many-faceted Rasch measurement (MFRM) approaches in quality control monitoring of an OSCE. We examined the communication skills OSCE of 79 residents from one Midwestern university in the United States. Each resident performed six communication tasks with SPs, who rated the performance of each resident using 18 5-category rating scale items. We analyzed their ratings with generalizability and MFRM studies. The generalizability study revealed that the largest source of error variance besides the residual error variance was SPs/cases. The MFRM study identified specific SPs/cases and items that introduced measurement errors and suggested the nature of the errors. SPs/cases were significantly different in their levels of severity/difficulty. Two SPs gave inconsistent ratings, which suggested problems related to the ways they portrayed the case, their understanding of the rating scale, and/or the case content. SPs interpreted two of the items inconsistently, and the rating scales for two items did not function as 5-category scales. We concluded that generalizability and MFRM analyses provided useful complementary information for monitoring and improving the quality of an OSCE.
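The generalizability study described above partitions score variance into components attributable to each facet. A minimal sketch of that idea, assuming a much simpler design than the one in the study: a fully crossed persons x raters layout with one rating per cell, estimated from two-way ANOVA mean squares. The function name, the dictionary keys, and the toy data are hypothetical.

```python
def g_study(scores):
    """Variance components for a fully crossed persons x raters design.

    scores[p][r] is the rating person p received from rater r (one per cell).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]

    # Sums of squares for persons, raters, and the residual (interaction + error).
    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = max(ss_total - ss_p - ss_r, 0.0)

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    # Relative G coefficient for a mean over n_r raters.
    g_rel = var_p / (var_p + ms_res / n_r) if var_p > 0 else 0.0
    return {"person": var_p, "rater": var_r, "residual": ms_res, "g_relative": g_rel}


# Toy data (hypothetical): three persons rated by two raters.
print(g_study([[1, 2], [3, 4], [5, 6]]))
```

A large rater (here, SP/case) component relative to the person component is the kind of signal the abstract describes; identifying which individual raters are responsible requires a different analysis, such as the MFRM study.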

Subjects

Communication, Educational Measurement/methods, Internal Medicine/education, Internship and Residency, Quality Control, Adult, Chi-Square Distribution, Clinical Competence, Education, Medical, Graduate, Female, Humans, Male, Patient Simulation

Detecting and measuring rater effects using many-facet Rasch measurement: Part II.

Myford, Carol M; Wolfe, Edward W.

J Appl Meas ; 5(2): 189-227, 2004.

Article in English | MEDLINE | ID: mdl-15064538

ABSTRACT

The purpose of this two-part paper is to introduce researchers to the many-facet Rasch measurement (MFRM) approach for detecting and measuring rater effects. In Part II of the paper, researchers will learn how to use the Facets (Linacre, 2001) computer program to study five effects: leniency/severity, central tendency, randomness, halo, and differential leniency/severity. As we introduce each effect, we operationally define it within the context of a MFRM approach, specify the particular measurement model(s) needed to detect it, identify group- and individual-level statistical indicators of the effect, and show output from a Facets analysis, pinpointing the various indicators and explaining how to interpret each one. At the close of the paper, we describe other statistical procedures that have been used to detect and measure rater effects to help researchers become aware of important and influential literature on the topic and to gain an appreciation for the diversity of psychometric perspectives that researchers bring to bear on their work. Finally, we consider future directions for research in the detection and measurement of rater effects.

Subjects

Models, Psychological, Psychometrics/methods, Psychometrics/statistics & numerical data, Humans, Observer Variation

Detecting and measuring rater effects using many-facet Rasch measurement: part I.

Myford, Carol M; Wolfe, Edward W.

J Appl Meas ; 4(4): 386-422, 2003.

Article in English | MEDLINE | ID: mdl-14523257

ABSTRACT

The purpose of this two-part paper is to introduce researchers to the many-facet Rasch measurement (MFRM) approach for detecting and measuring rater effects. The researcher will learn how to use the Facets (Linacre, 2001) computer program to study five effects: leniency/severity, central tendency, randomness, halo, and differential leniency/severity. Part 1 of the paper provides critical background and context for studying MFRM. We present a catalog of rater effects, introducing effects that researchers have studied over the last three-quarters of a century in order to help readers gain a historical perspective on how those effects have been conceptualized. We define each effect and describe various ways the effect has been portrayed in the research literature. We then explain how researchers theorize that the effect impacts the quality of ratings, pinpoint various indices they have used to measure it, and describe various strategies that have been proposed to try to minimize its impact on the measurement of ratees. The second half of Part 1 provides conceptual and mathematical explanations of many-facet Rasch measurement, focusing on how researchers can use MFRM to study rater effects. First, we present the many-facet version of Andrich's (1978) rating scale model and identify questions about a rating operation that researchers can address using this model. We then introduce three hybrid MFRM models, explain the conceptual distinctions among them, describe how they differ from the rating scale model, and identify questions about a rating operation that researchers can address using these hybrid models.
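For orientation, the many-facet version of the rating scale model mentioned above is commonly written in log-odds form as follows. The abstract does not give the equation, so the symbols here are an assumption and may differ from the notation used in the paper:

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
```

Here $P_{nijk}$ is the probability that ratee $n$ receives a rating in category $k$ from rater $j$ on item $i$, $P_{nij(k-1)}$ is the probability of a rating in category $k-1$, $\theta_n$ is the ratee's proficiency, $\delta_i$ is the item's difficulty, $\alpha_j$ is the rater's severity, and $\tau_k$ is the difficulty of category $k$ relative to category $k-1$. Because the $\tau_k$ terms carry no item or rater subscript, the same category structure is assumed for every item and rater. Hybrid models of the kind the abstract mentions are usually described as relaxing that restriction.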

Subjects

Models, Statistical, Research/standards, Humans, Observer Variation, Reproducibility of Results, Research/statistics & numerical data

When raters disagree, then what: examining a third-rating discrepancy resolution procedure and its utility for identifying unusual patterns of ratings.

Myford, Carol M; Wolfe, Edward W.

J Appl Meas ; 3(3): 300-24, 2002.

Article in English | MEDLINE | ID: mdl-12147915

ABSTRACT

The purpose of this study was to examine a procedure for identifying and resolving discrepancies in ratings. We sought to determine to what extent the third-rater adjudication procedure employed in scoring the Test of Spoken English (TSE) successfully identified all anomalous ratings. We analyzed data from the April 1997 TSE scoring session using FACETS, a rating scale analysis computer program. The results suggest that, while it is important for an assessment program to identify cases in which there is obvious disagreement in the ratings assigned and have a policy to resolve those disagreements, implementing a discrepancy resolution procedure is not sufficient in and of itself for quality control monitoring. Often, there are other anomalous ratings that discrepancy resolution procedures may miss. Fit analysis can provide a valuable adjunct to a discrepancy resolution procedure, flagging suspect rating profiles in need of expert review before a final score report is issued.
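The abstract does not say which fit statistics were used. A minimal sketch, assuming the mean-square fit statistics conventionally reported in Rasch analyses (infit and outfit), computed for one rating profile from the observed ratings, the model-expected ratings, and the model variances. The function name and the numbers are hypothetical.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one profile of ratings.

    observed: ratings actually assigned
    expected: ratings the measurement model expects for the same cells
    variance: model variance of each rating (must be positive)
    """
    if not observed or not (len(observed) == len(expected) == len(variance)):
        raise ValueError("inputs must be non-empty and of equal length")
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(observed)
    # Infit: information-weighted mean square.
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


# Hypothetical values for three ratings in one profile.
infit, outfit = fit_mean_squares([3, 2, 4], [2.5, 2.5, 3.0], [0.5, 0.5, 1.0])
print(round(infit, 2), round(outfit, 2))
```

Both statistics have an expected value near 1.0 when the ratings fit the model. Values well above 1.0 indicate more unpredictability than the model allows, which is the kind of profile a fit analysis could flag even when the assigned ratings agree closely enough to escape third-rater adjudication.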

Subjects

Educational Measurement/standards, Language Tests/statistics & numerical data, Adult, Educational Measurement/methods, Female, Humans, Linear Models, Male, Mathematical Computing, Observer Variation, Quality Control, Software/standards
