Results 1 - 12 of 12
Medical Education ; : 367-375, 2023.
Article in Japanese | WPRIM | ID: wpr-1007092


This paper describes test theory, the theoretical foundation of learner assessment, in relation to its application in the Common Achievement Tests. Specifically, it covers classical test theory and the reliability coefficient, generalizability theory, and item response theory. In classical test theory, the observed score X is expressed as the sum of a true-score component T and an error component E, and the reliability coefficient is defined as the ratio of the true-score variance to the observed-score variance. Generalizability theory extends the notion of reliability used in classical test theory. Item response theory overcomes limitations of classical test theory and can express the properties of items (difficulty and discrimination) separately from examinees' abilities.
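The classical decomposition X = T + E and the variance-ratio definition of reliability can be illustrated with a small simulation. This is a minimal sketch rather than anything taken from the paper: the function names, the score mean of 50, and the standard deviations are illustrative assumptions, and T and E are drawn independently, which is the usual classical assumption.

```python
import random
import statistics


def theoretical_reliability(true_sd, error_sd):
    """Reliability = var(T) / var(X), where var(X) = var(T) + var(E)
    when T and E are uncorrelated (assumed here)."""
    true_var = true_sd ** 2
    return true_var / (true_var + error_sd ** 2)


def simulated_reliability(n_examinees=20000, true_sd=10.0, error_sd=5.0, seed=0):
    """Draw T and E independently, form X = T + E, and return the
    sample ratio var(T) / var(X). All parameter values are illustrative."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_examinees)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


if __name__ == "__main__":
    # 10^2 / (10^2 + 5^2) = 100 / 125
    print(theoretical_reliability(10.0, 5.0))
    # The simulated ratio should be close to the theoretical value.
    print(simulated_reliability())
```

In practice T is never observed, so reliability has to be estimated indirectly (for example from parallel forms or internal consistency); the simulation only shows what the definition means.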

Article in Chinese | WPRIM | ID: wpr-1025277


Objective: To analyze and evaluate the items of the QLICD-CPHD (V2.0) scale for chronic pulmonary heart disease using classical test theory (CTT) and item response theory (IRT). Methods: A total of 184 patients with chronic pulmonary heart disease were surveyed with the QLICD-CPHD (V2.0) scale. The items were evaluated with CTT-based statistical methods, including the correlation coefficient method, the variance method, factor analysis, and Cronbach's α coefficient. In addition, Samejima's graded response model from IRT was used to calculate the difficulty, information, and discrimination of each item. Results: The CTT results showed that 7 items failed to meet three or more of the statistical requirements, including 6 items in the general module and 1 item in the disease-specific module. The IRT results showed that item discrimination ranged from 1.18 to 1.44, which was suitable. The difficulty coefficients increased monotonically with the difficulty level (B1→B4), and some items exceeded the standard value range. The average information of each item ranged from 0.185 to 0.576. Conclusion: The CTT and IRT analyses indicate that most items of the QLICD-CPHD (V2.0) scale perform well and discriminate well, but a few items still need further revision.
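A graded response model with four ordered thresholds (B1 to B4) gives five response categories. The sketch below shows how category probabilities follow from one discrimination parameter and a set of increasing thresholds. It is a hypothetical illustration, not the authors' analysis: the function names are invented, the logistic form without a scaling constant is an assumption, and the parameter values in the usage note are made up.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    thresholds must be strictly increasing (b_1 < b_2 < ...); with m
    thresholds there are m + 1 ordered response categories.
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(response >= k), padded with 1 and 0.
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]
```

For example, `grm_category_probs(0.0, 1.3, [-2.0, -1.0, 1.0, 2.0])` returns five probabilities that sum to 1. The requirement that thresholds increase is what keeps every category probability positive, which is why a monotonic B1→B4 ordering matters.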

Rev. colomb. psicol ; 29(1): 87-103, ene.-jun. 2020. tab, graf
Article in English | LILACS-Express | LILACS | ID: biblio-1115628


Abstract The present study examines the psychometric properties of the mental health scale for children used in the 2015 Colombian National Mental Health Survey. A nationally representative sample of 2,727 children (M age = 8.99; range = 7-11) was used, with reports from their main caregivers on 26 mental health problem symptoms taken from the Reporting Questionnaire for Children (RQC), the Child Behavior Checklist (CBCL), and the Brief Screening and Diagnostic Questionnaire (CBTD). Classical test theory and factor analysis were used to analyze the classical location and information of each item, along with the dimensionality, reliability, and convergent validity of the scale. Item response theory (IRT) was used to estimate theoretically invariant item parameters for location and information. Findings indicate that the mental health scale for children has adequate psychometric properties for use in Colombia. Furthermore, the IRT analyses reveal a set of items that maximize information and that may be used in future administrations when greater efficiency is required.

Psicol. reflex. crit ; 27(4): 670-678, Oct-Dec/2014. tab, graf
Article in English | LILACS, INDEXPSI | ID: lil-728843


Researchers estimating the locations of individuals on continuous latent variables may rely on several statistical models described in the literature. However, weighing the costs and benefits of using one specific model over the alternatives depends on empirical information that is not always readily available. Therefore, the aim of this simulation study was to compare the performance of seven popular statistical models in providing adequate latent trait estimates when item difficulties are targeted at the sample mean or at the tails of the latent trait distribution. Results suggested an overall tendency of the models to provide more accurate estimates of true latent scores when using items targeted at the sample mean of the latent trait distribution. The Rating Scale Model and the Graded Response Model from Item Response Theory, and Confirmatory Factor Analysis with Weighted Least Squares Mean- and Variance-adjusted estimation, yielded the most reliable latent trait estimates, even when the items were poorly matched to the sample distribution of the latent variable. These findings have important implications for some popular methodological practices in Psychology and related areas. (AU)

Psychometrics , Statistics as Topic , Factor Analysis, Statistical
Summa psicol. UST ; 11(2): 103-113, 2014. tab, graf
Article in Spanish | LILACS | ID: lil-783369


The Graded Response Model (GRM) of Item Response Theory (IRT) and Classical Test Theory (CTT) were applied to the analysis of items from a Confidence in Mathematics scale (Abal, 2013). The scale measures a university student's perceived ability to operate effectively with symbols and formulas, to solve problem situations, and to learn and pass mathematics or related subjects. It comprises 8 items in a polytomous response format (6-point Likert-type). The sample consisted of 1875 Psychology students at the University of Buenos Aires, Argentina. The unidimensionality assumption required by the GRM was confirmed. The GRM fitted the data satisfactorily for all items. Location and discrimination parameters showed predictable values. Classical item analysis involved the examination of response frequencies, item descriptive statistics, and corrected item-test correlations. The marginal reliability coefficient obtained from IRT was .91 and Cronbach's alpha was .90. High correlations were found between: a) item means and the central location parameters of the GRM, b) corrected item-test correlations and discrimination parameters, and c) IRT and CTT individual scores. These results provide validity evidence based on the internal structure of the instrument...
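Two of the classical statistics named in this abstract, Cronbach's alpha and the corrected item-test correlation, can be computed directly from a respondents-by-items score matrix. The following is a minimal sketch with hypothetical function names and toy data in the test; it assumes complete data and non-constant scores, and it is not the authors' code.

```python
import math
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of total score)
    """
    k = len(scores[0])
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(scores, item):
    """Correlate one item with the sum of the remaining items, so the
    item does not inflate its own correlation with the total."""
    item_scores = [row[item] for row in scores]
    rest_scores = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_scores)
```

The "corrected" version removes the item from the total before correlating, which matters most for short scales such as an 8-item instrument, where each item makes up a sizable share of the total score.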

Humans , Male , Adolescent , Adult , Female , Young Adult , Middle Aged , Self Efficacy , Trust , Students/psychology , Mathematics/education , Psychological Tests , Argentina , Models, Psychological , Reproducibility of Results
Pensam. psicol ; 11(2): 19-38, jul.-dic. 2013. ilus, tab
Article in Spanish | LILACS, COLNAL | ID: lil-708977


Objective. This research examined the psychometric properties of the Psychological Entitlement Scale (PES) in the local context using Classical Test Theory (CTT) and Item Response Theory (IRT). Method. Participants were 402 university students with a mean age of 22.77 years (SD = 4.85), of both genders (61.9% women) and from different degree programs, and 324 residents of the city of Córdoba aged 18 to 65 years (M = 32.77, SD = 10.71), of both genders (56.2% women) and from different socioeconomic levels. Participants were selected by accidental (convenience) sampling, and all were assessed with the PES and the Triple Dominance Scale. Results. Under CTT, exploratory and confirmatory factor analyses indicated that eight of the nine items of the original scale showed a unifactorial structure with adequate factor loadings and/or regression weights. Acceptable internal consistency values were also observed. Individuals classified as prosocial had significantly lower PES scores than individualistic and competitive individuals (r-p² = 0.04, a small effect size). Under IRT, although a good global fit to the one-parameter rating scale model was obtained, the item analysis showed unacceptable indices for one item and some poorly informative categories. Conclusion. Although CTT provided evidence of adequate psychometric properties, the IRT analyses suggest adjustments to the instrument, in particular regrouping the response categories that were poorly informative.

Adult , Psychometrics
Psicol. reflex. crit ; 26(2): 241-250, 2013. ilus, tab
Article in Portuguese | LILACS | ID: lil-680120


In the 20th century, the development and evaluation of the psychometric properties of tests was based mainly on Classical Test Theory (CTT). Many tests are long and redundant, with measures influenced by the characteristics of the sample of individuals assessed during their development; some of these limitations are consequences of the use of CTT. Item Response Theory (IRT) emerged as a possible solution to some limitations of CTT, improving the quality of the assessment of test structure. In this paper we critically compare the characteristics of CTT and IRT as methods for evaluating the psychometric properties of tests. The advantages and limitations of each method are discussed...

Aval. psicol ; 11(2): 297-307, ago. 2012. ilus
Article in Portuguese | LILACS | ID: lil-688393


This paper revisits classic texts in psychometrics and presents the mathematical foundations of Classical Test Theory. It covers the mathematical model of factor analysis, the classical linear model, the derivation of the reliability index and the ways of calculating the reliability coefficient, the standard error of measurement, the formulation of validity in terms of factor analysis and, finally, item analysis. The text is intended for those who want to deepen their knowledge of psychometric concepts by understanding the origin of the main formulas used in the psychometric analysis of tests and scales.

Factor Analysis, Statistical , Psychometrics
Psico USF ; 16(2): 151-161, maio-ago. 2011. ilus, tab
Article in Portuguese | LILACS | ID: lil-612836


This study aimed to analyze the psychometric qualities of the Escala Baptista de Depressão (Versão Adulto), EBADEP-A, based on Item Response Theory (IRT) and Classical Test Theory (CTT). First, the model fit parameters for both items and persons indicated good fit, with a low percentage of misfit. Regarding reliability, both Cronbach's alpha and the index generated by the Rasch model were considered excellent. The study of differential item functioning identified 17 items with response bias, 11 favoring the female group and 6 the male group. For the CTT-based analyses, an ANOVA was performed on the criterion groups, and the EBADEP-A was able to discriminate among the non-depressed, university student, psychiatric, and depressed groups. These results were taken as evidence of construct and criterion validity, respectively, complementing the evidence already found for the scale.

Humans , Male , Female , Adolescent , Young Adult , Middle Aged , Aged, 80 and over , Depression/psychology , Psychometrics , Reproducibility of Results , Weights and Measures
Article in Korean | WPRIM | ID: wpr-177481


PURPOSE: The objectives of this study were: 1) to analyze Clinical Performance Examination (CPX) items using item response theory (IRT) and classical test theory (CTT) and 2) to discuss how to apply and interpret these results in order to improve the quality of CPX items. In addition, we explored statistical procedures for merging examination data from several different medical schools. METHODS: The study used the 2005 CPX examination data from 10 medical schools located in Seoul and Kyunggi province. To merge the data from the ten medical schools, Levene's test for homogeneity of variances was used. Homogeneous groups were selected using ANOVA or the Kruskal-Wallis test, as appropriate, with Tukey's multiple comparisons. The generalized partial credit model was applied to polytomous items and the 2-parameter logistic model to dichotomous items. RESULTS: Data from 8 medical schools were included in the analysis. The discrimination index from IRT differed from that of CTT for both polytomous and dichotomous items, and tended to be lower than that of CTT. For dichotomous items, the difficulty indices of the two models correlated well with each other. For polytomous items, however, the IRT model provided more information than CTT. CONCLUSION: The CPX items were mostly easy in terms of the difficulty index, and the results from the IRT and CTT models did not correlate well for the discrimination index. IRT may provide more detailed information for polytomous items, but the checklist and the criteria of the scoring system should be reviewed carefully.

Checklist , Discrimination, Psychological , Logistic Models , Schools, Medical , Seoul
Medical Education ; : 3-9, 2005.
Article in Japanese | WPRIM | ID: wpr-369912


Data from the first trial of the computer-based nationwide common achievement test in medicine, carried out from February through July 2002, were analyzed to evaluate the applicability of item response theory. The trial test was designed to cover 6 areas of the core curriculum and included a total of 2791 items. For each area, 3 to 40 items were chosen randomly and administered to 5693 students in the fourth to sixth years; the responses of 5676 of these students were analyzed with specially designed computer systems. Each student was presented with 100 items. The item response patterns were analyzed with a 3-parameter logistic model (item discrimination, item difficulty, and guessing parameter). The main findings were: 1) Item difficulty and the percentage of correct answers were strongly correlated (r = -0.969 to -0.982). 2) Item discrimination and the point-biserial correlation were moderately correlated (r = 0.304 to 0.511). 3) The estimated abilities and the percentage of correct answers were strongly correlated (r = 0.810 to 0.945). 4) The mean ability increased with school year. 5) The correlation coefficients among the 6 curriculum area ability scores were less than 0.6. Because the nationwide common achievement test was designed to present items randomly to each student, item response theory can be used to adjust for differences among test sets. The first trial test was designed without considering item response theory, but the second trial test was administered with a design better suited for comparison. Results of an analysis of the second trial will be reported soon.
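The 3-parameter logistic model named above gives the probability of a correct answer as a function of ability and the three item parameters. The sketch below is illustrative only: the function name is hypothetical, and it uses the plain logistic form without the 1.7 scaling constant that some formulations include, since the abstract does not say which was used.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under a 3PL model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # At theta == b the probability is halfway between c and 1.
    print(p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
```

Because the probability of success falls as b rises, harder items have larger b and a lower percentage of correct answers, which is consistent with the strong negative correlation reported in finding 1.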

Article in Korean | WPRIM | ID: wpr-90113


PURPOSE: In 2002, extended-matching type (R-type) items were introduced to the Korean Medical Licensing Examination. To evaluate the usability of R-type items, the results of the Korean Medical Licensing Examination in 2002 and 2003 were analyzed by item type and knowledge level. METHODS: Item parameters, such as the difficulty and discrimination indexes, were calculated using classical test theory. The item parameters were compared across three item types and three knowledge levels. RESULTS: The item parameter values of R-type items were higher than those of A- or K-type items. There was no significant difference in item parameters across the knowledge levels of recall, interpretation, and problem solving. The reliability of R-type items exceeded 0.99. Among R-type items, a larger number of correct answers was associated with a lower difficulty index. CONCLUSION: The introduction of R-type items is favorable from the perspective of item parameters. However, increasing the number of correct answers in pick 'n'-type questions makes the items more difficult to solve.
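Classical difficulty and discrimination indexes can be computed from a matrix of 0/1 item scores. The abstract does not say which formulas were used, so the sketch below makes two common choices explicit as assumptions: difficulty as the proportion of correct responses, and discrimination as the difference in that proportion between the top and bottom scoring groups (27% by default). Function names and data are hypothetical.

```python
def item_difficulty(responses, item):
    """Difficulty index: proportion of examinees answering the item
    correctly (a lower value means a harder item)."""
    return sum(row[item] for row in responses) / len(responses)


def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination index: proportion correct in the top
    scoring group minus proportion correct in the bottom scoring group."""
    ranked = sorted(responses, key=sum, reverse=True)
    n_group = max(1, round(len(ranked) * fraction))
    upper = ranked[:n_group]
    lower = ranked[-n_group:]
    p_upper = sum(row[item] for row in upper) / n_group
    p_lower = sum(row[item] for row in lower) / n_group
    return p_upper - p_lower
```

Under the proportion-correct definition, the lower difficulty index reported for items with more correct answers means that those items were harder, as the conclusion states.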

Discrimination, Psychological , Education, Medical , Licensure , Problem Solving