Results 1 - 6 of 6
1.
Psychometrika ; 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37930558

ABSTRACT

The key assumption of conditional independence of item responses given latent ability in item response theory (IRT) models is addressed for multistage adaptive testing (MST) designs. Routing decisions in MST designs can cause patterns in the data that are not accounted for by the IRT model. This phenomenon relates to quasi-independence in log-linear models for incomplete contingency tables and affects certain types of statistical inference that rest on assumptions about observed and missing data. We demonstrate that generalized residuals for item pair frequencies under IRT models, as discussed by Haberman and Sinharay (J Am Stat Assoc 108:1435-1444, 2013. https://doi.org/10.1080/01621459.2013.835660), are inappropriate for MST data without adjustments. The adjustments depend on the MST design and can quickly become nontrivial as the complexity of the routing increases. The adjusted residuals are nevertheless found to have satisfactory Type I error rates in a simulation, and their use is illustrated with an application to real MST data from the Programme for International Student Assessment (PISA). Implications and suggestions for statistical inference with MST designs are discussed.
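
As a rough illustration of the statistic involved (our notation, not the authors'; the MST-specific adjustments discussed in the paper are omitted), a generalized residual for an item pair (i, j) compares the observed frequency with which both items are answered correctly to its expectation under the fitted IRT model, standardized by an estimated standard error:

    n_{ij} = \text{number of examinees answering both items } i \text{ and } j \text{ correctly}
    \hat{E}[n_{ij}] = N \int P_i(\theta)\, P_j(\theta)\, \hat{f}(\theta)\, d\theta
    z_{ij} = \frac{n_{ij} - \hat{E}[n_{ij}]}{\widehat{\operatorname{se}}\bigl(n_{ij} - \hat{E}[n_{ij}]\bigr)} \;\approx\; N(0, 1) \text{ under the fitted model}

Roughly speaking, routing in MST data makes some item pairs unobservable or observable only for examinees routed along particular paths, which is why the expectation and standard error require design-dependent adjustments.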

2.
Appl Psychol Meas ; 44(1): 17-32, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31853156

ABSTRACT

Many educational testing programs require different test forms with minimal or no item overlap. At the same time, the test forms should be parallel in terms of their statistical and content-related properties. A well-established method for assembling parallel test forms is combinatorial optimization using mixed-integer linear programming (MILP). With this approach, in the unidimensional case, Fisher information (FI) is commonly used as the statistical target to obtain parallelism. In the multidimensional case, however, FI is a matrix, which complicates its use as a statistical target. Previous research addressing this problem focused on item selection criteria for multidimensional computerized adaptive testing (MCAT), but those selection criteria are not directly transferable to the assembly of linear parallel test forms. To bridge this gap, the authors derive different statistical targets, based on either FI or the Kullback-Leibler (KL) divergence, that can be applied in MILP models to assemble multidimensional parallel test forms. The proposed statistical targets are compared and evaluated using simulated item pools and an item pool based on empirical items. Promising results with respect to the KL-based statistical targets are presented and discussed.
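
For readers unfamiliar with MILP-based assembly, the sketch below shows the general idea in the simpler unidimensional case, with Fisher information at a few ability points as the statistical target. It is a minimal toy example written in Python with the PuLP library, not the authors' implementation; the item parameters, ability points, and target values are invented for illustration.

    # Hypothetical sketch of MILP-based assembly of parallel forms (unidimensional
    # 2PL pool, Fisher information at a few ability points as the target).
    import math
    import pulp

    pool = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7),
            (1.0, -1.0), (0.9, 1.2), (1.1, 0.3)]   # toy 2PL items (a, b)
    theta_points = [-1.0, 0.0, 1.0]                # ability points to match
    target = {t: 1.0 for t in theta_points}        # made-up target information values
    n_forms, form_len = 2, 3

    def info(a, b, theta):
        # Fisher information of a 2PL item at ability theta
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)

    prob = pulp.LpProblem("parallel_forms", pulp.LpMinimize)
    x = {(i, f): pulp.LpVariable(f"x_{i}_{f}", cat="Binary")
         for i in range(len(pool)) for f in range(n_forms)}
    dev = pulp.LpVariable("max_deviation", lowBound=0)
    prob += dev                                     # minimize the largest deviation

    for i in range(len(pool)):                      # each item in at most one form
        prob += pulp.lpSum(x[i, f] for f in range(n_forms)) <= 1
    for f in range(n_forms):
        prob += pulp.lpSum(x[i, f] for i in range(len(pool))) == form_len
        for t in theta_points:                      # keep form information near target
            form_info = pulp.lpSum(info(a, b, t) * x[i, f]
                                   for i, (a, b) in enumerate(pool))
            prob += form_info - target[t] <= dev
            prob += target[t] - form_info <= dev

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    for f in range(n_forms):
        print("form", f, [i for i in range(len(pool)) if x[i, f].value() > 0.5])

The overall structure carries over when other statistical targets, such as the FI- and KL-based targets derived in the paper, are used: mainly the coefficients entering the form-level constraints change.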

3.
Psychometrika ; 83(1): 109-131, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29164449

ABSTRACT

We propose a generalization of the speed-accuracy response model (SARM) introduced by Maris and van der Maas (Psychometrika 77:615-633, 2012). These models describe the scores that result from a scoring rule incorporating both the speed and the accuracy of item responses. Our generalization is analogous to the step from the one-parameter logistic (or Rasch) model to the two-parameter logistic (or Birnbaum) model in item response theory. An expectation-maximization (EM) algorithm for estimating model parameters and standard errors was developed. Furthermore, methods to assess model fit are provided in the form of generalized residuals for item score functions and saddlepoint approximations to the density of the sum score. The presented methods were evaluated in a small simulation study, the results of which indicated good parameter recovery and reasonable Type I error rates for the residuals. Finally, the methods were applied to two real data sets, and the two-parameter SARM showed improved fit compared to the one-parameter SARM in both.
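
Two pieces of standard background may help here (stated in our own notation, not quoted from the paper). First, the SARM builds on the signed residual time scoring rule of Maris and van der Maas (2012): with accuracy X_i in {0, 1}, response time T_i, and time limit d, the item score is S_i = (2X_i - 1)(d - T_i), so fast correct responses earn large positive scores and fast incorrect responses large negative ones. Second, the generalization referred to in the abstract is analogous to adding an item discrimination parameter a_i to the Rasch model:

    \text{Rasch:}\quad P(X_i = 1 \mid \theta) = \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)}
    \text{2PL:}\quad P(X_i = 1 \mid \theta) = \frac{\exp\bigl(a_i(\theta - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta - b_i)\bigr)}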


Subject(s)
Models, Statistical; Psychometrics/methods; Academic Performance; Algorithms; Computer Simulation; Data Interpretation, Statistical; Humans; Mathematical Concepts; Reaction Time
4.
Br J Math Stat Psychol ; 70(2): 317-345, 2017 May.
Article in English | MEDLINE | ID: mdl-28474769

ABSTRACT

We compare three modelling frameworks for the accuracy and speed of item responses in the context of adaptive testing. The first framework models scores that result from a scoring rule incorporating both accuracy and speed. The second is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287), in which a regular item response model is specified for accuracy and a log-normal model for speed. The third is the diffusion framework, in which the response is assumed to result from a Wiener process. Although the three frameworks differ in how they relate accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics test and a spelling test. Furthermore, we applied both a linear and an adaptive testing mode to the data off-line in order to determine differences between the modelling frameworks. A model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures.
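
As a schematic reminder of the second (hierarchical) framework, in our notation rather than the paper's: accuracy follows a regular item response model such as the 2PL, log response times follow a normal model with item time intensity \beta_i and time discrimination \alpha_i, and the person parameters are linked at a second level, typically through a bivariate normal distribution:

    P(X_{ij} = 1 \mid \theta_j) = \frac{\exp\bigl(a_i(\theta_j - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta_j - b_i)\bigr)}
    \ln T_{ij} \mid \tau_j \sim N\bigl(\beta_i - \tau_j,\ \alpha_i^{-2}\bigr)
    (\theta_j, \tau_j) \sim N_2(\mu, \Sigma)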


Subject(s)
Models, Statistical; Psychometrics/methods; Reaction Time; Surveys and Questionnaires; Diffusion; Humans; Models, Psychological; Psychometrics/statistics & numerical data; Reproducibility of Results
5.
Appl Psychol Meas ; 40(3): 163-179, 2016 May.
Article in English | MEDLINE | ID: mdl-29881046

ABSTRACT

Assembly of parallel forms is an important step in the test development process, so choosing a suitable theoretical framework for generating well-defined test specifications is critical. The performance of different statistical targets for test specifications, based on the test characteristic curve (TCC) and the test information function (TIF), was investigated; test length, the number of test forms, and content specifications were considered as well. A TCC target results in forms that are parallel in difficulty but not necessarily in precision; conversely, a TIF target results in forms that are parallel in precision but not necessarily in difficulty. Because the focus is sometimes on only one of the two targets, differences in either difficulty or precision can arise. Differences in difficulty can be mitigated by equating, but differences in precision cannot. In a series of simulations using a real item bank, the two-parameter logistic model, and mixed-integer linear programming for automated test assembly, these differences were found to be quite substantial. When the TIF and the TCC are combined into a single target and their relative importance is manipulated, the differences can be made to disappear.
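
For reference, both targets have simple closed forms under the two-parameter logistic model used in the study (standard formulas, in our notation): the test characteristic curve is the sum of the item response functions and the test information function is the sum of the item information functions,

    P_i(\theta) = \frac{\exp\bigl(a_i(\theta - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta - b_i)\bigr)}
    \mathrm{TCC}(\theta) = \sum_i P_i(\theta), \qquad \mathrm{TIF}(\theta) = \sum_i a_i^2\, P_i(\theta)\bigl(1 - P_i(\theta)\bigr)

Matching forms on the TCC constrains difficulty, matching them on the TIF constrains precision, and neither condition implies the other, which is the source of the differences reported above.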

6.
Psicol. educ. (Madr.) ; 20(2): 109-115, Dec. 2014. graphs, illustrations, tables
Article in English | IBECS | ID: ibc-130788

ABSTRACT

We investigate methods for studying learning progressions in English language arts using data from scenario-based assessments. In particular, our interest lies in the empirical recovery of a learning progression in argumentation for middle school students. We collected data on three parallel assessment forms consisting of scenario-based task sets with multiple item formats, where each student was randomly assigned two of the three forms. We fitted several item response theory models and used model-based measures to classify students into levels of the argumentation learning progression. Although there were some differences in difficulty between parallel tasks, good agreement was found among the classifications based on the parallel forms. Overall, we were able to empirically recover the order of the levels of the argumentation learning progression as they were assigned to the assessment tasks by the theoretical framework.




Subject(s)
Humans; Educational Measurement/methods; Learning; Reading; Writing; Aptitude Tests; Psychology, Educational/methods; Empirical Research