Results 1 - 4 of 4
1.
Psychometrika ; 84(3): 749-771, 2019 Sep.
Article in English | MEDLINE | ID: mdl-30511327

ABSTRACT

In computerized adaptive testing (CAT), a variable-length stopping rule ends item administration once a pre-specified measurement precision standard has been satisfied. The goal is to provide equal measurement precision for all examinees regardless of their true latent trait levels. Several stopping rules have been proposed in unidimensional CAT, such as the minimum information rule and the maximum standard error rule. These rules have also been extended to multidimensional CAT and cognitive diagnostic CAT, and they all share the same idea of monitoring measurement error. Recently, Babcock and Weiss (J Comput Adapt Test 2012. https://doi.org/10.7333/1212-0101001) proposed an "absolute change in theta" (CT) rule, which is useful when an item bank runs out of informative items for one or more ranges of the trait continuum. Choi, Grady and Dodd (Educ Psychol Meas 70:1-17, 2010) likewise argued that a CAT should stop when the standard error stops changing, implying that the item bank is likely exhausted. Although these stopping rules have been evaluated and compared in different simulation studies, the relationships among them remain unclear, so there is no clear guideline regarding when to use which rule. This paper presents analytic results showing the connections among various stopping rules in both unidimensional and multidimensional CAT. In particular, it is argued that the CT-rule alone can be unstable and can end the test prematurely. However, it can serve as a useful secondary rule for monitoring the point of diminishing returns. To provide further empirical evidence, three simulation studies are reported using both the 2PL model and the multidimensional graded response model.


Subject(s)
Cognition/physiology , Computer Simulation/statistics & numerical data , Psychometrics/methods , Algorithms , Bias , Dimensional Measurement Accuracy , Humans
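As a rough illustration of the stopping rules discussed in this abstract, the sketch below combines a maximum-standard-error rule with an absolute-change-in-theta (CT) rule as a secondary check, under a 2PL model. All threshold values (`se_cut`, `ct_cut`, `min_items`) are illustrative placeholders, not values taken from the paper.

```python
import math

def item_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def should_stop(thetas, items_administered, se_cut=0.3, ct_cut=0.02, min_items=5):
    """Variable-length stopping decision after each item.

    thetas: history of interim ability estimates (latest last).
    items_administered: list of (a, b) parameter pairs for items given so far.
    Stops when the standard error falls below se_cut (primary rule) or when
    the interim theta barely changes (CT rule, a sign of diminishing returns).
    """
    if len(items_administered) < min_items:
        return False
    theta = thetas[-1]
    # Test information is the sum of item informations; SE = 1 / sqrt(info).
    info = sum(item_info_2pl(theta, a, b) for a, b in items_administered)
    se = 1.0 / math.sqrt(info)
    se_rule = se <= se_cut                            # precision standard met
    ct_rule = abs(thetas[-1] - thetas[-2]) < ct_cut   # theta has stabilized
    return se_rule or ct_rule
```

Consistent with the paper's argument, in practice the CT rule would normally be applied only as a secondary criterion alongside the SE rule, since on its own it can trigger prematurely.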
2.
Psychometrika ; 83(1): 223-254, 2018 Mar.
Article in English | MEDLINE | ID: mdl-27796763

ABSTRACT

Statistical methods for identifying aberrances on psychological and educational tests are pivotal for detecting flaws in the design of a test or irregular behavior by test takers. Two approaches have been taken in the past to address the challenge of aberrant behavior detection: (1) modeling aberrant behavior via mixture modeling methods, and (2) flagging aberrant behavior via residual-based outlier detection methods. In this paper, we propose a two-stage method conceived as a combination of both approaches. In the first stage, a mixture hierarchical model is fitted to the response and response time data to distinguish normal from aberrant behavior using a Markov chain Monte Carlo (MCMC) algorithm. In the second stage, a further distinction between rapid guessing and cheating behavior is made at the person level using a Bayesian residual index. Simulation results show that the two-stage method yields accurate item and person parameter estimates, as well as a high true detection rate and a low false detection rate, under different manipulated conditions mimicking NAEP parameters. A real data example is given at the end to illustrate a potential application of the proposed method.


Subject(s)
Behavior , Data Interpretation, Statistical , Problem Solving , Psychometrics/methods , Academic Performance , Algorithms , Bayes Theorem , Computer Simulation , Diagnosis, Computer-Assisted/methods , Humans , Markov Chains , Monte Carlo Method , Pattern Recognition, Automated/methods , Reaction Time
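The paper's Bayesian residual index requires the full fitted mixture hierarchical model, which is beyond a short sketch. As a loose, simplified stand-in for the idea of residual-based flagging, the snippet below flags unusually fast responses by standardizing log response times within a person; the cutoff `z_cut` is hypothetical and this is not the paper's index.

```python
import statistics

def flag_fast_responses(log_rts, z_cut=-1.96):
    """Flag responses whose log response time is an extreme low outlier.

    log_rts: one examinee's log response times across items.
    Returns the indices of responses whose standardized value falls
    below z_cut -- a crude analogue of residual-based outlier flagging
    for rapid-guessing behavior.
    """
    mu = statistics.mean(log_rts)
    sd = statistics.stdev(log_rts)  # sample standard deviation
    return [i for i, x in enumerate(log_rts) if (x - mu) / sd < z_cut]
```

A model-based index would replace the sample mean and standard deviation with person- and item-specific predictions from the fitted response-time model, so that fast-but-expected responses are not flagged.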
3.
Br J Math Stat Psychol ; 69(3): 291-315, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27435032

ABSTRACT

There has recently been much interest in computerized adaptive testing (CAT) for cognitive diagnosis. While various item selection criteria and asymptotically optimal designs exist, most are constructed from asymptotic theory that assumes the test length goes to infinity. In practice, with limited test lengths, the desired asymptotic optimality may not apply, and few studies in the literature concern the optimal design of finite item sets. Related questions, such as how many items are needed to identify the attribute pattern of an examinee and what types of initial items provide the optimal classification results, remain open. This paper aims to answer these questions by providing a non-asymptotic theory of the optimal selection of initial items in cognitive diagnostic CAT. In particular, for the optimal design, we provide necessary and sufficient conditions on the Q-matrix structure of the initial items. The theoretical development applies to a general family of cognitive diagnostic models. The results not only provide a guideline for the design of optimal item selection procedures, but may also be applied to guide item bank construction.


Subject(s)
Cognition Disorders/diagnosis , Data Interpretation, Statistical , Diagnosis, Computer-Assisted/methods , Educational Measurement/methods , Models, Statistical , Surveys and Questionnaires , Algorithms , Computer Simulation , Humans , Psychometrics/methods , Reproducibility of Results , Sensitivity and Specificity
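The paper derives exact necessary and sufficient conditions on the initial items' Q-matrix, which are not reproduced here. As a related sketch, the check below tests one completeness-style condition commonly discussed in the DCM literature: whether a candidate initial block contains a single-attribute item for every attribute (i.e., the Q-matrix contains an identity submatrix). This is an illustrative condition, not necessarily the paper's.

```python
def contains_single_attribute_items(q_matrix):
    """Check whether a Q-matrix contains every single-attribute row e_k.

    q_matrix: list of rows, one per item; entry q[j][k] is 1 if item j
    requires attribute k, else 0. Returns True if, for each attribute k,
    some item measures attribute k alone.
    """
    n_attr = len(q_matrix[0])
    # The K single-attribute rows (1, 0, ..., 0), (0, 1, ..., 0), ...
    singles = {tuple(int(i == k) for i in range(n_attr)) for k in range(n_attr)}
    rows = {tuple(row) for row in q_matrix}
    return singles <= rows
```

A check like this could serve as a quick screen when assembling an initial block from an item bank, before applying the paper's finer-grained conditions.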
4.
Appl Psychol Meas ; 39(7): 525-538, 2015 Oct.
Article in English | MEDLINE | ID: mdl-29881024

ABSTRACT

This research focuses on developing item-level fit checking procedures in the context of diagnostic classification models (DCMs), specifically the "Deterministic Input; Noisy 'And' gate" (DINA) model. Although there is a growing body of literature on model fit checking methods for DCMs, item-level fit analysis is not adequately discussed in the literature. This study takes an initial step toward filling this gap. Two approaches are proposed: one stems from classical goodness-of-fit test statistics coupled with the expectation-maximization algorithm for model estimation, and the other is the posterior predictive model checking (PPMC) method coupled with Markov chain Monte Carlo estimation. For both approaches, the chi-square statistic and a power-divergence index are considered, along with Stone's method for accounting for uncertainty in latent attribute estimation. A simulation study with varying manipulated factors is carried out. Results show that both approaches are promising when Stone's method is imposed, but the classical goodness-of-fit approach has a much higher detection rate (i.e., proportion of misfit items correctly detected) than the PPMC method.
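As a simplified illustration of classical goodness-of-fit checking for a single DINA item, the sketch below computes a Pearson chi-square across mastery groups, assuming examinees have already been classified by mastery status and ignoring the latent-attribute uncertainty that Stone's method accounts for. Expected cell counts are assumed nonzero.

```python
def dina_p(eta, slip, guess):
    """DINA correct-response probability: 1 - slip if the examinee masters
    all attributes the item requires (eta = 1), otherwise guess."""
    return 1.0 - slip if eta == 1 else guess

def item_chi_square(obs_correct, group_sizes, etas, slip, guess):
    """Pearson chi-square for one item across mastery groups.

    obs_correct[g]: observed number of correct responses in group g.
    group_sizes[g]: number of examinees in group g.
    etas[g]: DINA ideal response (0 or 1) for group g on this item.
    Sums (O - E)^2 / E over the correct and incorrect cells of each group.
    """
    x2 = 0.0
    for o, n, eta in zip(obs_correct, group_sizes, etas):
        e = n * dina_p(eta, slip, guess)          # expected correct count
        x2 += (o - e) ** 2 / e                    # correct cell
        x2 += ((n - o) - (n - e)) ** 2 / (n - e)  # incorrect cell
    return x2
```

Under good fit the statistic is small; large values flag item misfit. The paper's procedures additionally handle attribute-classification uncertainty (Stone's method) and consider a power-divergence family beyond the plain chi-square.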
