Search | VHL Regional Portal

1.

Efficient Corrections for Standardized Person-Fit Statistics.

Gorney, Kylie; Sinharay, Sandip; Eckerly, Carol.

Psychometrika ; 89(2): 569-591, 2024 06.

Article in English | MEDLINE | ID: mdl-38558053

ABSTRACT

Many popular person-fit statistics belong to the class of standardized person-fit statistics, T, and are assumed to have a standard normal null distribution. However, in practice, this assumption is incorrect since T is computed using (a) an estimated ability parameter and (b) a finite number of items. Snijders (Psychometrika 66(3):331-342, 2001) developed mean and variance corrections for T to account for the use of an estimated ability parameter. Bedrick (Psychometrika 62(2):191-199, 1997) and Molenaar and Hoijtink (Psychometrika 55(1):75-106, 1990) developed skewness corrections for T to account for the use of a finite number of items. In this paper, we combine these two lines of research and propose three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The new corrections are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets. We conducted a detailed simulation study and found that the new corrections are able to control the Type I error rate while also maintaining reasonable levels of power. A real data example is also included.

Subject(s)

Psychometrics , Humans , Psychometrics/methods , Models, Statistical , Computer Simulation , Data Interpretation, Statistical

2.

Remarks from the New Editor-in-Chief.

Sinharay, Sandip.

Psychometrika ; 89(1): 1-3, 2024 03.

Article in English | MEDLINE | ID: mdl-38565792

Subject(s)

Periodicals as Topic , Humans , Psychometrics/methods , Editorial Policies

3.

Assessment of fit of the time-varying dynamic partial credit model using the posterior predictive model checking method.

Castro-Alvarez, Sebastian; Sinharay, Sandip; Bringmann, Laura F; Meijer, Rob R; Tendeiro, Jorge N.

Br J Math Stat Psychol ; 2024 Feb 21.

Article in English | MEDLINE | ID: mdl-38379504

ABSTRACT

Several new models based on item response theory have recently been suggested to analyse intensive longitudinal data. One of these new models is the time-varying dynamic partial credit model (TV-DPCM; Castro-Alvarez et al., Multivariate Behavioral Research, 2023, 1), which is a combination of the partial credit model and the time-varying autoregressive model. The model allows the study of the psychometric properties of the items and the modelling of nonlinear trends at the latent state level. However, there is a severe lack of tools to assess the fit of the TV-DPCM. In this paper, we propose and develop several test statistics and discrepancy measures based on the posterior predictive model checking (PPMC) method (Rubin, The Annals of Statistics, 1984, 12, 1151) to assess the fit of the TV-DPCM. Simulated and empirical data are used to study the performance and illustrate the effectiveness of the PPMC method.

4.

Using item scores and response times in person-fit assessment.

Gorney, Kylie; Sinharay, Sandip; Liu, Xiang.

Br J Math Stat Psychol ; 77(1): 151-168, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37667833

ABSTRACT

The use of joint models for item scores and response times is becoming increasingly popular in educational and psychological testing. In this paper, we propose two new person-fit statistics for such models in order to detect aberrant behaviour. The first statistic is computed by combining two existing person-fit statistics: one for the item scores, and one for the item response times. The second statistic is computed directly using the likelihood function of the joint model. Using detailed simulations, we show that the empirical null distributions of the new statistics are very close to the theoretical null distributions, and that the new statistics tend to be more powerful than several existing statistics for item scores and/or response times. A real data example is also provided using data from a licensure examination.

Subject(s)

Models, Statistical , Psychological Tests , Humans , Reaction Time , Likelihood Functions

5.

Targeted Double Scoring of Performance Tasks Using a Decision-Theoretic Approach.

Sinharay, Sandip; Johnson, Matthew S; Wang, Wei; Miao, Jing.

Appl Psychol Meas ; 47(2): 155-163, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36875293

ABSTRACT

Targeted double scoring, that is, double scoring of only some (but not all) responses, is used to reduce the burden of scoring performance tasks for several mastery tests (Finkelman, Darby, & Nering, 2008). An approach based on statistical decision theory (e.g., Berger, 1989; Ferguson, 1967; Rudner, 2009) is suggested to evaluate and potentially improve upon the existing strategies in targeted double scoring for mastery tests. An application of the approach to data from an operational mastery test shows that a refinement of the currently used strategy would lead to substantial cost savings.

6.

An Investigation Into the Impact of Test Session Disruptions for At-Home Test Administrations.

Castellano, Katherine E; Sinharay, Sandip; Hao, Jiangang; Li, Chen.

Appl Psychol Meas ; 47(1): 76-82, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36425287

ABSTRACT

In response to the closures of test centers worldwide due to the COVID-19 pandemic, several testing programs offered large-scale standardized assessments to examinees remotely. However, due to the varying quality of the performance of personal devices and internet connections, more at-home examinees likely suffered "disruptions" or an interruption in the connectivity to their testing session compared to typical test-center administrations. Disruptions have the potential to adversely affect examinees and lead to fairness or validity issues. The goal of this study was to investigate the extent to which disruptions impacted performance of at-home examinees using data from a large-scale admissions test. Specifically, the study involved comparing the average test scores of the disrupted examinees with those of the non-disrupted examinees after weighting the non-disrupted examinees to resemble the disrupted examinees along baseline characteristics. The results show that disruptions had a small negative impact on test scores on average. However, there was little difference in performance between the disrupted and non-disrupted examinees after removing records of the disrupted examinees who were unable to complete the test.

7.

The Standardized S-X² Statistic for Assessing Item Fit.

Han, Zhuangzhuang; Sinharay, Sandip; Johnson, Matthew S; Liu, Xiang.

Appl Psychol Meas ; 47(1): 3-18, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36425289

ABSTRACT

The S-X² statistic (Orlando & Thissen, 2000) is popular among researchers and practitioners who are interested in the assessment of item fit. However, the statistic suffers from the Chernoff-Lehmann problem (Chernoff & Lehmann, 1954) and hence does not have a known asymptotic null distribution. This paper suggests a modified version of the S-X² statistic that is based on the modified Rao-Robson χ² statistic (Rao & Robson, 1974). A simulation study and a real data analysis demonstrate that the use of the modified statistic instead of the S-X² statistic would lead to fewer items being flagged for misfit.

8.

Estimating Probabilities of Passing for Examinees With Incomplete Data in Mastery Tests.

Sinharay, Sandip.

Educ Psychol Meas ; 82(3): 580-609, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35444341

ABSTRACT

Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores and hence to incomplete data on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the probabilities of passing of the examinees with incomplete data on mastery tests. However, there is a lack of research on this estimation problem. The goal of this article is to suggest two new approaches, one based on classical test theory and one based on item response theory, for estimating the probabilities of passing of the examinees with incomplete data on mastery tests. The two approaches are demonstrated to have high accuracy and negligible misclassification rates.

9.

The Use of Theory of Linear Mixed-Effects Models to Detect Fraudulent Erasures at an Aggregate Level.

Peng, Luyao; Sinharay, Sandip.

Educ Psychol Meas ; 82(1): 177-200, 2022 Feb.

Article in English | MEDLINE | ID: mdl-34992311

ABSTRACT

Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the literature of linear mixed-effects models (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has larger power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.

10.

The Lack of Robustness of a Statistic Based on the Neyman-Pearson Lemma to Violations of Its Underlying Assumptions.

Sinharay, Sandip.

Appl Psychol Meas ; 46(1): 19-39, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34898745

ABSTRACT

Drasgow, Levine, and Zickar (1996) suggested a statistic based on the Neyman-Pearson lemma (NPL; e.g., Lehmann & Romano, 2005, p. 60) for detecting preknowledge on a known set of items. The statistic is a special case of the optimal appropriateness indices (OAIs) of Levine and Drasgow (1988) and is the most powerful statistic for detecting item preknowledge when the assumptions underlying the statistic hold for the data (e.g., Belov, 2016; Drasgow et al., 1996). This paper demonstrated using real data analysis that one assumption underlying the statistic of Drasgow et al. (1996) is often likely to be violated in practice. This paper also demonstrated, using simulated data, that the statistic is not robust to realistic violations of its underlying assumptions. Together, the results from the real data and the simulations demonstrate that the statistic of Drasgow et al. (1996) may not always be the optimum statistic in practice and occasionally has smaller power than another statistic for detecting preknowledge on a known set of items, especially when the assumptions underlying the former statistic do not hold. The findings of this paper demonstrate the importance of keeping in mind the assumptions underlying and the limitations of any statistic or method.

11.

Detection of Item Preknowledge Using Response Times.

Sinharay, Sandip.

Appl Psychol Meas ; 44(5): 376-392, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32879537

ABSTRACT

Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. This article suggests a new statistic that can be used for detecting the examinees who may have benefited from item preknowledge using their response times. The statistic quantifies the difference in speed between the compromised items and the non-compromised items of the examinees. The distribution of the statistic under the null hypothesis of no preknowledge is proved to be the standard normal distribution. A simulation study is used to evaluate the Type I error rate and power of the suggested statistic. A real data example demonstrates the usefulness of the new statistic that is found to provide information that is not provided by statistics based only on item scores.

12.

The use of item scores and response times to detect examinees who may have benefited from item preknowledge.

Sinharay, Sandip; Johnson, Matthew S.

Br J Math Stat Psychol ; 73(3): 397-419, 2020 11.

Article in English | MEDLINE | ID: mdl-31418458

ABSTRACT

According to Wollack and Schoenig (2018, The Sage encyclopedia of educational research, measurement, and evaluation. Thousand Oaks, CA: Sage, 260), benefiting from item preknowledge is one of the three broad types of test fraud that occur in educational assessments. We use tools from constrained statistical inference to suggest a new statistic that is based on item scores and response times and can be used to detect examinees who may have benefited from item preknowledge for the case when the set of compromised items is known. The asymptotic distribution of the new statistic under no preknowledge is proved to be a simple mixture of two χ² distributions. We perform a detailed simulation study to show that the Type I error rate of the new statistic is very close to the nominal level and that the power of the new statistic is satisfactory in comparison to that of the existing statistics for detecting item preknowledge based on both item scores and response times. We also include a real data example to demonstrate the usefulness of the suggested statistic.

Subject(s)

Educational Measurement/statistics & numerical data , Models, Statistical , Algorithms , Computer Simulation , Fraud/statistics & numerical data , Humans , Likelihood Functions , Reaction Time

13.

Higher-Order Asymptotics and Its Application to Testing the Equality of the Examinee Ability Over Two Sets of Items.

Sinharay, Sandip; Jensen, Jens Ledet.

Psychometrika ; 84(2): 484-510, 2019 06.

Article in English | MEDLINE | ID: mdl-29951971

ABSTRACT

In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3-26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238-254, 2010; Glas & Dagohoy, Psychometrika 72:159-180, 2007; Guo & Drasgow, Int J Sel Assess 18:351-364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193-206, 1990; Sinharay, J Educ Behav Stat 42:46-68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307-322, 1986) and the Lugannani-Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475-490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.

Subject(s)

Deception , Educational Measurement , Models, Statistical , Humans , Psychometrics

14.

Extension of caution indices to mixed-format tests.

Sinharay, Sandip.

Br J Math Stat Psychol ; 71(2): 363-386, 2018 05.

Article in English | MEDLINE | ID: mdl-29315495

ABSTRACT

Tatsuoka suggested several extended caution indices and their standardized versions, and these have been used as person-fit statistics by various researchers. However, these indices are only defined for tests with dichotomous items. This paper extends two of the popular standardized extended caution indices for use with polytomous items and mixed-format tests. Two additional new person-fit statistics are obtained by applying the asymptotic standardization of person-fit statistics for mixed-format tests. Detailed simulations are then performed to compute the Type I error rate and power of the four new person-fit statistics. Two real data illustrations follow. The new person-fit statistics appear to be satisfactory tools for assessing person fit for polytomous items and mixed-format tests.

Subject(s)

Psychometrics/methods , Computer Simulation , Data Interpretation, Statistical , Humans , Models, Statistical , Probability , Psychometrics/statistics & numerical data , Reproducibility of Results , Statistics as Topic

15.

Bayes Factor Covariance Testing in Item Response Models.

Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip.

Psychometrika ; 82(4): 979-1006, 2017 12.

Article in English | MEDLINE | ID: mdl-28852944

ABSTRACT

Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.

Subject(s)

Bayes Theorem , Models, Statistical , Multivariate Analysis , Algorithms , Computer Simulation , Educational Measurement/methods , Humans

16.

On the Equivalence of a Likelihood Ratio of Drasgow, Levine, and Zickar (1996) and the Statistic Based on the Neyman-Pearson Lemma of Belov (2016).

Sinharay, Sandip.

Appl Psychol Meas ; 41(2): 145-149, 2017 Mar.

Article in English | MEDLINE | ID: mdl-29881083

ABSTRACT

Levine and Drasgow (1988) suggested an approach based on the Neyman-Pearson lemma to detect examinees whose response patterns are "aberrant" due to cheating, language issues, and so on. Belov (2016) used the approach of Levine and Drasgow (1988) to suggest a statistic based on the Neyman-Pearson Lemma (SBNPL) to detect item preknowledge when the investigator knows which items are compromised. This brief report proves that the SBNPL of Belov (2016) is equivalent to a statistic suggested for the same purpose by Drasgow, Levine, and Zickar 20 years ago.

17.

Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

Sinharay, Sandip.

Appl Psychol Meas ; 41(6): 403-421, 2017 Sep.

Article in English | MEDLINE | ID: mdl-29881099

ABSTRACT

Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.

18.

Three New Methods for Analysis of Answer Changes.

Sinharay, Sandip; Johnson, Matthew S.

Educ Psychol Meas ; 77(1): 54-81, 2017 Jan.

Article in English | MEDLINE | ID: mdl-29795903

ABSTRACT

In a pioneering research article, Wollack and colleagues suggested the "erasure detection index" (EDI) to detect test tampering. The EDI can be used with or without a continuity correction and is assumed to follow the standard normal distribution under the null hypothesis of no test tampering. When used without a continuity correction, the EDI often has inflated Type I error rates. When used with a continuity correction, the EDI has satisfactory Type I error rates, but smaller power compared with the EDI without a continuity correction. This article suggests three methods for detecting test tampering that do not rely on the assumption of a standard normal distribution under the null hypothesis. It is demonstrated in a detailed simulation study that the performance of each suggested method is slightly better than that of the EDI. The EDI and the suggested methods were applied to a real data set. The suggested methods, although more computation intensive than the EDI, seem to be promising in detecting test tampering.

19.

Some Remarks on Applications of Tests for Detecting A Change Point to Psychometric Problems.

Sinharay, Sandip.

Psychometrika ; 82(4): 1149-1161, 2017 12.

Article in English | MEDLINE | ID: mdl-27770307

ABSTRACT

Tests for a change point (e.g., Chen and Gupta, Parametric statistical change point analysis (2nd ed.). Birkhäuser, Boston, 2012; Hawkins et al., J Qual Technol 35:355-366, 2003) have recently been brought into the spotlight for their potential uses in psychometrics. They have been successfully applied to detect an unusual change in the mean score of a sequence of administrations of an international language assessment (Lee and von Davier, Psychometrika 78:557-575, 2013) and to detect speededness of examinees (Shao et al., Psychometrika, 2015). The differences in the type of data used, the test statistics, and the manner in which the critical values were obtained in these papers lead to questions such as "what type of psychometric problems can be solved by tests for a change point?" and "what test statistics should be used with tests for a change point in psychometric problems?" This note attempts to answer some of these questions by providing a general overview of tests for a change point with a focus on application to psychometric problems. A discussion is provided on the choice of an appropriate test statistic and on the computation of a corresponding critical value for tests for a change point. Then, three real data examples are provided to demonstrate how tests for a change point can be used to make important inferences in psychometric problems. The examples include some clarifications and remarks on the critical values used in Lee and von Davier (Psychometrika, 78:557-575, 2013) and Shao et al. (Psychometrika, 2015). The overview and the examples provide insight on tests for a change point above and beyond Lee and von Davier (Psychometrika, 78:557-575, 2013) and Shao et al. (Psychometrika, 2015). Thus, this note extends the research of Lee and von Davier (Psychometrika, 78:557-575, 2013) and Shao et al. (Psychometrika, 2015) on tests for a change point.

Subject(s)

Data Interpretation, Statistical , Psychometrics/methods , Educational Measurement , Humans

20.

The choice of the ability estimate with asymptotically correct standardized person-fit statistics.

Sinharay, Sandip.

Br J Math Stat Psychol ; 69(2): 175-93, 2016 May.

Article in English | MEDLINE | ID: mdl-27062601

ABSTRACT

Snijders (2001, Psychometrika, 66, 331) suggested a statistical adjustment to obtain the asymptotically correct standardized versions of a specific class of person-fit statistics. His adjustment has been used to obtain the asymptotically correct standardized versions of several person-fit statistics including the lz statistic (Drasgow et al., 1985, Br. J. Math. Stat. Psychol., 38, 67), the infit and outfit statistics (e.g., Wright & Masters, 1982, Rating scale analysis, Chicago, IL: Mesa Press), and the standardized extended caution indices (Tatsuoka, 1984, Psychometrika, 49, 95). Snijders (2001), van Krimpen-Stoop and Meijer (1999, Appl. Psychol. Meas., 23, 327), Magis et al. (2012, J. Educ. Behav. Stat., 37, 57), Magis et al. (2014, J. Appl. Meas., 15, 82), and Sinharay (2015b, Psychometrika, doi:10.1007/s11336-015-9465-x, 2016b, Corrections of standardized extended caution indices, Unpublished manuscript) have used the maximum likelihood estimate, the weighted likelihood estimate, and the posterior mode of the examinee ability with the adjustment of Snijders (2001). This paper broadens the applicability of the adjustment of Snijders (2001) by showing how other ability estimates such as the expected a posteriori estimate, the biweight estimate (Mislevy & Bock, 1982, Educ. Psychol. Meas., 42, 725), and the Huber estimate (Schuster & Yuan, 2011, J. Educ. Behav. Stat., 36, 720) can be used with the adjustment. A simulation study is performed to examine the Type I error rate and power of two asymptotically correct standardized person-fit statistics with several ability estimates. A real data illustration follows.
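
As an illustration of the kind of statistic discussed in this abstract, the following is a minimal sketch of the uncorrected lz statistic for dichotomous items, computed from known item success probabilities. It is not code from the paper: the function names, the two-parameter logistic helper, and the example item difficulties are all assumptions made for illustration. The sketch treats the ability as known; when an ability estimate is plugged in instead, the null distribution is no longer standard normal, which is the problem the adjustment of Snijders (2001) addresses.

```python
import math


def two_pl_prob(theta, a, b):
    """Success probability under a 2PL model (hypothetical helper for the demo)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic for 0/1 item scores.

    responses: list of 0/1 item scores.
    probs: model success probabilities, each strictly between 0 and 1.
    """
    if len(responses) != len(probs):
        raise ValueError("responses and probs must have the same length")
    log_lik = 0.0   # observed log-likelihood of the response pattern
    expected = 0.0  # its expectation under the model
    variance = 0.0  # its variance under the model
    for u, p in zip(responses, probs):
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (log_lik - expected) / math.sqrt(variance)


# Illustrative values only: five items, ability fixed at 0, unit discrimination.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [two_pl_prob(0.0, 1.0, b) for b in difficulties]

consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
aberrant = [0, 0, 1, 1, 1]    # easy items wrong, hard items right

print(lz_statistic(consistent, probs))
print(lz_statistic(aberrant, probs))
```

Large negative values indicate a response pattern that is unlikely under the model; in this sketch the aberrant pattern yields a negative value and the model-consistent pattern a positive one.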

Subject(s)

Biostatistics/methods , Data Interpretation, Statistical , Educational Measurement/methods , Models, Statistical , Computer Simulation , Humans , Reference Values , Reproducibility of Results , Sensitivity and Specificity
