Results 1 - 20 of 23
1.
Educ Psychol Meas ; 82(4): 719-746, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35754616

ABSTRACT

This article proposes a new standard-setting method, referred to as the response vector for mastery (RVM) method. Under the RVM method, the task of the panelists who participate in the standard-setting process does not involve conceptualizing a borderline examinee or making probability judgments, as is the case with the Angoff and bookmark methods. Also, the RVM-based computation of a cut score is not based on a single item (e.g., one marked in an ordered item booklet) but, instead, on a response vector (1/0 scores) on items whose parameters are calibrated under item response theory or the recently developed D-scoring method. Illustrations with hypothetical and real-data standard-setting scenarios are provided, and methodological aspects of the RVM method are discussed.
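
As a rough illustration of the core idea (estimating the ability implied by a panelist-specified response vector and using it as a cut score), here is a minimal sketch assuming a 2PL IRT model; the item parameters and mastery vector are hypothetical, and the published RVM procedure may differ in detail:

```python
# Sketch: locate the ability (theta) most consistent with a panelist-specified
# "mastery" response vector under a 2PL model, then use it as a cut score.
# Item parameters and the response vector are hypothetical.
import numpy as np
from scipy.optimize import minimize_scalar

a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # discriminations (assumed)
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])  # difficulties (assumed)
x = np.array([1, 1, 1, 1, 0])              # panelists' mastery response vector

def neg_log_lik(theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # 2PL response probabilities
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded")
print(f"cut score on the theta scale: {res.x:.3f}")
```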

2.
Educ Psychol Meas ; 82(1): 107-121, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34992308

ABSTRACT

This study offers an approach to testing for differential item functioning (DIF) in a recently developed measurement framework, referred to as the D-scoring method (DSM). Under the proposed approach, called the P-Z method of testing for DIF, the item response functions of two groups (reference and focal) are compared by transforming their probabilities of correct item response, estimated under the DSM, into Z-scale normal deviates. Using the linear relationship between such Z-deviates, testing for DIF reduces to testing two basic statistical hypotheses about the equality of variances and means of the Z-deviates for the reference and focal groups. The results from a simulation study support the efficiency (low Type I error and high power) of the proposed P-Z method. Furthermore, it is shown that the P-Z method is directly applicable to testing for differential test functioning. Recommendations for practical use and future research, including possible applications of the P-Z method in an IRT context, are also provided.
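
A minimal sketch of the P-Z idea under simplifying assumptions: DSM-estimated probabilities of correct response for the two groups (hypothetical values below) are mapped to Z-scale normal deviates, and equality of variances and means is then checked with standard tests; the published method's exact test statistics may differ.

```python
# Sketch of the P-Z idea: transform estimated probabilities of correct response
# into Z-scale normal deviates, then test equality of variances and means.
# Probabilities below are hypothetical placeholders for DSM-based estimates.
import numpy as np
from scipy import stats

p_ref = np.array([0.35, 0.48, 0.61, 0.72, 0.83])   # reference group (assumed)
p_foc = np.array([0.30, 0.42, 0.55, 0.69, 0.80])   # focal group (assumed)

z_ref = stats.norm.ppf(p_ref)   # normal deviates, reference group
z_foc = stats.norm.ppf(p_foc)   # normal deviates, focal group

# Equal-variance check (Levene's test as a robust stand-in for an F test).
print(stats.levene(z_ref, z_foc))
# Equal-means check, treating deviates at common score points as paired.
print(stats.ttest_rel(z_ref, z_foc))
```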

3.
Educ Psychol Meas ; 81(2): 388-404, 2021 Apr.
Article in English | MEDLINE | ID: mdl-37929260

ABSTRACT

This study presents a latent (item response theory-like) framework for a recently developed classical approach to test scoring, equating, and item analysis, referred to as the D-scoring method. Specifically, (a) person and item parameters are estimated under an item response function model on the D-scale (from 0 to 1) using marginal maximum likelihood estimation, and (b) analytic expressions are provided for the item information function, the test information function, and the standard error of estimation for D-scores obtained under the proposed latent treatment of the D-scoring method. The results from a simulation study reveal very good recovery of item and person parameters via the marginal maximum likelihood estimation method. Discussion and recommendations for practice are provided.
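
For orientation, the same ingredients are standard on the conventional logit scale: under a 2PL model, item information is a²P(1-P), test information is the sum over items, and the standard error of estimation is its inverse square root. A minimal sketch with made-up parameters (the article's own expressions are on the D-scale and are not reproduced here):

```python
# Generic illustration: item information, test information, and SE under a 2PL.
# (The article derives analogous quantities on the D-scale; this is the
# familiar logit-scale version, with hypothetical item parameters.)
import numpy as np

a = np.array([1.1, 0.7, 1.4, 0.9])   # discriminations (assumed)
b = np.array([-0.8, 0.0, 0.5, 1.2])  # difficulties (assumed)

def test_info(theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return np.sum(a**2 * p * (1 - p))   # sum of item informations

theta = 0.3
info = test_info(theta)
print(f"test information: {info:.3f}, SE of estimation: {1/np.sqrt(info):.3f}")
```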

4.
Educ Psychol Meas ; 80(2): 389-398, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32158027

ABSTRACT

A procedure for evaluating validity-related coefficients and their differences is discussed, which is applicable when one or more frequently made assumptions in empirical educational, behavioral, and social research are violated. The method is developed within the framework of the latent variable modeling methodology and accomplishes point and interval estimation of convergent and discriminant correlations, as well as of differences between them, in cases of incomplete data sets with data not missing at random, nonnormality, and clustering effects. The procedure uses the full information maximum likelihood approach to model fitting and parameter estimation, does not assume the availability of multiple indicators for the underlying latent constructs, includes auxiliary variables, and accounts for within-group correlations on the main response variables that result from nesting effects involving the studied respondents. The outlined procedure is illustrated on empirical data from a study using tertiary education entrance examination measures.
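
As a deliberately simplified stand-in for the full latent variable procedure (complete data, no clustering, observed scores instead of latent constructs), a bootstrap interval for the difference between a convergent and a discriminant correlation can be sketched as follows; all variables are simulated:

```python
# Simplified sketch: bootstrap CI for the difference between a convergent and
# a discriminant validity correlation. Assumes complete data and no clustering,
# unlike the full latent-variable procedure described in the article.
import numpy as np

rng = np.random.default_rng(1)
n = 300
trait = rng.normal(size=n)
test_score = trait + rng.normal(scale=0.6, size=n)   # hypothetical measure
same_trait = trait + rng.normal(scale=0.7, size=n)   # convergent measure
other_trait = rng.normal(size=n)                     # discriminant measure

def corr_diff(idx):
    x, y, z = test_score[idx], same_trait[idx], other_trait[idx]
    return np.corrcoef(x, y)[0, 1] - np.corrcoef(x, z)[0, 1]

boot = np.array([corr_diff(rng.integers(0, n, n)) for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"difference: {corr_diff(np.arange(n)):.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```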

5.
Educ Psychol Meas ; 80(1): 126-144, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31933495

ABSTRACT

This study presents new models for item response functions (IRFs) in the framework of the D-scoring method (DSM), which is gaining attention in the field of educational and psychological measurement and large-scale assessments. In previous work on the DSM, the IRFs of binary items were estimated using a logistic regression model (LRM). However, the LRM underestimates the item true scores at the top end of the D-scale (ranging from 0 to 1), especially for relatively difficult items. This entails underestimation of true D-scores, inaccuracy in the estimates of their standard errors, and other psychometric issues. The inverse-regression adjustments used to fix this problem are too complicated for routine applications of the DSM and are not in line with its simplicity. This issue is resolved with the IRF models proposed in this study, referred to as rational function models (RFMs) with one parameter (RFM1), two parameters (RFM2), and three parameters (RFM3). The proposed RFMs are discussed and illustrated with simulated and real data.
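
The logistic regression baseline referenced in the abstract is easy to sketch: regress correctness on one item against examinees' D-scores (simulated data; the proposed RFM functional forms are not reproduced here):

```python
# Sketch of the earlier LRM baseline: fit a logistic IRF for one binary item
# as a function of examinees' D-scores (simulated data, not the proposed RFMs).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
d_scores = rng.uniform(0, 1, 500)                  # D-scale ability values
p_true = 1 / (1 + np.exp(-(8 * d_scores - 5)))     # a steep, hard item
y = rng.binomial(1, p_true)

X = sm.add_constant(d_scores)
irf = sm.Logit(y, X).fit(disp=0)
print(irf.params)   # intercept and slope of the fitted logistic IRF
```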

6.
Educ Psychol Meas ; 79(6): 1198-1209, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31619845

ABSTRACT

This note highlights and illustrates the links between item response theory and classical test theory in the context of polytomous items. An item response modeling procedure is discussed that can be used for point and interval estimation of the individual true score on any item in a measuring instrument or item set following the popular and widely applicable graded response model. The method contributes to the body of research on the relationships between classical test theory and item response theory and is illustrated on empirical data.
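
The key quantity is the GRM-implied true score on an item, i.e., the expected category score at a person's ability level. A minimal sketch with hypothetical item parameters:

```python
# Sketch: individual true score on one GRM item = expected category score
# at a given ability. Discrimination and thresholds are hypothetical.
import numpy as np

a = 1.3                          # discrimination (assumed)
b = np.array([-1.0, 0.0, 1.2])   # category thresholds (assumed)

def grm_true_score(theta):
    # P(X >= k | theta) for k = 1..m-1, padded with boundary values 1 and 0
    p_star = np.concatenate(([1.0], 1 / (1 + np.exp(-a * (theta - b))), [0.0]))
    cat_probs = p_star[:-1] - p_star[1:]   # P(X = k) for k = 0..m-1
    return np.sum(np.arange(len(cat_probs)) * cat_probs)

print(f"true score at theta = 0.5: {grm_true_score(0.5):.3f}")
```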

7.
Educ Psychol Meas ; 79(5): 988-1008, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31488922

ABSTRACT

The D-scoring method for scoring and equating tests with binary items, proposed by Dimitrov, offers some of the advantages of item response theory, such as item-level difficulty information and score computation that reflects the item difficulties, while retaining the merits of classical test theory, such as the simplicity of number-correct score computation and relaxed sample-size requirements. Because of this unique combination of merits, the D-scoring method has seen rapid adoption in the educational and psychological measurement field. Because item-level difficulty information is available under the D-scoring method and item difficulties are reflected in test scores, it makes conceptual sense to use the D-scoring method with adaptive test designs such as multistage testing (MST). In this study, we developed and compared several versions of the MST mechanism using the D-scoring approach and also proposed and implemented a new framework for conducting MST simulation under the D-scoring method. Our findings suggest that score recovery under MST with D-scoring is promising, as it retained score comparability across different MST paths. We found that MST using the D-scoring method can achieve improvements in measurement precision and efficiency over linear tests that use the D-scoring method.
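
A toy sketch of the routing step in a two-stage MST design driven by a difficulty-weighted provisional score; the module composition, weighting, and cutoff are invented for illustration and are not the study's design:

```python
# Toy two-stage MST sketch: score a routing module with difficulty weights,
# then branch to an easier or harder second-stage module. All values invented.
import numpy as np

routing_difficulty = np.array([0.3, 0.5, 0.6, 0.7])   # expected difficulties
responses = np.array([1, 1, 0, 1])                    # examinee's 1/0 vector

# Provisional difficulty-weighted score (one plausible D-type weighting).
provisional = np.sum(responses * routing_difficulty) / routing_difficulty.sum()

next_module = "hard" if provisional >= 0.5 else "easy"
print(f"provisional score: {provisional:.3f} -> route to {next_module} module")
```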

8.
Educ Psychol Meas ; 79(3): 545-557, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31105322

ABSTRACT

An approach to scoring tests with binary items, referred to as the D-scoring method, was previously developed as a classical analog to basic item response theory (IRT) models for binary items. As some tests include polytomous items, this study offers an approach to D-scoring of such items and parallels the results with those obtained under the graded response model (GRM) for ordered polytomous items in the IRT framework. The proposed design of using D-scoring with "virtual" binary items generated from polytomous items provides (a) ability scores that are consistent with their GRM counterparts and (b) item category response functions analogous to those obtained under the GRM. This approach provides a unified framework for D-scoring and psychometric analysis of tests with binary and/or polytomous items that can be efficient in different scenarios of educational and psychological assessment.
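
One natural reading of the "virtual" binary construction is to expand a polytomous item scored 0 to m-1 into m-1 cumulative indicators 1{X >= k}; a minimal sketch (the paper's exact construction may differ):

```python
# Sketch: expand polytomous responses (0..m-1) into "virtual" binary items
# of the form 1{X >= k}, one per category boundary.
import numpy as np

poly = np.array([0, 2, 3, 1, 2])   # five examinees, one 4-category item
m = 4
virtual = np.stack([(poly >= k).astype(int) for k in range(1, m)], axis=1)
print(virtual)   # each column is one virtual binary item
```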

9.
Educ Psychol Meas ; 79(2): 272-287, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30911193

ABSTRACT

Plausible values can be used to either estimate population-level statistics or compute point estimates of latent variables. While it is well known that five plausible values are usually sufficient for accurate estimation of population-level statistics in large-scale surveys, the minimum number of plausible values needed to obtain accurate latent variable point estimates is unclear. This is especially relevant when an item response theory (IRT) model is estimated with MCMC (Markov chain Monte Carlo) methods in Mplus and point estimates of the IRT ability parameter are of interest, as Mplus only estimates the posterior distribution of each ability parameter. In order to obtain point estimates of the ability parameter, a number of plausible values can be drawn from the posterior distribution of each individual ability parameter, and their mean (the posterior mean ability estimate) can be used as an individual ability point estimate. In this note, we conducted a simulation study to investigate how many plausible values were needed to obtain accurate posterior mean ability estimates. The results indicate that 20 is the minimum number of plausible values required to obtain point estimates of the IRT ability parameter that are comparable to marginal maximum likelihood estimation (MMLE)/expected a posteriori (EAP) estimates. A real dataset was used to demonstrate the comparison between MMLE/EAP point estimates and posterior mean ability estimates based on different numbers of plausible values.
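
The mechanics are straightforward to sketch: average the first K draws from each person's posterior and compare the result with the full posterior mean (posterior draws are simulated here as a stand-in for MCMC output):

```python
# Sketch: posterior-mean ability estimates from K plausible values per person,
# compared against the full posterior mean (simulated posterior draws).
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_draws = 200, 1000
true_theta = rng.normal(size=n_persons)
# Stand-in for MCMC output: draws around each person's EAP-like mean.
posterior = true_theta[:, None] + rng.normal(scale=0.4,
                                             size=(n_persons, n_draws))

for k in (5, 20, 100):
    pv_mean = posterior[:, :k].mean(axis=1)   # mean of k plausible values
    rmse = np.sqrt(np.mean((pv_mean - posterior.mean(axis=1)) ** 2))
    print(f"k={k:3d}: RMSE vs full posterior mean = {rmse:.4f}")
```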

10.
Educ Psychol Meas ; 79(4): 796-807, 2019 Aug.
Article in English | MEDLINE | ID: mdl-32655184

ABSTRACT

Building on prior research on the relationships between key concepts in item response theory and classical test theory, this note contributes to highlighting their important and useful links. A readily and widely applicable latent variable modeling procedure is discussed that can be used for point and interval estimation of the individual person true score on any item in a unidimensional multicomponent measuring instrument or item set under consideration. The method adds to the body of research on the connections between classical test theory and item response theory. The outlined estimation approach is illustrated on empirical data.

11.
Educ Psychol Meas ; 78(1): 167-174, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29795951

ABSTRACT

This article extends the procedure outlined in the article by Raykov, Marcoulides, and Tong for testing congruence of latent constructs to the setting of binary items and clustering effects. In this widely used setting in contemporary educational and psychological research, the method can be used to examine if two or more homogeneous multicomponent instruments with distinct components measure the same construct. The approach is useful in scale construction and development research as well as in construct validation investigations. The discussed method is illustrated with data from a scholastic aptitude assessment study.

12.
Educ Psychol Meas ; 78(2): 343-352, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29795959

ABSTRACT

A latent variable modeling method for studying measurement invariance when evaluating latent constructs with multiple binary or binary scored items with no guessing is outlined. The approach extends the continuous indicator procedure described by Raykov and colleagues, utilizes similarly the false discovery rate approach to multiple testing, and permits one to locate violations of measurement invariance in loading or threshold parameters. The discussed method does not require selection of a reference observed variable and is directly applicable for studying differential item functioning with one- or two-parameter item response models. The extended procedure is illustrated on an empirical data set.
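
The false discovery rate step is standard; a minimal sketch applying the Benjamini-Hochberg adjustment to hypothetical p-values from per-parameter invariance tests:

```python
# Sketch: Benjamini-Hochberg FDR adjustment of per-parameter invariance tests
# (hypothetical p-values for loading/threshold comparisons).
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.012, 0.049, 0.210, 0.730]
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p, pa, r in zip(p_values, p_adj, reject):
    print(f"p = {p:.3f} -> adjusted {pa:.3f}, flag noninvariance: {r}")
```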

13.
Educ Psychol Meas ; 78(5): 805-825, 2018 Oct.
Article in English | MEDLINE | ID: mdl-32655171

ABSTRACT

This article presents some new developments in the methodology of an approach to scoring and equating tests with binary items, referred to as delta scoring (D-scoring), which is being piloted with large-scale assessments at the National Center for Assessment in Saudi Arabia. This presentation builds on previous work on delta scoring and adds procedures for scaling and equating, an item response function, and estimation of true values and standard errors of D scores. Also, unlike previous work on this topic, where D-scoring involves estimates of item and person parameters in the framework of item response theory, the approach presented here does not require item response theory calibration.

14.
ACS Biomater Sci Eng ; 4(12): 4412-4424, 2018 Dec 10.
Article in English | MEDLINE | ID: mdl-33418834

ABSTRACT

Bacteria colonizing the surface of orthopedic implants are responsible for most postoperative periprosthetic joint infections. This study describes a possible alternative route for drug delivery that utilizes the bulk of the implant itself as a reservoir. Drug release is enabled by manufacturing integrated permeable structures of high porosity using selective laser melting technology. The concept was evaluated along two paths: with 400 µm permeable thin walls and with dense reservoirs containing an integrated 950 µm permeable wall. Components were designed and preprocessed as separate parts, allowing different settings of laser power and scanning speed to be allocated. Lowering the energy input into the selective laser melting process to induce intermittent melting of the Ti6Al4V ELI powder produced porous components through which vancomycin was released with differing profiles. Static water contact angle measurements demonstrated a significant effect of permeable wall thickness on hydrophilicity. Relative porosities of the 400 µm structures were determined with microcomputed tomography analyses. A transition zone of 21.17% porosity was identified where release profiles change from porosity-dependent to near free diffusion. The antimicrobial activity of the released vancomycin was confirmed against Staphylococcus aureus Xen 36 in two separate agar diffusion assays. The approach is promising for incorporation into the design and manufacture of next-generation prosthetic implants with controlled release of antibiotics in situ and the consequent prevention of periprosthetic joint infections.

15.
Educ Psychol Meas ; 76(6): 954-975, 2016 Dec.
Article in English | MEDLINE | ID: mdl-29795895

ABSTRACT

This article describes an approach to test scoring, referred to as delta scoring (D-scoring), for tests with dichotomously scored items. D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the examinee's response vector, weighted by the expected difficulties (not "easiness") of the test items. The expected difficulty of each item is obtained as an analytic function of its IRT parameters. The D-scores are independent of the sample of test-takers, as they are based on expected item difficulties. It is shown that the D-scale performs considerably better than the IRT logit scale by criteria of scale intervalness. To equate D-scales, it is sufficient to rescale the item parameters, thus avoiding the tedious and error-prone procedure of mapping test characteristic curves under IRT true-score equating, which is often used in the practice of large-scale testing. The proposed D-scaling has proved promising in its current piloting with large-scale assessments, and the hope is that it can efficiently complement IRT procedures in the practice of large-scale testing in education and psychology.
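
A minimal sketch of the scoring rule as described, with the response vector weighted by expected item difficulties; the difficulty values and the normalization below are illustrative assumptions rather than the published formula:

```python
# Sketch of difficulty-weighted scoring: the response vector is weighted by
# expected item difficulties derived from IRT parameters. The normalization
# here is one plausible choice, not necessarily the published formula.
import numpy as np

expected_difficulty = np.array([0.25, 0.40, 0.55, 0.70, 0.85])  # assumed
responses = np.array([1, 1, 1, 0, 0])

d_score = np.sum(responses * expected_difficulty) / expected_difficulty.sum()
print(f"D-score: {d_score:.3f}")   # lies on the 0-1 D-scale
```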

16.
Educ Psychol Meas ; 75(3): 475-490, 2015 Jun.
Article in English | MEDLINE | ID: mdl-29795829

ABSTRACT

This article is concerned with developing a measure of general academic ability (GAA) for high school graduates who apply to college, and with identifying optimal weights of the GAA indicators in a linear combination that yields a composite score with maximal reliability and maximal predictive validity, using the framework of the popular latent variable modeling methodology. The approach is illustrated with data for 6,640 students majoring in Science and 3,388 students majoring in Art at colleges in Saudi Arabia. The indicators (observed measures) of the targeted GAA construct were selected from assessments that include the students' high school grade and their scores on two standardized tests developed by the National Center for Assessment in Higher Education in Saudi Arabia: the General Aptitude Test (GAT) and the Standardized Achievement Admission Test (SAAT). A unidimensional measure of GAA was developed initially, with different sets of indicators for Science and Art colleges. Appropriate indicators for Science colleges were the high school grade, the total GAT score, and four SAAT subscales (Biology, Chemistry, Physics, and Math); for Art colleges, they were the high school grade and scores on GAT-Verbal, GAT-Quantitative, and SAAT. Although the case study comes from Saudi Arabia, the methods and procedures discussed in this article have broader utility and can be used in different contexts of educational and psychological assessment.
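
A standard latent variable modeling result is useful here: under a congeneric one-factor model, the maximally reliable composite weights each indicator in proportion to its loading divided by its unique variance. A sketch with hypothetical (standardized) GAA indicator loadings, not the study's estimates:

```python
# Sketch: maximal-reliability composite weights for a one-factor model,
# w_i proportional to loading_i / unique_variance_i. Loadings are hypothetical.
import numpy as np

loadings = np.array([0.75, 0.60, 0.80, 0.55])   # e.g., HS grade, GAT, SAAT...
uniques = 1 - loadings**2                       # standardized indicators

w = loadings / uniques
w /= w.sum()                                    # normalize to sum to 1
print(np.round(w, 3))
```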

17.
Dev Psychol ; 43(1): 173-85, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17201517

ABSTRACT

The purpose of this study was to examine the effects of program interventions in a school-based teen pregnancy program on hypothesized constructs underlying teens' attitudes toward sexuality. An important task related to this purpose was the validation of the constructs and their stability from pre- to postintervention measures. Data from 1,136 middle grade students were obtained from an earlier evaluation of an abstinence-based teen pregnancy prevention program (S. Weed, I. Ericksen, G. Grant, & A. Lewis, 2002). Latent trait structural equation modeling was used to evaluate the impact of the intervention program on changes in constructs of teens' attitudes toward sexuality. Gender was also taken into consideration. This investigation provides credible evidence that both 1st- and 2nd-order constructs related to measures of teens' attitudes toward risky sexual behavior are sufficiently stable and sensitive to detect program effects.


Subject(s)
Health Knowledge, Attitudes, Practice , Pregnancy in Adolescence/prevention & control , Sex Education , Sexual Behavior , Adolescent , Curriculum , Female , Humans , Peer Group , Pregnancy , Program Evaluation , Safe Sex , Self Efficacy , Sex Factors , Sexual Abstinence
18.
Work ; 26(4): 429-36, 2006.
Article in English | MEDLINE | ID: mdl-16788262

ABSTRACT

Structural equation modeling (SEM) provides a dependable framework for testing differences among groups on latent variables (constructs, factors). The purpose of this article is to illustrate SEM-based testing for group mean differences on latent variables. Related procedures of confirmatory factor analysis and testing for measurement invariance across compared groups are also presented in the context of rehabilitation research.


Subject(s)
Analysis of Variance , Models, Statistical , Population Groups , Factor Analysis, Statistical , Humans , Multiple Sclerosis/psychology , Rehabilitation, Vocational , United States
19.
J Appl Meas ; 7(2): 170-83, 2006.
Article in English | MEDLINE | ID: mdl-16632900

ABSTRACT

Two frequently used parametric statistics of person-fit with the dichotomous Rasch model (RM) are adjusted and compared to each other and to their original counterparts in terms of power to detect aberrant response patterns in short tests (10, 20, and 30 items). Specifically, the cube root transformation of the mean square for the unweighted person-fit statistic, t, and the standardized likelihood-based person-fit statistic Z3 were adjusted by estimating the probability for correct item response through the use of symmetric functions in the dichotomous Rasch model. The results for simulated unidimensional Rasch data indicate that t and Z3 are consistently, yet not greatly, outperformed by their adjusted counterparts, denoted t* and Z3*, respectively. The four parametric statistics, t, Z3, t*, and Z3*, were also compared to a non-parametric statistic, HT, identified in recent research as outperforming numerous parametric and non-parametric person-fit statistics. The results show that HT substantially outperforms t, Z3, t*, and Z3* in detecting aberrant response patterns for 20-item and 30-item tests, but not for very short tests of 10 items. The detection power of t, Z3, t*, and Z3*, and HT at two specific levels of Type I error, .10 and .05 (i.e., up to 10% and 5% false alarm rate, respectively), is also reported.
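
For reference, the standardized likelihood-based statistic (the abstract's Z3, often written lz) standardizes the response-pattern log-likelihood by its model-implied mean and variance. A minimal sketch under Rasch probabilities with hypothetical parameters:

```python
# Sketch: standardized likelihood-based person-fit statistic (lz / Z3) for a
# dichotomous response pattern, given model-implied probabilities of success.
import numpy as np

theta = 0.2
b = np.array([-1.5, -0.5, 0.0, 0.6, 1.4])   # Rasch difficulties (assumed)
x = np.array([1, 1, 0, 1, 0])               # observed responses

p = 1 / (1 + np.exp(-(theta - b)))          # Rasch success probabilities
l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
e_l0 = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
v_l0 = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)

lz = (l0 - e_l0) / np.sqrt(v_l0)
print(f"lz = {lz:.3f}")   # large negative values suggest aberrant patterns
```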


Subject(s)
Psychological Tests/statistics & numerical data , Humans , United States
20.
J Appl Meas ; 4(3): 222-33, 2003.
Article in English | MEDLINE | ID: mdl-12904673

ABSTRACT

This article provides formulas for expected true-score measures and reliability of binary items as a function of their Rasch difficulty when the trait (ability) distribution is normal or logistic. The proposed formulas have theoretical value and can be useful in test development, score analysis, and simulation studies. Once the items are calibrated with the dichotomous Rasch model, one can estimate (without further data collection) the expected values for true-score measures (e.g., domain score, true score variance, and error variance for the number-right score) and reliability for both norm-referenced and criterion-referenced interpretations. Thus, given a bank of Rasch calibrated items, one can develop a test with desirable values of population true-score measures and reliability or compare such measures for subsets of items that are grouped by substantive characteristics (e.g., content areas or strands of learning outcomes). An illustrative example for using the proposed formulas is also provided.
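
The flavor of these formulas can be reproduced numerically: the expected proportion-correct for an item with Rasch difficulty b under a standard normal ability distribution is the integral of the item characteristic curve against the normal density. A minimal sketch:

```python
# Sketch: expected proportion-correct for a Rasch item under a standard normal
# ability distribution, by numerical integration of the ICC times the density.
import numpy as np
from scipy import integrate, stats

def expected_score(b):
    icc = lambda t: stats.norm.pdf(t) / (1 + np.exp(-(t - b)))
    val, _ = integrate.quad(icc, -8, 8)
    return val

for b in (-1.0, 0.0, 1.0):
    print(f"difficulty {b:+.1f}: expected score {expected_score(b):.3f}")
```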


Subject(s)
Models, Statistical , Psychometrics/methods , Humans , Probability , Psychometrics/statistics & numerical data , Reproducibility of Results , United States