Search | VHL Regional Portal

1.

Scoring Graphical Responses in TIMSS 2019 Using Artificial Neural Networks.

von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale.

Educ Psychol Meas ; 83(3): 556-585, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37187689

ABSTRACT

Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We are comparing classification accuracy of convolutional and feed-forward approaches. Our results show that convolutional neural networks (CNNs) outperform feed-forward neural networks in both loss and accuracy. The CNN models classified up to 97.53% of the image responses into the appropriate scoring category, which is comparable to, if not more accurate, than typical human raters. These findings were further strengthened by the observation that the most accurate CNN models correctly classified some image responses that had been incorrectly scored by the human raters. As an additional innovation, we outline a method to select human-rated responses for the training sample based on an application of the expected response function derived from item response theory. This paper argues that CNN-based automated scoring of image responses is a highly accurate procedure that could potentially replace the workload and cost of second human raters for international large-scale assessments (ILSAs), while improving the validity and comparability of scoring complex constructed-response items.

2.

Erratum to: A Response-Time-Based Latent Response Mixture Model for Identifying and Modeling Careless and Insufficient Effort Responding in Survey Data.

Ulitzsch, Esther; Pohl, Steffi; Khorramdel, Lale; Kroehne, Ulf; von Davier, Matthias.

Psychometrika ; 87(2): 798, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35171401

3.

A Response-Time-Based Latent Response Mixture Model for Identifying and Modeling Careless and Insufficient Effort Responding in Survey Data.

Ulitzsch, Esther; Pohl, Steffi; Khorramdel, Lale; Kroehne, Ulf; von Davier, Matthias.

Psychometrika ; 87(2): 593-619, 2022 06.

Article in English | MEDLINE | ID: mdl-34855118

ABSTRACT

Careless and insufficient effort responding (C/IER) can pose a major threat to data quality and, as such, to validity of inferences drawn from questionnaire data. A rich body of methods aiming at its detection has been developed. Most of these methods can detect only specific types of C/IER patterns. However, typically different types of C/IER patterns occur within one data set and need to be accounted for. We present a model-based approach for detecting manifold manifestations of C/IER at once. This is achieved by leveraging response time (RT) information available from computer-administered questionnaires and integrating theoretical considerations on C/IER with recent psychometric modeling approaches. The approach a) takes the specifics of attentive response behavior on questionnaires into account by incorporating the distance-difficulty hypothesis, b) allows for attentiveness to vary on the screen-by-respondent level, c) allows for respondents with different trait and speed levels to differ in their attentiveness, and d) at once deals with various response patterns arising from C/IER. The approach makes use of item-level RTs. An adapted version for aggregated RTs is presented that supports screening for C/IER behavior on the respondent level. Parameter recovery is investigated in a simulation study. The approach is illustrated in an empirical example, comparing different RT measures and contrasting the proposed model-based procedure against indicator-based multiple-hurdle approaches.

Subject(s)

Psychometrics , Computer Simulation , Psychometrics/methods , Reaction Time , Self Report , Surveys and Questionnaires

4.

Advances in modelling response styles and related phenomena.

Khorramdel, Lale; Jeon, Minjeong; Leigh Wang, Lihshing.

Br J Math Stat Psychol ; 72(3): 393-400, 2019 11.

Article in English | MEDLINE | ID: mdl-31721155

Subject(s)

Bias , Research , Models, Statistical , Psychology, Social , Reproducibility of Results , Research/statistics & numerical data

5.

Combining mixture distribution and multidimensional IRTree models for the measurement of extreme response styles.

Khorramdel, Lale; von Davier, Matthias; Pokropek, Artur.

Br J Math Stat Psychol ; 72(3): 538-559, 2019 11.

Article in English | MEDLINE | ID: mdl-31385610

ABSTRACT

Personality constructs, attitudes and other non-cognitive variables are often measured using rating or Likert-type scales, which does not come without problems. Especially in low-stakes assessments, respondents may produce biased responses due to response styles (RS) that reduce the validity and comparability of the measurement. Detecting and correcting RS is not always straightforward because not all respondents show RS and the ones who do may not do so to the same extent or in the same direction. The present study proposes the combination of a multidimensional IRTree model with a mixture distribution item response theory model and illustrates the application of the approach using data from the Programme for the International Assessment of Adult Competencies (PIAAC). This joint approach allows for the differentiation between different latent classes of respondents who show different RS behaviours and respondents who show RS versus respondents who give (largely) unbiased responses. We illustrate the application of the approach by examining extreme RS and show how the resulting latent classes can be further examined using external variables and process data from computer-based assessments to develop a better understanding of response behaviour and RS.

Subject(s)

Bias , Self Report , Attitude , Humans , Models, Statistical , Personality Assessment , Psychometrics

6.

The Effects of Vignette Scoring on Reliability and Validity of Self-Reports.

von Davier, Matthias; Shin, Hyo-Jeong; Khorramdel, Lale; Stankov, Lazar.

Appl Psychol Meas ; 42(4): 291-306, 2018 Jun.

Article in English | MEDLINE | ID: mdl-29881126

ABSTRACT

The research presented in this article combines mathematical derivations and empirical results to investigate effects of the nonparametric anchoring vignette approach proposed by King, Murray, Salomon, and Tandon on the reliability and validity of rating data. The anchoring vignette approach aims to correct rating data for response styles to improve comparability across individuals and groups. Vignettes are used to adjust self-assessment responses on the respondent level but entail significant assumptions: They are supposed to be invariant across respondents, and the responses to vignette prompts are supposed to be without error and strictly ordered. This article shows that these assumptions are not always met and that the anchoring vignette approach leads to higher Cronbach's alpha values and increased correlations among adjusted variables regardless of whether the assumptions of the approach are met or violated. Results suggest that the underlying assumptions and effects of the anchoring vignette approach should be carefully examined as the increased correlations and reliability estimates can be observed even for response variables that are independent random draws and uncorrelated with any other variable.

7.

The influence of item order on intentional response distortion in the assessment of high potentials: assessing pilot applicants.

Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander.

Int J Psychol ; 49(2): 131-9, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24811884

ABSTRACT

An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items including items from the NEO five factor inventory (NEO FFI), NEO personality inventory revised (NEO PI-R) and business-focused inventory of personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant.

Subject(s)

Aviation , Deception , Personality , Personnel Selection/methods , Adult , Female , Germany , Humans , Male , Personality Inventory , Surveys and Questionnaires , Volunteers/psychology , Workforce

8.

Measuring Response Styles Across the Big Five: A Multiscale Extension of an Approach Using Multinomial Processing Trees.

Khorramdel, Lale; von Davier, Matthias.

Multivariate Behav Res ; 49(2): 161-77, 2014.

Article in English | MEDLINE | ID: mdl-26741175

ABSTRACT

This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test if RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appear to be unidimensional after exclusion of only 1 item, a unidimensional measure for the midpoint RS is obtained only after exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences could be found in giving extreme responses. Moreover, it is shown how to score rating data to correct for RS after being proved to exist in the data.

ABSTRACT

ABSTRACT

Subject(s)

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL