Search | VHL Regional Portal

1.

Social and emotional competency development from fourth to 12th grade: Relations to parental education and gender.

Rimm-Kaufman, Sara E; Soland, James; Kuhfeld, Megan.

Am Psychol ; 2024 May 23.

Article in English | MEDLINE | ID: mdl-38780577

ABSTRACT

Educators have become increasingly committed to social and emotional learning in schools. However, we know too little about the typical growth trajectories of the competencies that schools are striving to improve. We leverage data from the California Office to Reform Education, a consortium of districts in California serving over 1.5 million students, that administers annual surveys to students to measure social and emotional competencies (SECs). This article uses data from six cohorts of approximately 16,000 students each (51% male, 73% Latinx, 11% White, 10% Black, 24% with parents who did not complete high school) in Grades 4-12. Two questions are addressed. First, how much growth occurs in growth mindset, self-efficacy, self-management, and social awareness from Grades 4 to 12? Second, do initial status and growth look different by parental educational attainment and gender? Using accelerated longitudinal design growth models, findings show distinct growth trends among the four SECs with growth mindset increasing, self-management mostly decreasing, and self-efficacy and social awareness decreasing and then increasing. The subgroup analyses show gaps between groups but patterns of growth that are more similar than different. Further, subgroup membership accounts for very little variation in growth or declines. Instead, initial levels of competencies predict growth. Also, variation within groups is greater than variation between groups. The findings have practical implications for educators and psychologists striving to improve SECs. If schools use student-report approaches, predicting steady and consistent positive growth in SECs is unrealistic. Instead, U-shaped patterns for some SECs appear to be normative with notable declines in the sixth grade, requiring new supports. (PsycInfo Database Record (c) 2024 APA, all rights reserved).

2.

How Scoring Approaches Impact Estimates of Growth in the Presence of Survey Item Ceiling Effects.

Edwards, Kelly D; Soland, James.

Appl Psychol Meas ; 48(3): 147-164, 2024 May.

Article in English | MEDLINE | ID: mdl-38585305

ABSTRACT

Survey scores are often the basis for understanding how individuals grow psychologically and socio-emotionally. A known problem with many surveys is that the items are all "easy"-that is, individuals tend to use only the top one or two response categories on the Likert scale. Such an issue could be especially problematic, and lead to ceiling effects, when the same survey is administered repeatedly over time. In this study, we conduct simulation and empirical studies to (a) quantify the impact of these ceiling effects on growth estimates when using typical scoring approaches like sum scores and unidimensional item response theory (IRT) models and (b) examine whether approaches to survey design and scoring, including employing various longitudinal multidimensional IRT (MIRT) models, can mitigate any bias in growth estimates. We show that bias is substantial when using typical scoring approaches and that, while lengthening the survey helps somewhat, using a longitudinal MIRT model with plausible values scoring all but alleviates the issue. Results have implications for scoring surveys in growth studies going forward, as well as understanding how Likert item ceiling effects may be contributing to replication failures.

3.

Scoring assessments in multisite randomized control trials: Examining the sensitivity of treatment effect estimates to measurement choices.

Kuhfeld, Megan; Soland, James.

Psychol Methods ; 2023 Dec 21.

Article in English | MEDLINE | ID: mdl-38127570

ABSTRACT

While a great deal of thought, planning, and money goes into the design of multisite randomized control trials (RCTs) that are used to evaluate the effectiveness of interventions in fields like education and psychology, relatively little thought is often paid to the measurement choices made in such evaluations. In this study, we conduct a series of simulation studies that consider a wide range of options for producing scores from multiple administration of assessments in the context of multisite RCTs. The scoring models considered range from the simple (sum scores) to highly complex (multilevel two-tier item response theory [IRT] models with latent regression). We find that the true treatment effect is attenuated when sum scores or scores from IRT models that do not account for treatment assignment are used. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

4.

Regression discontinuity designs in a latent variable framework.

Soland, James; Johnson, Angela; Talbert, Eli.

Psychol Methods ; 28(3): 691-704, 2023 Jun.

Article in English | MEDLINE | ID: mdl-35588080

ABSTRACT

When randomized control trials are not available, regression discontinuity (RD) designs are a viable quasi-experimental method shown to be capable of producing causal estimates of how a program or intervention affects an outcome. While the RD design and many related methodological innovations came from the field of psychology, RDs are underutilized among psychologists even though many interventions are assigned on the basis of scores from common psychological measures, a situation tailor-made for RDs. In this tutorial, we present a straightforward way to implement an RD model as a structural equation model (SEM). By using SEM, we both situate RDs within a method commonly used in psychology, as well as show how RDs can be implemented in a way that allows one to account for measurement error and avoid measurement model misspecification, both of which often affect psychological measures. We begin with brief Monte Carlo simulation studies to examine the potential benefits of using a latent variable RD model, then transition to an applied example, replete with code and results. The aim of the study is to introduce RD to a broader audience in psychology, as well as show researchers already familiar with RD how employing an SEM framework can be beneficial. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

Subject(s)

Models, Statistical , Research Design , Humans , Causality , Monte Carlo Method

5.

Investigating differences in how parents and teachers rate students' self-control.

Kuhfeld, Megan; Soland, James; Lewis, Karyn.

Psychol Assess ; 35(1): 23-31, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36355691

ABSTRACT

Supporting students' social-emotional learning (SEL) is gaining emphasis in education. In particular, self-control is a construct that has been shown to predict academic outcomes, though much debate on this point exists. Although largely unexamined, inconsistent findings could stem from the fact that related surveys are often scored by multiple raters (e.g., teachers and parents), especially when administered at a young age when students cannot respond to items themselves. Yet little is known about (a) how much parent and teacher self-control ratings overlap and (b) what student characteristics like race and socioeconomic status are associated with inconsistencies. In this study, we use data from a widely used measure of early self-control with parent and teacher forms. We use these data to examine the impact of rater discrepancies on our understanding of students' self-control. Results show relatively low agreement between parents and teachers, with some evidence that discrepancies are associated with student race. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

Subject(s)

Self-Control , Students , Humans , Students/psychology , Surveys and Questionnaires , Parents/psychology , School Teachers/psychology , Schools

6.

How survey scoring decisions can influence your study's results: A trip through the IRT looking glass.

Soland, James; Kuhfeld, Megan; Edwards, Kelly.

Psychol Methods ; 2022 Jul 14.

Article in English | MEDLINE | ID: mdl-35834195

ABSTRACT

Though much effort is often put into designing psychological studies, the measurement model and scoring approach employed are often an afterthought, especially when short survey scales are used (Flake & Fried, 2020). One possible reason that measurement gets downplayed is that there is generally little understanding of how calibration/scoring approaches could impact common estimands of interest, including treatment effect estimates, beyond random noise due to measurement error. Another possible reason is that the process of scoring is complicated, involving selecting a suitable measurement model, calibrating its parameters, then deciding how to generate a score, all steps that occur before the score is even used to examine the desired psychological phenomenon. In this study, we provide three motivating examples where surveys are used to understand individuals' underlying social emotional and/or personality constructs to demonstrate the potential consequences of measurement/scoring decisions. These examples also mean we can walk through the different measurement decision stages and, hopefully, begin to demystify them. As we show in our analyses, the decisions researchers make about how to calibrate and score the survey used has consequences that are often overlooked, with likely implications both for conclusions drawn from individual psychological studies and replications of studies. (PsycInfo Database Record (c) 2022 APA, all rights reserved).

7.

Evidence That Selecting an Appropriate Item Response Theory-Based Approach to Scoring Surveys Can Help Avoid Biased Treatment Effect Estimates.

Soland, James.

Educ Psychol Meas ; 82(2): 376-403, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35185164

ABSTRACT

Considerable thought is often put into designing randomized control trials (RCTs). From power analyses and complex sampling designs implemented preintervention to nuanced quasi-experimental models used to estimate treatment effects postintervention, RCT design can be quite complicated. Yet when psychological constructs measured using survey scales are the outcome of interest, measurement is often an afterthought, even in RCTs. The purpose of this study is to examine how choices about scoring and calibration of survey item responses affect recovery of true treatment effects. Specifically, simulation and empirical studies are used to compare the performance of sum scores, which are frequently used in RCTs in psychology and education, to that of approaches rooted in item response theory (IRT) that better account for the longitudinal, multigroup nature of the data. The results from this study indicate that selecting an IRT model that matches the nature of the data can significantly reduce bias in treatment effect estimates and reduce standard errors.

8.

Empirical benchmarks for changes in social and emotional skills over time.

Soland, James; Rimm-Kaufman, Sara E; Kuhfeld, Megan; Ventura-Abbas, Nadia.

Child Dev ; 93(4): 1129-1144, 2022 07.

Article in English | MEDLINE | ID: mdl-35195286

ABSTRACT

This study provides empirical benchmarks that quantify typical changes in students' reports of social and emotional skills in a large, diverse sample. Data come from six cohorts of students (N = 361,815; 6% Asian, 8% Black, 68% White, 75% Latinx, 50% Female) who responded to the CORE survey from 2015 to 2018 and help quantify typical gains/declines in growth mindset, self-efficacy, self-management, and social awareness. Results show fluctuations in skills between 4th and 12th grade (changes ranging from -.33 to .23 standard deviations). Growth mindset increases in fourth grade, declines in fifth to seventh grade, then mostly increases. Self-efficacy, self-management, and social awareness decline in sixth to eighth grade. Self-management and social awareness, but not self-efficacy, show increases in 10th to 12th grade.

Subject(s)

Benchmarking , Students , Emotions , Female , Humans , Male , Self Efficacy , Social Skills , Students/psychology , Surveys and Questionnaires

9.

Avoiding bias from sum scores in growth estimates: An examination of IRT-based approaches to scoring longitudinal survey responses.

Kuhfeld, Megan; Soland, James.

Psychol Methods ; 27(2): 234-260, 2022 Apr.

Article in English | MEDLINE | ID: mdl-33090818

ABSTRACT

A huge portion of what we know about how humans develop, learn, behave, and interact is based on survey data. Researchers use longitudinal growth modeling to understand the development of students on psychological and social-emotional learning constructs across elementary and middle school. In these designs, students are typically administered a consistent set of self-report survey items across multiple school years, and growth is measured either based on sum scores or scale scores produced based on item response theory (IRT) methods. Although there is great deal of guidance on scaling and linking IRT-based large-scale educational assessment to facilitate the estimation of examinee growth, little of this expertise is brought to bear in the scaling of psychological and social-emotional constructs. Through a series of simulation and empirical studies, we produce scores in a single-cohort repeated measure design using sum scores as well as multiple IRT approaches and compare the recovery of growth estimates from longitudinal growth models using each set of scores. Results indicate that using scores from multidimensional IRT approaches that account for latent variable covariances over time in growth models leads to better recovery of growth parameters relative to models using sum scores and other IRT approaches. (PsycInfo Database Record (c) 2022 APA, all rights reserved).

Subject(s)

Research Design , Bias , Computer Simulation , Humans , Longitudinal Studies , Surveys and Questionnaires

10.

A Multi-Rater Latent Growth Curve Model.

Soland, James; Kuhfeld, Megan.

Multivariate Behav Res ; 57(5): 701-717, 2022.

Article in English | MEDLINE | ID: mdl-33982606

ABSTRACT

To avoid the subjectivity of having a single person evaluate a construct of interest (e.g., a student's self-efficacy in school), multiple raters are often used. Increasingly, data that use multiple raters to evaluate psychological and social-emotional constructs over time are available. While a range of models to address measurement issues that arise when using multiple raters have been presented, including a small number for longitudinal data, few if any models are available to estimate growth in the presence of multiple raters. In this study, we provide a model that removes all but the shared perceptions of raters at a given timepoint (i.e., removes unique rater variance), then adds on a latent growth curve model across timepoints. Through simulation and empirical studies, we examine the performance of the model in terms of recovering true growth parameters, and relative to more crude approaches like estimating growth based on a single rater. Our results indicate that the model we propose performs quite well along these dimensions, and shows promise for use by researchers who want to estimate growth based on longitudinal multi-rater data.

Subject(s)

Computer Simulation , Humans

11.

Identifying students who are off-track academically at the start of secondary school: The role of social-emotional learning trajectories.

Soland, James; Kuhfeld, Megan.

Br J Educ Psychol ; 92(2): e12463, 2022 Jun.

Article in English | MEDLINE | ID: mdl-34713891

ABSTRACT

BACKGROUND: Research shows that successfully transitioning from intermediate school to secondary school is pivotal for students to remain on track to graduate. Studies also indicate that a successful transition is a function not only of how prepared the students are academically but also whether they have the social-emotional learning (SEL) skills to succeed in a more independent secondary school environment. AIM: Yet, little is known about whether students' SEL skills are stable over time, and if they are not, whether a student's initial level of SEL skills at the start of intermediate school or change in SEL skills over time is a better indicator of whether the student will be off track academically in 9th grade. This study begins to investigate this issue. SAMPLE: We use four years of longitudinal SEL data from students in a large urban district with a sample size of ~3,000 students per timepoint. METHODS: We use several years of longitudinal SEL data to fit growth models for three constructs shown to be related to successfully transitioning to secondary school. In so doing, we examine whether a student's mean SEL score in 6th grade (status) or growth between 6th and 8th grade is more predictive of being off track academically in 9th grade. RESULT: Results indicate that, while status is more frequently significant, growth for self-management is also predictive above and beyond status on that construct. CONCLUSION: Findings suggest that understanding how a student develops social-emotionally can improve identification of students not on track to succeed in high school.

Subject(s)

Schools , Social Learning , Emotions , Humans , Social Skills , Students/psychology

12.

Examining the Performance of the Trifactor Model for Multiple Raters.

Soland, James; Kuhfeld, Megan.

Appl Psychol Meas ; 46(1): 53-67, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34898747

ABSTRACT

Researchers in the social sciences often obtain ratings of a construct of interest provided by multiple raters. While using multiple raters provides a way to help avoid the subjectivity of any given person's responses, rater disagreement can be a problem. A variety of models exist to address rater disagreement in both structural equation modeling and item response theory frameworks. Recently, a model was developed by Bauer et al. (2013) and referred to as the "trifactor model" to provide applied researchers with a straightforward way of estimating scores that are purged of variance that is idiosyncratic by rater. Although the intent of the model is to be usable and interpretable, little is known about the circumstances under which it performs well, and those it does not. We conduct simulation studies to examine the performance of the trifactor model under a range of sample sizes and model specifications and then compare model fit, bias, and convergence rates.

13.

Is Measurement Noninvariance a Threat to Inferences Drawn from Randomized Control Trials? Evidence From Empirical and Simulation Studies.

Soland, James.

Appl Psychol Meas ; 45(5): 346-360, 2021 Jul.

Article in English | MEDLINE | ID: mdl-34565940

ABSTRACT

Randomized control trials (RCTs) are considered the gold standard when evaluating the impact of psychological interventions, educational programs, and other treatments on outcomes of interest. However, few studies consider whether forms of measurement bias like noninvariance might impact estimated treatment effects from RCTs. Such bias may be more likely to occur when survey scales are utilized in studies and evaluations in ways not supported by validation evidence, which occurs in practice. This study consists of simulation and empirical studies examining whether measurement noninvariance impacts treatment effects from RCTs. Simulation study results demonstrate that bias in treatment effect estimates is mild when the noninvariance occurs between subgroups (e.g., male and female participants), but can be quite substantial when being assigned to control or treatment induces the noninvariance. Results from the empirical study show that surveys used in two federally funded evaluations of educational programs were noninvariant across student age groups.

14.

Investigating the Impact of Noneffortful Responses on Individual-Level Scores: Can the Effort-Moderated IRT Model Serve as a Solution?

Rios, Joseph A; Soland, James.

Appl Psychol Meas ; 45(6): 391-406, 2021 Sep.

Article in English | MEDLINE | ID: mdl-34565943

ABSTRACT

Suboptimal effort is a major threat to valid score-based inferences. While the effects of such behavior have been frequently examined in the context of mean group comparisons, minimal research has considered its effects on individual score use (e.g., identifying students for remediation). Focusing on the latter context, this study addressed two related questions via simulation and applied analyses. First, we investigated how much including noneffortful responses in scoring using a three-parameter logistic (3PL) model affects person parameter recovery and classification accuracy for noneffortful responders. Second, we explored whether improvements in these individual-level inferences were observed when employing the Effort Moderated IRT (EM-IRT) model under conditions in which its assumptions were met and violated. Results demonstrated that including 10% noneffortful responses in scoring led to average bias in ability estimates and misclassification rates by as much as 0.15 SDs and 7%, respectively. These results were mitigated when employing the EM-IRT model, particularly when model assumptions were met. However, once model assumptions were violated, the EM-IRT model's performance deteriorated, though still outperforming the 3PL model. Thus, findings from this study show that (a) including noneffortful responses when using individual scores can lead to potential unfounded inferences and potential score misuse, and (b) the negative impact that noneffortful responding has on person ability estimates and classification accuracy can be mitigated by employing the EM-IRT model, particularly when its assumptions are met.

15.

Parameter Estimation Accuracy of the Effort-Moderated Item Response Theory Model Under Multiple Assumption Violations.

Rios, Joseph A; Soland, James.

Educ Psychol Meas ; 81(3): 569-594, 2021 Jun.

Article in English | MEDLINE | ID: mdl-33994564

ABSTRACT

As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.g., two-parameter logistic [2PL]) in parameter estimation under simulated conditions, prior research has failed to examine its performance under violations to the model's assumptions. Therefore, the objective of this simulation study was to examine item and mean ability parameter recovery when violating the assumptions that noneffortful responding occurs randomly (Assumption 1) and is unrelated to the underlying ability of examinees (Assumption 2). Results demonstrated that, across conditions, the EM-IRT model provided robust item parameter estimates to violations of Assumption 1. However, bias values greater than 0.20 SDs were observed for the EM-IRT model when violating Assumption 2; nonetheless, these values were still lower than the 2PL model. In terms of mean ability estimates, model results indicated equal performance between the EM-IRT and 2PL models across conditions. Across both models, mean ability estimates were found to be biased by more than 0.25 SDs when violating Assumption 2. However, our accompanying empirical study suggested that this biasing occurred under extreme conditions that may not be present in some operational settings. Overall, these results suggest that the EM-IRT model provides superior item and equal mean ability parameter estimates in the presence of model violations under realistic conditions when compared with the 2PL model.

16.

Do Response Styles Affect Estimates of Growth on Social-Emotional Constructs? Evidence from Four Years of Longitudinal Survey Scores.

Soland, James; Kuhfeld, Megan.

Multivariate Behav Res ; 56(6): 853-873, 2021.

Article in English | MEDLINE | ID: mdl-32633574

ABSTRACT

Survey respondents employ different response styles when they use the categories of the Likert scale differently despite having the same true score on the construct of interest. For example, respondents may be more likely to use the extremes of the response scale independent of their true score. Research already shows that differing response styles can create a construct-irrelevant source of bias that distorts fundamental inferences made based on survey data. While some initial studies examine the effect of response styles on survey scores in longitudinal analyses, the issue of how response styles affect estimates of growth is underexamined. In this study, we conducted empirical and simulation analyses in which we scored surveys using item response theory (IRT) models that do and do not account for response styles, and then used those different scores in growth models and compared results. Generally, we found that response styles can affect estimates of growth parameters including the slope, but that the effects vary by psychological construct, response style, and IRT model used.

Subject(s)

Surveys and Questionnaires , Bias , Computer Simulation , Longitudinal Studies

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL