Results 1 - 20 of 37
1.
Behav Res Methods ; 56(3): 2273-2291, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37311866

ABSTRACT

Careless responding, where participants do not fully engage with item content, is pervasive in survey research. Left undetected, carelessness can compromise the interpretation and use of survey results, including information about participant locations on the construct, item difficulty, and the psychometric quality of the instrument. We present and illustrate a sequential procedure for evaluating response quality in survey research using indicators from Mokken scale analysis (MSA). We use a real data illustration and a simulation study to compare a sequential procedure to a standalone procedure. We also consider how identifying and removing responses with evidence of poor measurement properties affects item quality indicators. Results suggest that the sequential procedure was effective in identifying potentially problematic response patterns that may not always be captured by traditional methods for identifying careless responders but was not always sensitive to specific carelessness patterns. We discuss implications for research and practice.


Subject(s)
Research Design , Humans , Surveys and Questionnaires , Psychometrics/methods , Computer Simulation
2.
Appl Psychol Meas ; 47(5-6): 365-385, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37810542

ABSTRACT

Methods to identify carelessness in survey research can be valuable tools in reducing bias during survey development, validation, and use. Because carelessness may take multiple forms, researchers typically use multiple indices when identifying carelessness. In the current study, we extend the literature on careless response identification by examining the usefulness of three item-response theory-based person-fit indices for both random and overconsistent careless response identification: infit MSE, outfit MSE, and the polytomous lz statistic. We compared these statistics with traditional careless response indices using both empirical data and simulated data. The empirical data included 2,049 high school student surveys of teaching effectiveness from the Network for Educator Effectiveness. In the simulated data, we manipulated type of carelessness (random response or overconsistency) and percent of carelessness present (0%, 5%, 10%, 20%). Results suggest that infit and outfit MSE and the lz statistic may provide complementary information to traditional indices such as LongString, Mahalanobis Distance, Validity Items, and Completion Time. Receiver operating characteristic curves suggested that the person-fit indices showed good sensitivity and specificity for classifying both over-consistent and under-consistent careless patterns, thus functioning in a bidirectional manner. Carelessness classifications based on low fit values correlated with carelessness classifications from LongString and completion time, and classifications based on high fit values correlated with classifications from Mahalanobis Distance. We consider implications for research and practice.

3.
Appl Psychol Meas ; 47(5-6): 351-364, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37810544

ABSTRACT

Sparse rating designs, where each examinee's performance is scored by a small proportion of raters, are prevalent in practical performance assessments. However, relatively little research has focused on the degree to which different analytic techniques alert researchers to rater effects in such designs. We used a simulation study to compare the information provided by two popular approaches: Generalizability theory (G theory) and Many-Facet Rasch (MFR) measurement. In previous comparisons, researchers used complete data that were not simulated, thus limiting their ability to manipulate characteristics such as rater effects, and to understand the impact of incomplete data on the results. Both approaches provided information about rating quality in sparse designs, but the MFR approach highlighted rater effects related to centrality and bias more readily than G theory.

4.
Prev Med ; 175: 107708, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37726039

ABSTRACT

Research examining potential differences in physical activity (PA) between sexual minority women (SMW) and heterosexual women has yielded inconsistent results. OBJECTIVE: Therefore, the purpose of this systematic review and meta-analysis is to examine potential differences in PA between SMW and heterosexual women and to identify potential moderators that may partially explain observed differences in PA. METHODS: All studies were peer reviewed, published in English, and included a continuous measure of PA for SMW and heterosexual women. A standardized mean difference effect size (ES) was used to compare groups, with random effects models used to estimate a mean ES and 95% CI using a 3-level meta-analysis model to adjust for the correlation between effects nested within studies. RESULTS: The cumulative results of 24 effects gathered from 7 studies indicated there was no difference in PA between SMW (n = 1619) and heterosexual women (n = 103,295) (ES = -0.038, 95% CI -0.179 to 0.102, p = 0.576). Despite no mean differences, moderate-high heterogeneity was observed, indicating that the results were not consistent across effects (I² = 64.8%, Q(23) = 36.7, p = 0.035). The difference in PA was associated with age (β = -0.018, 95% CI -0.034 to -0.003, p = 0.022) and BMI (β = -0.145, 95% CI -0.228 to -0.061, p = 0.002), with a quadratic relationship observed for both variables. CONCLUSIONS: Although the results of the current analysis did not indicate significant differences in PA behaviors between SMW and heterosexual women, age and BMI modify the association and are curvilinear in nature, such that smaller differences in PA were observed between SMW and heterosexual women when samples were middle-aged and overweight.

5.
Educ Psychol Meas ; 83(5): 953-983, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37663538

ABSTRACT

Rating scale analysis techniques provide researchers with practical tools for examining the degree to which ordinal rating scales (e.g., Likert-type scales or performance assessment rating scales) function in psychometrically useful ways. When rating scales function as expected, researchers can interpret ratings in the intended direction (i.e., lower ratings mean "less" of a construct than higher ratings), distinguish between categories in the scale (i.e., each category reflects a unique level of the construct), and compare ratings across elements of the measurement instrument, such as individual items. Although researchers have used these techniques in a variety of contexts, studies are limited that systematically explore their sensitivity to problematic rating scale characteristics (i.e., "rating scale malfunctioning"). I used a real data analysis and a simulation study to systematically explore the sensitivity of rating scale analysis techniques based on two popular polytomous item response theory (IRT) models: the partial credit model (PCM) and the generalized partial credit model (GPCM). Overall, results indicated that both models provide valuable information about rating scale threshold ordering and precision that can help researchers understand how their rating scales are functioning and identify areas for further investigation or revision. However, there were some differences between models in their sensitivity to rating scale malfunctioning in certain conditions. Implications for research and practice are discussed.

6.
J Intell ; 11(8), 2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37623535

ABSTRACT

Well-designed spatial assessments can incorporate multiple sources of complexity that reflect important aspects of spatial reasoning. When these aspects are systematically included in spatial reasoning items, researchers can use psychometric models to examine the impact of each aspect on item difficulty. These methods can then help the researchers to understand the nature and development of spatial reasoning and can also inform the development of new items to better reflect the construct. This study investigated sources of item difficulty for object assembly (OA), a format for the assessment of spatial reasoning, by specifying nine item characteristics that were predicted to contribute to item difficulty. We used data from two focal samples including high-ability students in grades 3 to 7 and undergraduate students who responded to 15 newly developed OA items. Results from the linear logistic test model (LLTM) indicated that eight of the nine identified item characteristics significantly contributed to item difficulty. This suggests that an LLTM approach is useful in examining the contributions of various aspects of spatial reasoning to item difficulty and informing item development for spatial reasoning assessments.

7.
J Acad Nutr Diet ; 123(12): 1713-1728, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37429414

ABSTRACT

BACKGROUND: Challenging eating behaviors or feeding difficulties, commonly displayed in children with Down syndrome (DS), may amplify perceived stress in caregivers. If caregivers lack resources on how to accommodate the needs of the child with DS, they may find feeding the child stressful and resort to negative coping strategies. OBJECTIVE: The aim of this study was to understand the feeding stressors, resources, and coping strategies used by caregivers of children with DS. DESIGN: A qualitative analysis of interview transcripts was undertaken, framed around the Transactional Model of Stress and Coping. PARTICIPANTS/SETTING: Between September and November 2021, 15 caregivers of children (aged 2 through 6 years) with DS were recruited from 5 states located in the Southeast, Southwest, and West regions of the United States. ANALYSIS: Interviews were audio-recorded, transcribed verbatim, and analyzed using deductive thematic analysis and content analysis approaches. RESULTS: Thirteen caregivers reported increased stress around feeding the child with DS. Stressors identified included concern about adequacy of intake and challenges associated with feeding difficulties. Stress related to feeding was higher among caregivers whose child was learning a new feeding skill or in a transitional phase of feeding. Caregivers used both professional and interpersonal resources in addition to problem- and emotion-based coping strategies. CONCLUSIONS: Caregivers identified feeding as a stressful event with higher stress reported during transitional phases of feeding. Caregivers reported that speech, occupational, and physical therapists were beneficial resources to provide support for optimizing nutrition and skill development. These findings suggest that caregiver access to therapists and registered dietitian nutritionists is warranted.


Subject(s)
Caregivers , Down Syndrome , Humans , Child , Adaptation, Psychological , Emotions
8.
LGBT Health ; 10(6): 471-479, 2023.
Article in English | MEDLINE | ID: mdl-37418567

ABSTRACT

Purpose: Medical mistrust is a barrier to health care utilization and is associated with suboptimal health outcomes. Research on mistrust among sexual minority men (SMM) is limited and largely focuses on Black SMM and HIV, with few studies assessing mistrust among SMM of other races/ethnicities. The purpose of this study was to examine differences in medical mistrust among SMM by race. Methods: From February 2018 to February 2019, a mixed-methods study examined the health-related beliefs and experiences of young SMM in New York City. The Group-Based Medical Mistrust Scale (GBMMS) was used to measure medical mistrust related to race, and a modified version of the scale assessed mistrust related to one's "sexual/gender minority" status (Group-Based Medical Mistrust Scale-Sexual/Gender Minority [GBMMS-SGM]). With an analytic sample of 183 cisgender SMM, a one-way multivariate analysis of variance was used to examine differences in GBMMS and GBMMS-SGM scores by race/ethnicity [Black, Latinx, White, "Another Racial Group(s)"]. Results: There were significantly different GBMMS scores by race, with participants of color reporting higher levels of race-based medical mistrust than White participants. This finding is supported by effect sizes ranging from moderate to large. Differences in GBMMS-SGM scores by race were borderline; however, the effect size for Black and White participants' GBMMS-SGM scores was moderate, indicating that higher GBMMS-SGM scores among Black participants are meaningful. Conclusion: Multilevel strategies should be used to earn the trust of minoritized populations, such as addressing both historical and ongoing discrimination, moving beyond implicit bias trainings, and strengthening the recruitment and retention of minoritized health care professionals.


Subject(s)
Sexual and Gender Minorities , Trust , Male , Humans , Ethnicity , Racial Groups , Sexual Behavior
9.
Int J Exerc Sci ; 16(2): 118-128, 2023.
Article in English | MEDLINE | ID: mdl-37114195

ABSTRACT

The purpose of this study was to investigate lower limb blood flow responses under varying blood flow restriction (BFR) pressures based on individualized limb occlusion pressures (LOP) using a commonly used occlusion device. Twenty-nine participants (65.5% female, 23.8 ± 4.7 years) volunteered for this study. An 11.5 cm tourniquet was placed around participants' right proximal thigh, followed by an automated LOP measurement (207.1 ± 29.4 mmHg). Doppler ultrasound was used to assess posterior tibial artery blood flow at rest, followed by 10% increments of LOP (10-90% LOP) in a randomized order. All data were collected during a single 90-minute laboratory visit. Friedman's and one-way repeated-measures ANOVAs were used to examine potential differences in vessel diameter, volumetric blood flow (VolFlow), and reduction in VolFlow relative to rest (%Rel) between relative pressures. No differences in vessel diameter were observed between rest and all relative pressures (all p < .05). Significant reductions from rest in VolFlow and %Rel were first observed at 50% LOP and 40% LOP, respectively. VolFlow at 80% LOP, a commonly used occlusion pressure in the legs, was not significantly different from 60% (p = .88), 70% (p = .20), or 90% (p = 1.00) LOP. Findings indicate a minimal threshold pressure of 50% LOP may be required to elicit a significant decrease in arterial blood flow at rest when utilizing the 11.5 cm Delfi PTSII tourniquet system.

10.
Appl Psychol Meas ; 47(2): 91-105, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36875294

ABSTRACT

In standalone performance assessments, researchers have explored the influence of different rating designs on the sensitivity of latent trait model indicators to different rater effects as well as the impacts of different rating designs on student achievement estimates. However, the literature provides little guidance on the degree to which different rating designs might affect rater classification accuracy (severe/lenient) and rater measurement precision in both standalone performance assessments and mixed-format assessments. Using results from an analysis of National Assessment of Educational Progress (NAEP) data, we conducted simulation studies to systematically explore the impacts of different rating designs on rater measurement precision and rater classification accuracy (severe/lenient) in mixed-format assessments. The results suggest that the complete rating design produced the highest rater classification accuracy and greatest rater measurement precision, followed by the multiple-choice (MC) + spiral link design and the MC link design. Considering that complete rating designs are not practical in most testing situations, the MC + spiral link design may be a useful choice because it balances cost and performance. We consider the implications of our findings for research and practice.

11.
Article in English | MEDLINE | ID: mdl-36834246

ABSTRACT

This study examined the acute effects of high-intensity resistance exercise with blood flow restriction (BFR) on performance and fatigue, metabolic stress, and markers of inflammation (interleukin-6 (IL-6)), muscle damage (myoglobin), and angiogenesis (vascular endothelial growth factor (VEGF)). Thirteen resistance-trained participants (four female, 24.8 ± 4.7 years) performed four sets of barbell back-squats (75% 1RM) to failure under two conditions: blood flow restriction (BFR, bilateral 80% occlusion pressure) and control (CTRL). Completed repetitions and pre-post-exercise changes in maximal voluntary isometric contractions, countermovement jump, barbell mean propulsive velocity, and surface electromyography were recorded. Pre-post blood lactate (BLa) and venous blood samples for analysis of IL-6, myoglobin, and VEGF were collected. Ratings of perceived exertion (RPE) and pain were recorded for each set. Fewer repetitions were performed during BFR (25.5 ± 9.6 reps) compared to CTRL (43.4 ± 14.2 reps, p < 0.001), with greater repetitions performed during sets 1, 2, and 4 (p < 0.05) in CTRL. Although RPE between conditions was similar across all sets (p > 0.05), pain was greater in BFR across all sets (p < 0.05). Post-exercise fatigue was comparable between conditions. BLa was significantly greater in CTRL compared to BFR at two minutes (p = 0.001) but not four minutes post-exercise (p = 0.063). IL-6 was significantly elevated following BFR (p = 0.011). Comparable increases in myoglobin (p > 0.05) and no changes in VEGF were observed (p > 0.05). BFR increases the rate of muscular fatigue during high-intensity resistance exercise and acutely enhances IL-6 response, with significantly less total work performed, but increases pain perception, limiting implementation.


Subject(s)
Resistance Training , Vascular Endothelial Growth Factor A , Female , Humans , Fatigue , Interleukin-6 , Muscle, Skeletal/physiology , Myoglobin , Pain , Regional Blood Flow/physiology , Male
12.
Behav Res Methods ; 55(7): 3370-3415, 2023 Oct.
Article in English | MEDLINE | ID: mdl-36131197

ABSTRACT

Careless responding is a pervasive issue that impacts the interpretation and use of responses from survey instruments. Researchers have proposed numerous useful methods for detecting carelessness in survey research, including relatively simple summary statistics such as the frequency of adjacent responses in the same category (e.g., "long-string" analysis) and outlier statistics (e.g., Mahalanobis distance). Researchers have also used methods based on item response theory (IRT) models to identify examinees whose response patterns are unexpected given item parameters. However, researchers have not fully considered the use of nonparametric IRT methods based on Mokken scale analysis (MSA) to detect carelessness in survey research. MSA is a promising framework in which to consider participant carelessness because it is well suited to contexts in which parametric IRT models may not be appropriate, while still maintaining a focus on fundamental measurement requirements. We used a real data analysis and a simulation study to examine the sensitivity of MSA indicators of response quality to examinee carelessness and compared the results to those from standalone indicators. We also examined the impact of carelessness on the sensitivity of MSA item quality indicators. Numeric and graphical indicators of response quality from MSA indicators were sensitive to examinee carelessness. Graphical displays of nonparametric person response functions (PRFs) provided supplementary insight that can alert researchers to potentially problematic responses. Our results also indicated that MSA indicators of item quality are robust to the presence of participant carelessness. We consider the implications of our findings for research and practice.


Subject(s)
Research Design , Humans , Psychometrics/methods , Computer Simulation , Surveys and Questionnaires
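
The "long-string" summary statistic mentioned in the abstract above (the frequency of adjacent responses in the same category) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function name `long_string` and the example response vectors are hypothetical, and it assumes the index is taken as the longest run of identical consecutive responses.

```python
from itertools import groupby


def long_string(responses):
    """Return the length of the longest run of identical consecutive responses.

    Assumption for illustration: the long-string index is the maximum run
    length; other variants (e.g., average run length) are also used.
    """
    # groupby() splits the sequence into runs of equal adjacent values.
    return max((sum(1 for _ in run) for _, run in groupby(responses)), default=0)


# Hypothetical 8-item response vectors on a 5-point rating scale.
varied = [3, 4, 2, 5, 3, 1, 4, 2]
repetitive = [3, 3, 3, 3, 3, 3, 4, 3]

print(long_string(varied))      # 1
print(long_string(repetitive))  # 6
```

A high value flags a pattern for review only; what counts as "too long" depends on the instrument and is not specified in the abstract.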
13.
Educ Psychol Meas ; 82(4): 747-756, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35754613

ABSTRACT

Researchers frequently use Mokken scale analysis (MSA), which is a nonparametric approach to item response theory, when they have relatively small samples of examinees. Researchers have provided some guidance regarding the minimum sample size for applications of MSA under various conditions. However, these studies have not focused on item-level measurement problems, such as violations of monotonicity or invariant item ordering (IIO). Moreover, these studies have focused on problems that occur for a complete sample of examinees. The current study uses a simulation study to consider the sensitivity of MSA item analysis procedures to problematic item characteristics that occur within limited ranges of the latent variable. Results generally support the use of MSA with small samples (N around 100 examinees) as long as multiple indicators of item quality are considered.

14.
Res Q Exerc Sport ; 93(2): 391-400, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33300852

ABSTRACT

Purpose: The aim of this study was to compare the effects of low ([LV]; 4 total sets), moderate ([MV]; 8 total sets), and high set volumes ([HV]; 12 total sets) in acute full-body resistance exercise sessions on post-exercise parasympathetic reactivation measured using RMSSD. Methods: Ten resistance-trained participants (25.8 ± 6.8 yr., 173.4 ± 10.6 cm, 75.4 ± 9.9 kg) performed three resistance exercise sessions. During each session, heart rate variability (HRV) was measured pre- and for 30 min post-exercise, divided into 5-min segments: stabilization, Post5-10, Post10-15, Post15-20, Post20-25, and Post25-30. Repeated-measures ANOVA was used to assess differences within and between pre-post exercise natural logarithm RMSSD (LnRMSSD) values. To assess the initial change in LnRMSSD, the delta percent change (ΔLnRMSSD) from pre-exercise to Post5-10 (ΔLnRMSSDpre-post) was calculated for each session. The ΔLnRMSSD was also calculated between Post5-10 and Post25-30 (ΔLnRMSSDpost5-30) to assess recovery. Results: Significant differences were observed between sessions and when comparing pre-exercise values to all post-exercise times across sessions (p ≤ .05). The LV session resulted in a significantly higher mean LnRMSSD value (3.62) post-exercise compared to both the MV (3.11, effect size [ES] = 3.77) and HV (3.02, ES = 3.92) sessions, while the MV and HV sessions produced similar responses. Across sessions no return to baseline occurred, and when comparing sessions, no significant differences were found in ΔLnRMSSDpre-post or ΔLnRMSSDpost5-30. Conclusion: Acute bouts of full-body resistance exercise can cause similar reductions in LnRMSSD from pre-exercise levels and can delay parasympathetic reactivation back to baseline values during the same 30-min recovery period despite differences in set volume.


Subject(s)
Resistance Training , Exercise/physiology , Exercise Test , Heart Rate/physiology , Humans
15.
Educ Psychol Meas ; 81(5): 996-1022, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34565815

ABSTRACT

Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs present challenges for detecting rater biases, or differential rater functioning (DRF). The purpose of this study is to illustrate and explore the sensitivity of DRF indices in realistic sparse rating designs that have been documented in the literature that include different types and levels of connectivity among raters and students. The results indicated that it is possible to detect DRF in sparse rating designs, but the sensitivity of DRF indices varies across designs. We consider the implications of our findings for practice related to monitoring raters in performance assessments.

16.
Appl Psychol Meas ; 45(5): 315-330, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34565938

ABSTRACT

When analysts evaluate performance assessments, they often use modern measurement theory models to identify raters who frequently give ratings that are different from what would be expected, given the quality of the performance. To detect problematic scoring patterns, two rater fit statistics, the infit and outfit mean square error (MSE) statistics, are routinely used. However, the interpretation of these statistics is not straightforward. A common practice is that researchers employ established rule-of-thumb critical values to interpret infit and outfit MSE statistics. Unfortunately, prior studies have shown that these rule-of-thumb values may not be appropriate in many empirical situations. Parametric bootstrapped critical values for infit and outfit MSE statistics provide a promising alternative approach to identifying item and person misfit in item response theory (IRT) analyses. However, researchers have not examined the performance of this approach for detecting rater misfit. In this study, we illustrate a bootstrap procedure that researchers can use to identify critical values for infit and outfit MSE statistics, and we used a simulation study to assess the false-positive and true-positive rates of these two statistics. We observed that the false-positive rates were highly inflated, and the true-positive rates were relatively low. Thus, we proposed an iterative parametric bootstrap procedure to overcome these limitations. The results indicated that using the iterative procedure to establish 95% critical values of infit and outfit MSE statistics had better-controlled false-positive rates and higher true-positive rates compared to using the traditional parametric bootstrap procedure and rule-of-thumb critical values.
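
As a rough illustration of the two fit statistics named in the abstract above, the sketch below computes infit and outfit MSE for one response vector. It assumes the simplest case, a dichotomous Rasch model with known parameters, whereas the study concerns rater fit and uses bootstrapped critical values that are not reproduced here. The names `rasch_prob` and `infit_outfit` and all numeric values are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a score of 1 under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def infit_outfit(responses, theta, difficulties):
    """Return (infit MSE, outfit MSE) for one vector of 0/1 responses.

    Outfit is the unweighted mean of squared standardized residuals;
    infit weights the squared residuals by the model variance.
    """
    probs = [rasch_prob(theta, b) for b in difficulties]
    variances = [p * (1.0 - p) for p in probs]
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, probs)]
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(responses)
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


# Hypothetical difficulties, ordered from easiest to hardest item.
difficulties = [-2.0, -1.0, 1.0, 2.0]
expected_pattern = [1, 1, 0, 0]    # succeeds on easy items, fails on hard ones
unexpected_pattern = [0, 0, 1, 1]  # the reverse pattern

print(infit_outfit(expected_pattern, 0.0, difficulties))    # both below 1
print(infit_outfit(unexpected_pattern, 0.0, difficulties))  # both above 1
```

Both statistics have an expected value near 1 under the model, which is why a cutoff is needed to decide how far from 1 counts as misfit.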

17.
Educ Psychol Meas ; 81(2): 290-318, 2021 Apr.
Article in English | MEDLINE | ID: mdl-37929258

ABSTRACT

Researchers frequently use Rasch models to analyze survey responses because these models provide accurate parameter estimates for items and examinees when there are missing data. However, researchers have not fully considered how missing data affect the accuracy of dimensionality assessment in Rasch analyses such as principal components analysis (PCA) of standardized residuals. Because adherence to unidimensionality is a prerequisite for the appropriate interpretation and use of Rasch model results, insight into the impact of missing data on the accuracy of this approach is critical. We used a simulation study to examine the accuracy of standardized residual PCA with various proportions of missing data and multidimensionality. We also explored an adaptation of modified parallel analysis in combination with standardized residual PCA as a source of additional information about dimensionality when missing data are present. Our results suggested that missing data impact the accuracy of PCA on standardized residuals, and that the adaptation of modified parallel analysis provides useful supplementary information about dimensionality when there are missing data.

18.
Sensors (Basel) ; 20(20), 2020 Oct 09.
Article in English | MEDLINE | ID: mdl-33050249

ABSTRACT

The aim was to examine the validity of heart rate variability (HRV) measurements from photoplethysmography (PPG) via a smartphone application pre- and post-resistance exercise (RE) and to examine the intraday and interday reliability of the smartphone PPG method. Thirty-one adults underwent two simultaneous ultrashort-term electrocardiograph (ECG) and PPG measurements followed by 1-repetition maximum testing for back squats, bench presses, and bent-over rows. The participants then performed RE, where simultaneous ultrashort-term ECG and PPG measurements were taken: two pre- and one post-exercise. The natural logarithm of the root mean square of successive normal-to-normal (R-R) differences (LnRMSSD) values were compared with paired-sample t-tests, Pearson product correlations, Cohen's d effect sizes (ESs), and Bland-Altman analysis. Intra-class correlations (ICC) were determined between PPG LnRMSSDs. Significant, small-moderate differences were found for all measurements between ECG and PPG: BasePre1 (ES = 0.42), BasePre2 (0.30), REPre1 (0.26), REPre2 (0.36), and REPost (1.14). The correlations ranged from moderate to very large: BasePre1 (r = 0.59), BasePre2 (r = 0.63), REPre1 (r = 0.63), REPre2 (r = 0.76), and REPost (r = 0.41), all p < 0.05. The agreement for all the measurements was "moderate" (0.10-0.16). The PPG LnRMSSD exhibited "nearly-perfect" intraday reliability (ICC = 0.91) and "very large" interday reliability (0.88). The smartphone PPG was comparable to the ECG for measuring HRV at rest, but with larger error after resistance exercise.


Subject(s)
Heart Rate , Resistance Training , Smartphone , Adult , Electrocardiography , Humans , Photoplethysmography , Reproducibility of Results
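
The LnRMSSD metric defined in the abstract above (the natural logarithm of the root mean square of successive R-R differences) can be computed directly from a list of R-R intervals. This is a minimal sketch, assuming the intervals are given in milliseconds and have already been cleaned of artifacts; the function name `ln_rmssd` and the sample intervals are hypothetical and do not come from the study.

```python
import math


def ln_rmssd(rr_ms):
    """Natural log of the root mean square of successive R-R differences."""
    if len(rr_ms) < 2:
        raise ValueError("at least two R-R intervals are required")
    # Differences between each interval and the one before it.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    if rmssd == 0:
        raise ValueError("RMSSD is zero, so its logarithm is undefined")
    return math.log(rmssd)


# Hypothetical R-R intervals in milliseconds.
rr_intervals = [800, 810, 790, 805, 795]
print(round(ln_rmssd(rr_intervals), 2))
```

The log transform is commonly applied because RMSSD values tend to be right-skewed; the abstract itself does not state the reason.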
19.
J Appl Meas ; 21(3): 260-270, 2020.
Article in English | MEDLINE | ID: mdl-33983898

ABSTRACT

Researchers and practitioners have used the Modern Language Aptitude Test (MLAT) to assess language aptitude and identify possible language learning deficiencies in examinees since the 1950s. However, researchers have not assessed its psychometric properties using modern measurement theory methods. We use the dichotomous Rasch model to explore the psychometric properties of the MLAT, including data-model fit indices, item difficulty and student ability calibrations, reliability of separation, and differences in achievement across gender subgroups based on a sample of undergraduate and graduate university students (N=204). Our findings suggest that the MLAT has acceptable psychometric properties such that it can be meaningfully interpreted as a measure of language proficiency. Our findings confirm previous research that language performance across gender groups significantly differs. We found no significant interactions between gender subgroups and the difficulty of the five domains of the assessment. We discuss these results in terms of their implications for research and practice.


Subject(s)
Aptitude Tests , Aptitude , Language , Psychometrics , Humans , Reproducibility of Results
20.
J Appl Meas ; 21(3): 313-328, 2020.
Article in English | MEDLINE | ID: mdl-33983902

ABSTRACT

In previous studies, researchers have focused on the development and interpretation of measurement tools related to self-efficacy. However, researchers have seldom investigated whether these instruments demonstrate acceptable psychometric properties, including similar item interpretations between subgroups of respondents. The purpose of this study was to explore the extent to which a self-efficacy measure has a consistent interpretation for two self-reported gender subgroups. The researchers utilized Rasch analysis to explore differences in item difficulty between the subgroups. Results suggested differences in item difficulty ordering for certain self-efficacy items. Implications for research and practice are discussed.


Subject(s)
Psychometrics , Self Efficacy , Students , Humans , Schools , Surveys and Questionnaires