Results 1 - 20 of 42
1.
Psychometrika ; 89(1): 42-63, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38573434

ABSTRACT

Many studies in fields such as psychology and educational sciences obtain information about attributes of subjects through observational studies, in which raters score subjects using multiple-item rating scales. Error variance due to measurement effects, such as items and raters, attenuates the regression coefficients and lowers the power of (hierarchical) linear models. A modeling procedure is discussed to reduce the attenuation. The procedure consists of (1) an item response theory (IRT) model to map the discrete item responses to a continuous latent scale and (2) a generalizability theory (GT) model to separate the variance in the latent measurement into variance components of interest and nuisance variance components. It will be shown how measurements obtained from this mixture of IRT and GT models can be embedded in (hierarchical) linear models, either as predictor or as criterion variables, such that error variance due to nuisance effects is partialled out. Using examples from the field of educational measurement, it is shown how general-purpose software can be used to implement the modeling procedure.


Subject(s)
Psychometrics , Psychometrics/methods , Humans , Regression Analysis , Models, Statistical , Bias , Educational Measurement/methods , Linear Models
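
The attenuation mentioned in this abstract can be illustrated with the classical single-predictor result, in which the observed slope equals the true slope multiplied by the predictor's reliability. This is a minimal sketch under that textbook assumption, not the IRT/GT procedure of the article; the function name and the numbers are hypothetical.

```python
def attenuated_slope(true_slope: float, var_true: float, var_error: float) -> float:
    """Expected slope when a single predictor is measured with error.

    Assumes the classical errors-in-variables setting: one predictor,
    error independent of the true score and of the outcome.
    """
    reliability = var_true / (var_true + var_error)
    return true_slope * reliability


# Hypothetical values: true slope 0.50, latent variance 1.0,
# nuisance (item/rater) variance 0.25.
print(attenuated_slope(0.50, 1.0, 0.25))
```

The larger the nuisance variance relative to the latent variance, the further the observed slope shrinks toward zero.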
2.
Front Psychol ; 10: 2358, 2019.
Article in English | MEDLINE | ID: mdl-31695647

ABSTRACT

This article introduces a new hybrid intake procedure developed for posttraumatic stress disorder (PTSD) screening, which combines an automated textual assessment of respondents' self-narratives and item-based measures that are administered subsequently. A text mining technique and item response modeling were used to analyze long constructed responses (i.e., self-narratives) and responses to standardized questionnaires (i.e., multiple-choice items), respectively. The whole procedure is combined in a Bayesian framework where the textual assessment functions as prior information for the estimation of the PTSD latent trait. The purpose of this study is twofold: first, to investigate whether the combination model of textual analysis and item-based scaling could enhance the classification accuracy of PTSD, and second, to examine whether the standard error of estimates could be reduced through the use of the narrative as a sort of routing test. With the sample at hand, the combination model resulted in a reduction in the misclassification rate, as well as a decrease in the standard error of latent trait estimation. These findings highlight the benefits of combining textual assessment and item-based measures in a psychiatric screening process. We conclude that the hybrid test design is a promising approach to increase test efficiency and is expected to be applicable in a broader scope of educational and psychological measurement in the future.

3.
Multivariate Behav Res ; 53(6): 914-924, 2018.
Article in English | MEDLINE | ID: mdl-30463444

ABSTRACT

A method is proposed for constructing indices as linear functions of variables such that the reliability of the compound score is maximized. Reliability is defined in the framework of latent variable modeling [i.e., item response theory (IRT)] and optimal weights of the components of the index are found by maximizing the posterior variance relative to the total latent variable variance. Three methods for estimating the weights are proposed. The first is a likelihood-based approach, that is, marginal maximum likelihood (MML). The other two are Bayesian approaches based on Markov chain Monte Carlo (MCMC) computational methods. One is based on an augmented Gibbs sampler specifically targeted at IRT, and the other is based on a general-purpose Gibbs sampler such as implemented in OpenBUGS and JAGS. Simulation studies are presented to demonstrate the procedure and to compare the three methods. Results are very similar, so practitioners are advised to use the latter, easily accessible method. A real data set pertaining to the 28-joint Disease Activity Score is used to show how the methods can be applied in a complex measurement situation with multiple time points and mixed data formats.


Subject(s)
Bayes Theorem , Likelihood Functions , Monte Carlo Method , Humans , Markov Chains
4.
Appl Psychol Meas ; 42(5): 327-342, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29962559

ABSTRACT

As there is currently a marked increase in the use of both unidimensional (UCAT) and multidimensional computerized adaptive testing (MCAT) in psychological and health measurement, the main aim of the present study is to assess the incremental value of using MCAT rather than separate UCATs for each dimension. Simulations are based on empirical data that could be considered typical for health measurement: a large number of dimensions (4), strong correlations among dimensions (.77-.87), and polytomously scored response data. Both variable- (SE < .316, SE < .387) and fixed-length conditions (total test length of 12, 20, or 32 items) are studied. The item parameters and variance-covariance matrix Φ are estimated with the multidimensional graded response model (GRM). Outcome variables include computerized adaptive test (CAT) length, root mean square error (RMSE), and bias. Both simulated and empirical latent trait distributions are used to sample vectors of true scores. MCATs were generally more efficient (in terms of test length) and more accurate (in terms of RMSE) than their UCAT counterparts. Absolute average bias was highest for variable-length UCATs with termination rule SE < .387. Test length of variable-length MCATs was on average 20% to 25% shorter than test length across separate UCATs. This study showed that there are clear advantages of using MCAT rather than UCAT in a setting typical for health measurement.
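
The two variable-length stopping rules can be read as reliability targets. The sketch below assumes the common convention that the latent trait has unit variance, so that reliability is approximately 1 minus the squared standard error; the abstract itself does not state this convention, and the function name is hypothetical.

```python
def reliability_from_se(se: float, latent_variance: float = 1.0) -> float:
    """Approximate reliability implied by a standard error of measurement.

    Assumes reliability = 1 - SE^2 / latent variance (unit variance by default).
    """
    return 1.0 - se ** 2 / latent_variance


for se in (0.316, 0.387):
    print(se, round(reliability_from_se(se), 2))
```

Under this assumption, SE < .316 corresponds to a reliability of about .90 and SE < .387 to about .85.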

5.
PLoS One ; 12(1): e0169787, 2017.
Article in English | MEDLINE | ID: mdl-28076429

ABSTRACT

The Single Variable Exchange algorithm is based on a simple idea: any model that can be simulated can be estimated by producing draws from the posterior distribution. We build on this simple idea by framing the Exchange algorithm as a mixture of Metropolis transition kernels and propose strategies that automatically select the more efficient transition kernels. In this manner we achieve significant improvements in convergence rate and autocorrelation of the Markov chain without relying on more than being able to simulate from the model. We focus on statistical models in the exponential family and use two simple models from educational measurement to illustrate the contribution.


Subject(s)
Algorithms , Computer Simulation/statistics & numerical data , Models, Theoretical
6.
Assessment ; 24(2): 157-172, 2017 Mar.
Article in English | MEDLINE | ID: mdl-26358713

ABSTRACT

Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms (decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model) were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.


Subject(s)
Data Mining , Diagnosis, Computer-Assisted , Mass Screening , Narration , Natural Language Processing , Self Report , Stress Disorders, Post-Traumatic , Adolescent , Adult , Algorithms , Decision Trees , Early Diagnosis , Female , Humans , Personality Assessment/statistics & numerical data , Reproducibility of Results , Stress Disorders, Post-Traumatic/classification , Stress Disorders, Post-Traumatic/diagnosis , Stress Disorders, Post-Traumatic/psychology
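
The n-gram representation referred to in this abstract can be sketched as follows. This is a generic illustration with whitespace tokenization, an assumption made here for brevity; it is not the preprocessing used in the study, and the example sentence is invented.

```python
def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return the sequence of n-grams (as tuples of lowercase tokens) in a text."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "I could not sleep"  # invented example, not study data
print(ngrams(sentence, 1))  # unigrams
print(ngrams(sentence, 2))  # bigrams, one kind of multigram
```

Counts of such unigrams and multigrams would then serve as the verbal features passed to a classifier.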
7.
Psychometrika ; 81(2): 274-89, 2016 06.
Article in English | MEDLINE | ID: mdl-27052959

ABSTRACT

In this paper, we show that the marginal distribution of plausible values is a consistent estimator of the true latent variable distribution, and, furthermore, that convergence is monotone in an embedding in which the number of items tends to infinity. We use this result to clarify some of the misconceptions that exist about plausible values, and also show how they can be used in the analyses of educational surveys.


Subject(s)
Psychometrics , Statistics as Topic , Bayes Theorem , Educational Measurement , Humans , Models, Theoretical , Surveys and Questionnaires
8.
PLoS One ; 10(12): e0145008, 2015.
Article in English | MEDLINE | ID: mdl-26710104

ABSTRACT

OBJECTIVE: Multidimensional computerized adaptive testing enables precise measurements of patient-reported outcomes at an individual level across different dimensions. This study examined the construct validity of a multidimensional computerized adaptive test (CAT) for fatigue in rheumatoid arthritis (RA). METHODS: The 'CAT Fatigue RA' was constructed based on a previously calibrated item bank. It contains 196 items and three dimensions: 'severity', 'impact' and 'variability' of fatigue. The CAT was administered to 166 patients with RA. They also completed a traditional, multidimensional fatigue questionnaire (BRAF-MDQ) and the SF-36 in order to examine the CAT's construct validity. The a priori criterion for construct validity was that 75% of the correlations between the CAT dimensions and the subscales of the other questionnaires were as expected. Furthermore, comprehensive use of the item bank, measurement precision and score distribution were investigated. RESULTS: The a priori criterion for construct validity was supported for two of the three CAT dimensions (severity and impact but not for variability). For severity and impact, 87% of the correlations with the subscales of the well-established questionnaires were as expected but for variability, 53% of the hypothesised relations were found. Eighty-nine percent of the items were selected between one and 137 times for CAT administrations. Measurement precision was excellent for the severity and impact dimensions, with more than 90% of the CAT administrations reaching a standard error below 0.32. The variability dimension showed good measurement precision with 90% of the CAT administrations reaching a standard error below 0.44. No floor or ceiling effects were found for the three dimensions. CONCLUSION: The CAT Fatigue RA showed good construct validity and excellent measurement precision on the dimensions severity and impact. The dimension variability had less ideal measurement characteristics, pointing to the need to recalibrate the CAT item bank with a two-dimensional model, solely consisting of severity and impact.


Subject(s)
Arthritis, Rheumatoid/pathology , Fatigue/diagnosis , Psychometrics/methods , Self Report , Adult , Aged , Aged, 80 and over , Computers , Female , Humans , Male , Middle Aged , Reproducibility of Results , Surveys and Questionnaires , Young Adult
9.
Rheumatology (Oxford) ; 54(12): 2221-9, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26224306

ABSTRACT

OBJECTIVE: To evaluate the content validity and measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) physical function item bank and a 20-item short form in patients with RA in comparison with the HAQ disability index (HAQ-DI) and 36-item Short Form Health Survey (SF-36) physical functioning scale (PF-10). METHODS: The content validity of the instruments was evaluated by linking their items to the International Classification of Functioning, Disability and Health (ICF) core set for RA. The measures were administered to 690 RA patients enrolled in the Dutch Rheumatoid Arthritis Monitoring registry. Measurement precision was evaluated using item response theory methods and construct validity was evaluated by correlating physical function scores with other clinical and patient-reported outcome measures. RESULTS: All 207 health concepts identified in the physical function measures referred to activities that are featured in the ICF. Twenty-three of 26 ICF RA core set domains are featured in the full PROMIS physical function item bank compared with 13 and 8 for the HAQ-DI and PF-10, respectively. As hypothesized, all three physical function instruments were highly intercorrelated (r 0.74-0.84), moderately correlated with disease activity measures (r 0.44-0.63) and weakly correlated with age (rs 0.07-0.14). Item response theory-based analysis revealed that a 20-item PROMIS physical function short form covered a wider range of physical function levels than the HAQ-DI or PF-10. CONCLUSION: The PROMIS physical function item bank demonstrated excellent measurement properties in RA. A content-driven 20-item short form may be a useful tool for assessing physical function in RA.


Subject(s)
Arthritis, Rheumatoid/physiopathology , Motor Activity/physiology , Patient Outcome Assessment , Activities of Daily Living , Adult , Aged , Arthritis, Rheumatoid/rehabilitation , Disability Evaluation , Female , Humans , Male , Middle Aged , Reproducibility of Results , Severity of Illness Index
10.
Health Qual Life Outcomes ; 13: 23, 2015 Feb 21.
Article in English | MEDLINE | ID: mdl-25890307

ABSTRACT

BACKGROUND: This paper demonstrates the mechanism of a multidimensional computerized adaptive test (CAT) to measure fatigue in patients with rheumatoid arthritis (RA). A CAT can be used to precisely measure patient-reported outcomes at an individual level as items are sequentially selected based on the patient's previous answers. The item bank of the CAT Fatigue RA has been developed from the patients' perspective and consists of 196 items pertaining to three fatigue dimensions: severity, impact and variability of fatigue. METHODS: The CAT Fatigue RA was completed by fifteen patients. To test the CAT's working mechanism, we applied the flowchart-check-method. The adaptive item selection procedure for each patient was checked by the researchers. The estimated fatigue levels and the measurement precision per dimension were illustrated with the selected items, answers and flowcharts. RESULTS: The CAT Fatigue RA selected all items in a logical sequence and those items were selected which provided the most information about the patient's individual fatigue. Flowcharts further illustrated that the CAT reached a satisfactory measurement precision, with fewer than 20 items, on the dimensions severity and impact and to a somewhat lesser extent also for the dimension variability. Patients' fatigue scores varied across the three dimensions; sometimes severity scored highest, other times impact or variability. The CAT's ability to display different fatigue experiences can improve communication in daily clinical practice, guide interventions, and facilitate research into possible predictors of fatigue. CONCLUSIONS: The results indicate that the CAT Fatigue RA measures fatigue precisely and comprehensively. Once it is examined in more detail in a subsequent, more elaborate validation study, the CAT will be available for implementation in daily clinical practice and for research purposes.


Subject(s)
Arthritis, Rheumatoid/complications , Diagnosis, Computer-Assisted/methods , Fatigue/diagnosis , Quality of Life , Adult , Aged , Arthritis, Rheumatoid/psychology , Fatigue/etiology , Female , Health Status Indicators , Humans , Male , Middle Aged
11.
Sci Rep ; 5: 9050, 2015 Mar 12.
Article in English | MEDLINE | ID: mdl-25761415

ABSTRACT

Estimating the structure of Ising networks is a notoriously difficult problem. We demonstrate that, using a latent variable representation of the Ising network, we can employ a full-data-information approach to uncover the network structure, thereby ignoring only the information encoded in the prior distribution (of the latent variables). The full-data-information approach avoids having to compute the partition function and is thus computationally feasible, even for networks with many nodes. We illustrate the full-data-information approach with the estimation of dense networks.


Subject(s)
Algorithms , Models, Theoretical
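
To see why avoiding the partition function matters, the brute-force sketch below sums over every joint state of the network, which takes 2^n terms for n nodes. It assumes 0/1-coded nodes and a log-linear form with node thresholds and pairwise couplings; these conventions and all names are assumptions made for illustration, not the article's method.

```python
from itertools import product
from math import exp


def partition_function(thresholds, couplings):
    """Brute-force Ising normalizing constant over all 2^n states.

    Assumes 0/1-coded nodes; couplings[i][j] is used only for i < j.
    """
    n = len(thresholds)
    total = 0.0
    for state in product((0, 1), repeat=n):  # 2^n joint configurations
        log_potential = sum(thresholds[i] * state[i] for i in range(n))
        log_potential += sum(
            couplings[i][j] * state[i] * state[j]
            for i in range(n)
            for j in range(i + 1, n)
        )
        total += exp(log_potential)
    return total


# Hypothetical 3-node network without thresholds or couplings
zeros = [[0.0] * 3 for _ in range(3)]
print(partition_function([0.0, 0.0, 0.0], zeros))
```

With 30 nodes the loop already runs over more than a billion states, which is why methods that sidestep this sum scale to larger networks.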
12.
J Rheumatol ; 42(3): 413-20, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25593225

ABSTRACT

OBJECTIVE: To compare the psychometric functioning of multidimensional disease-specific, multiitem generic, and single-item measures of fatigue in patients with rheumatoid arthritis (RA). METHODS: Confirmatory factor analysis (CFA) and longitudinal item response theory (IRT) modeling were used to evaluate the measurement structure and local reliability of the Bristol RA Fatigue Multi-Dimensional Questionnaire (BRAF-MDQ), the Medical Outcomes Study Short Form-36 (SF-36) vitality scale, and the BRAF Numerical Rating Scales (BRAF-NRS) in a sample of 588 patients with RA. RESULTS: A 1-factor CFA model yielded a similar fit to a 5-factor model with subscale-specific dimensions, and the items from the different instruments adequately fit the IRT model, suggesting essential unidimensionality in measurement. The SF-36 vitality scale outperformed the BRAF-MDQ at lower levels of fatigue, but was less precise at moderate to higher levels of fatigue. At these levels of fatigue, the living, cognition, and emotion subscales of the BRAF-MDQ provide additional precision. The BRAF-NRS showed a limited measurement range with its highest precision centered on average levels of fatigue. CONCLUSION: The different instruments appear to access a common underlying domain of fatigue severity, but differ considerably in their measurement precision along the continuum. The SF-36 vitality scale can be used to measure fatigue severity in samples with relatively mild fatigue. For samples expected to have higher levels of fatigue, the multidimensional BRAF-MDQ appears to be a better choice. The BRAF-NRS are not recommended if precise assessment is required, for instance in longitudinal settings.


Subject(s)
Arthritis, Rheumatoid/complications , Fatigue/diagnosis , Aged , Arthritis, Rheumatoid/psychology , Fatigue/complications , Fatigue/psychology , Female , Humans , Male , Middle Aged , Psychometrics , Reproducibility of Results , Severity of Illness Index , Surveys and Questionnaires
13.
Qual Life Res ; 24(1): 67-79, 2015 Jan.
Article in English | MEDLINE | ID: mdl-24241770

ABSTRACT

PURPOSE: The St George's Respiratory Questionnaire (SGRQ) has clearly acquired the status of legacy questionnaire for measuring health-related quality of life in patients with chronic obstructive pulmonary disease (COPD). The main aim of this study was to assess the underlying dimensionality of the SGRQ and to investigate the added value of the empirical weights used to calculate total scores. METHODS: The official Dutch translation of the SGRQ was completed by 444 COPD patients participating in two clinical studies. These data were used for secondary data analysis in this study. Three complementary statistical methods were used to assess dimensionality: Mokken scale analysis (MSA), parametric multidimensional item response theory (IRT) and bifactor analysis. Additionally, the original SGRQ weighting procedure was compared to IRT-based weighting. RESULTS: The results of the MSA and multidimensional item response theory (MIRT) pointed toward a unidimensional structure. The bifactor analyses indicated that there was a strong general factor, but the group factors did have additional value. Nineteen items performed poorly in the MSA, MIRT analysis or both. Shortening the scale from 50 to 31 items did not negatively impact measurement precision. SGRQ total score and IRT-derived scores correlated strongly, 0.90 for the one-parameter model and 0.99 for the two-parameter model. CONCLUSION: The SGRQ contains some multidimensionality, but an abbreviated version can be used as a unidimensional tool in patients with COPD. Subscale scores should be used with care. SGRQ total scores correlated highly with IRT-based scores, and thus, the weighting methods may be used interchangeably to calculate total scores.


Subject(s)
Health Status , Pulmonary Disease, Chronic Obstructive/physiopathology , Quality of Life , Surveys and Questionnaires , Aged , Female , Humans , Male , Middle Aged , Models, Statistical , Psychometrics , Translations
14.
BMC Musculoskelet Disord ; 15: 368, 2014 Nov 06.
Article in English | MEDLINE | ID: mdl-25373740

ABSTRACT

BACKGROUND: The erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP) are two commonly used measures of inflammation in rheumatoid arthritis (RA). As current RA treatment guidelines strongly emphasize early and aggressive treatment aiming at fast remission, optimal measurement of inflammation becomes increasingly important. Dependencies with age, sex, and body mass index have been shown for both inflammatory markers, yet it remains unclear which inflammatory marker is affected least by these effects in patients with early RA. METHODS: Baseline data from 589 patients from the DREAM registry were used for analyses. Associations between the inflammatory markers and age, sex, and BMI were evaluated first using univariate linear regression analyses. Next, it was tested whether these associations were independent of a patient's current disease activity as well as of each other using multiple linear regression analyses with backward elimination. The strengths of the associations were compared using standardized beta (β) coefficients. The multivariate analyses were repeated after 1 year. RESULTS: At baseline, both the ESR and CRP were univariately associated with age, sex, and BMI, although the association with BMI disappeared in multivariate analyses. ESR and CRP levels significantly increased with age (β-ESR=0.017, p<0.001 and β-CRP=0.009, p=0.006), independent of the number of tender and swollen joints, general health, and sex. For each decade of aging, ESR and CRP levels became 1.19 and 1.09 times higher, respectively. Furthermore, women demonstrated average ESR levels that were 1.22 times higher than that of men (β=0.198, p=0.007), whereas men had 1.20 times higher CRP levels (β=-0.182, p=0.048). Effects were strongest on the ESR. BMI became significantly associated with both inflammatory markers after 1 year, showing higher levels with increasing weight. Age continued to be significantly associated, whereas sex remained only associated with the ESR level. CONCLUSIONS: Age and sex are independently associated with the levels of both acute phase reactants in early RA, emphasizing the need to take these external factors into account when interpreting disease activity measures. BMI appears to become more relevant at later stages of the disease.


Subject(s)
Aging/blood , Arthritis, Rheumatoid/blood , Arthritis, Rheumatoid/diagnosis , C-Reactive Protein/metabolism , Erythrocytes/metabolism , Sex Characteristics , Adult , Aged , Aging/pathology , Blood Sedimentation , Cohort Studies , Female , Humans , Male , Middle Aged
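
The "times higher" figures in this abstract are consistent with regression coefficients estimated on a natural-log scale and back-transformed by exponentiation. The abstract does not state that the outcomes were log-transformed, so the sketch below treats that as an assumption; the function name is hypothetical.

```python
from math import exp


def ratio_per_units(beta: float, units: float = 1.0) -> float:
    """Multiplicative change in the outcome for a given change in the predictor.

    Assumes the outcome was modeled on a natural-log scale (not stated in the abstract).
    """
    return exp(beta * units)


print(round(ratio_per_units(0.017, 10), 2))  # ESR, per decade of age
print(round(ratio_per_units(0.009, 10), 2))  # CRP, per decade of age
print(round(ratio_per_units(0.198), 2))      # ESR, women relative to men
print(round(ratio_per_units(0.182), 2))      # CRP, men relative to women
```

Under this assumption the four reported ratios (1.19, 1.09, 1.22, and 1.20) are reproduced from the reported coefficients.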
15.
Arthritis Rheumatol ; 66(10): 2900-8, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24964773

ABSTRACT

OBJECTIVE: To evaluate and compare the measurement precision and sensitivity to change of the Health Assessment Questionnaire disability index (HAQ DI), the Short Form 36 physical functioning scale (PF-10), and simulated Patient-Reported Outcomes Measurement Information System (PROMIS) physical function computer adaptive tests (CATs) with 5, 10, and 15 items, using item response theory-based simulation studies. METHODS: The measurement precision of the various physical function instruments was evaluated by calculating root mean square errors (RMSEs) between true physical function levels (latent physical function score) and estimated physical function levels. Measurement precision was evaluated at 9 levels of physical function, with 5,000 simulated response patterns per level. Sensitivity to change was evaluated by the ability of a simple statistical test to detect simulated change scores of small to moderate magnitude (standardized effect sizes 0.20, 0.35, and 0.50). RESULTS: RMSEs were smaller for the PROMIS physical function 15-item CAT (CAT-15) and CAT-10 than for the HAQ DI and PF-10 across all levels of the latent physical function scale. Only marginal improvement in performance was observed for the CAT-15 compared with the CAT-10, and the CAT-5 performed quite similarly to the HAQ DI and PF-10 across most levels of the latent physical function scale. Substantially improved sensitivity to change was observed for the CAT-10 compared with the HAQ DI and PF-10, particularly in detecting moderate effect sizes. CONCLUSION: Clearly higher measurement precision was observed for the PROMIS CAT compared with the HAQ DI and PF-10. Higher reliability also translated into lower sample size requirements for detecting changes in clinical status.


Subject(s)
Arthritis, Rheumatoid/physiopathology , Disability Evaluation , Patient Outcome Assessment , Surveys and Questionnaires , Disabled Persons , Humans , Psychometrics , Reproducibility of Results
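
The root mean square error used here as the precision criterion compares true and estimated levels over simulated response patterns. A minimal sketch, with invented values standing in for the simulated scores:

```python
from math import sqrt


def rmse(true_scores, estimates):
    """Root mean square error between true and estimated scores."""
    if len(true_scores) != len(estimates):
        raise ValueError("inputs must have the same length")
    squared = [(t - e) ** 2 for t, e in zip(true_scores, estimates)]
    return sqrt(sum(squared) / len(squared))


# Invented example: one true level, four estimates from simulated patterns
true_level = [0.5, 0.5, 0.5, 0.5]
estimated = [0.3, 0.6, 0.7, 0.4]
print(rmse(true_level, estimated))
```

In a study of this design, the calculation would be repeated at each of the 9 levels of physical function, once per instrument.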
16.
PLoS One ; 9(6): e100544, 2014.
Article in English | MEDLINE | ID: mdl-24955759

ABSTRACT

BACKGROUND: The 28-joint Disease Activity Score (DAS28) combines scores on a 28-tender and swollen joint count (TJC28 and SJC28), a patient-reported measure for general health (GH), and an inflammatory marker (either the erythrocyte sedimentation rate [ESR] or the C-reactive protein [CRP]) into a composite measure of disease activity in rheumatoid arthritis (RA). This study examined the reliability of the DAS28 in patients with early RA using principles from generalizability theory and evaluated whether it could be increased by adjusting individual DAS28 component weights. METHODS: Patients were drawn from the DREAM registry and classified into a "fast response" group (N = 466) and "slow response" group (N = 80), depending on their pace of reaching remission. Composite reliabilities of the DAS28-ESR and DAS28-CRP were determined with the individual components' reliability, weights, variances, error variances, correlations and covariances. Weight optimization was performed by minimizing the error variance of the index. RESULTS: Composite reliabilities of 0.85 and 0.86 were found for the DAS28-ESR and DAS28-CRP, respectively, and were approximately equal across patient groups. Component reliabilities, however, varied widely both within and between sub-groups, ranging from 0.614 for GH ("slow response" group) to 0.912 for ESR ("fast response" group). Weight optimization increased composite reliability even further. In the total and "fast response" groups, this was achieved mostly by decreasing the weight of the TJC28 and GH. In the "slow response" group, though, the weights of the TJC28 and SJC28 were increased, while those of the inflammatory markers and GH were substantially decreased. CONCLUSIONS: The DAS28-ESR and the DAS28-CRP are reliable instruments for assessing disease activity in early RA and reliability can be increased even further by adjusting component weights. Given the low reliability and weightings of the general health component across subgroups it is recommended to explore alternative patient-reported outcome measures for inclusion in the DAS28.


Subject(s)
Arthritis, Rheumatoid/diagnosis , Research Design/standards , Severity of Illness Index , Cohort Studies , Disability Evaluation , Disease Progression , Female , Follow-Up Studies , Humans , Male , Middle Aged , Prognosis
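
The reliability of a weighted composite can be sketched from the ingredients this abstract lists (component reliabilities, weights, variances, and covariances). The version below uses the standard formula with uncorrelated measurement errors; that simplification, the function name, and the numbers are assumptions, not the study's computation.

```python
def composite_reliability(weights, variances, reliabilities, covariances):
    """Reliability of a weighted sum of components.

    Assumes measurement errors are uncorrelated across components;
    covariances[i][j] is the observed covariance, used only for i < j.
    """
    k = len(weights)
    error_var = sum(
        weights[i] ** 2 * variances[i] * (1.0 - reliabilities[i]) for i in range(k)
    )
    total_var = sum(weights[i] ** 2 * variances[i] for i in range(k))
    total_var += 2.0 * sum(
        weights[i] * weights[j] * covariances[i][j]
        for i in range(k)
        for j in range(i + 1, k)
    )
    return 1.0 - error_var / total_var


# Hypothetical two-component index with unit variances and covariance 0.5
cov = [[1.0, 0.5], [0.5, 1.0]]
print(composite_reliability([1.0, 1.0], [1.0, 1.0], [0.8, 0.8], cov))
```

Because the error variance enters through the squared weights, lowering the weight of an unreliable component reduces error variance faster than total variance, which is the lever that weight optimization uses.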
17.
Arthritis Rheumatol ; 66(5): 1378-87, 2014 May.
Article in English | MEDLINE | ID: mdl-24782194

ABSTRACT

OBJECTIVE: To improve the assessment of physical function by enhancing precision of physical function assessment as it pertains to subjects at extreme ends of the health continuum (i.e., subjects with extremely poor function ["floor"] or extremely good health ["ceiling"]). METHODS: Under the Patient-Reported Outcomes Measurement Information System (PROMIS) (a National Institutes of Health initiative), we developed new items to assess floor and ceiling physical function in order to supplement the existing item bank. Using item response theory and standard PROMIS methodology, we developed 31 floor items and 31 ceiling items and administered the items during a 12-month prospective, observational study of 737 subjects whose health status was at either extreme. Effect size was calculated and change over time was compared across anchor instruments and across items. Using the observed changes in scores, we back-calculated sample size requirements for the new and comparison measures. RESULTS: We studied 444 subjects who had been diagnosed as having a chronic illness and/or were of old age and 293 generally fit subjects (including athletes in training). Item response theory analyses confirmed that the new floor and ceiling items outperformed reference items (P < 0.001). The estimated post hoc sample size requirements were reduced by a factor of 2-4 for the floor population and a factor of 2 for the ceiling population. CONCLUSION: Extending the range of items by which physical function is measured can substantially improve measurement quality, reduce sample size requirements, and improve research efficiency. The paradigm shift from assessing disability to assessing physical function focuses assessment on the entire spectrum of physical function, signals improvement in the conceptual base of outcome assessment, and may be transformative as medical goals more closely approach societal goals for health.


Subject(s)
Chronic Disease , Disability Evaluation , Motor Activity/physiology , Activities of Daily Living , Athletes , Female , Humans , Longitudinal Studies , Male , Middle Aged , Patient Outcome Assessment , Prospective Studies
18.
Arthritis Care Res (Hoboken) ; 66(11): 1754-8, 2014 Nov.
Article in English | MEDLINE | ID: mdl-24757106

ABSTRACT

OBJECTIVE: To evaluate the reliability of a crosswalk, developed in The Netherlands, between the Health Assessment Questionnaire (HAQ) disability index (DI) and the Short Form 36 physical functioning scale (PF-10) in a sample of patients with various rheumatic diseases in the US. METHODS: Baseline data from patients with rheumatoid arthritis (RA; n = 29,020), fibromyalgia (FM; n = 3,776), and systemic lupus erythematosus (SLE; n = 1,609) participating in the National Data Bank for Rheumatic Diseases were analyzed. Reliability of the crosswalk was evaluated by calculating intraclass correlation coefficients (ICCs), and agreement between observed and predicted scores was evaluated using the Bland-Altman approach. RESULTS: The crosswalk produced reliable conversions for both the HAQ DI (ICC range 0.70-0.77) and PF-10 (ICC range 0.73-0.78) in all 3 disease groups. The mean difference between observed and expected scores was close to zero in US patients with RA. For all 3 disease groups, the limits of agreement were fairly wide and conversion at the level of individual patients is not recommended. CONCLUSION: The crosswalk produced reliable conversions at the group level in a crosscultural setting and can be used to convert HAQ DI to PF-10 scores and vice versa in US patients with RA, FM, or SLE.


Subject(s)
Disability Evaluation , Mathematics/methods , Patient Outcome Assessment , Rheumatic Diseases/diagnosis , Rheumatic Diseases/physiopathology , Severity of Illness Index , Surveys and Questionnaires , Adult , Aged , Arthritis, Rheumatoid/diagnosis , Arthritis, Rheumatoid/physiopathology , Cohort Studies , Cross-Cultural Comparison , Female , Fibromyalgia/diagnosis , Fibromyalgia/physiopathology , Humans , Lupus Erythematosus, Systemic/diagnosis , Lupus Erythematosus, Systemic/physiopathology , Male , Middle Aged , Netherlands , Reproducibility of Results , United States
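
The Bland-Altman approach mentioned in this abstract summarizes agreement by the mean difference between observed and predicted scores and its 95% limits of agreement. A minimal sketch with invented scores, assuming the usual mean ± 1.96 SD definition:

```python
from statistics import mean, stdev


def limits_of_agreement(observed, predicted):
    """Return (bias, lower limit, upper limit) for paired scores.

    Uses the conventional 95% limits: mean difference +/- 1.96 * SD of differences.
    """
    diffs = [o - p for o, p in zip(observed, predicted)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Invented scores, not data from the study
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [0.0, 2.0, 4.0, 4.0]
print(limits_of_agreement(observed, predicted))
```

A bias near zero with wide limits is the pattern the abstract describes: conversions that are acceptable on average for groups but too imprecise for individual patients.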
19.
PLoS One ; 9(3): e92367, 2014.
Article in English | MEDLINE | ID: mdl-24637885

ABSTRACT

OBJECTIVE: To calibrate the Dutch-Flemish version of the PROMIS physical function (PF) item bank in patients with rheumatoid arthritis (RA) and to evaluate cross-cultural measurement equivalence with US general population and RA data. METHODS: Data were collected from RA patients enrolled in the Dutch DREAM registry. An incomplete longitudinal anchored design was used where patients completed all 121 items of the item bank over the course of three waves of data collection. Item responses were fit to a generalized partial credit model adapted for longitudinal data, and the item parameters were examined for differential item functioning (DIF) across country, age, and sex. RESULTS: In total, 690 patients participated in the study at time point 1 (T2, N = 489; T3, N = 311). The item bank could be successfully fitted to a generalized partial credit model, with the number of misfitting items falling within acceptable limits. Seven items demonstrated DIF for sex, while 5 items showed DIF for age in the Dutch RA sample. Twenty-five (20%) items were flagged for cross-cultural DIF compared to the US general population. However, the impact of observed DIF on total physical function estimates was negligible. DISCUSSION: The results of this study showed that the PROMIS PF item bank adequately fit a unidimensional IRT model, which provides support for applications that require invariant estimates of physical function, such as computer adaptive testing and targeted short forms. More studies are needed to further investigate the cross-cultural applicability of the US-based PROMIS calibration and standardized metric.
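
For readers unfamiliar with the generalized partial credit model mentioned above, the sketch below computes the response-category probabilities for a single item in the standard cross-sectional form of the model: the probability of category k is proportional to the exponential of the cumulative sum of a(theta - b_v) over the first k step parameters. It does not reproduce the longitudinal adaptation used in the study, and the function name `gpcm_probabilities` and all parameter values are hypothetical.

```python
import math


def gpcm_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under the standard GPCM.

    theta          -- latent trait value of the respondent
    discrimination -- item slope parameter a
    thresholds     -- step parameters b_1..b_m (m + 1 response categories)
    Returns a list of m + 1 probabilities that sum to 1.
    """
    # Cumulative logits: category 0 has logit 0 by convention.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + discrimination * (theta - b))

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical 5-category item evaluated at three trait levels.
    for theta in (-2.0, 0.0, 2.0):
        probs = gpcm_probabilities(theta, 1.2, [-1.5, -0.5, 0.5, 1.5])
        print(theta, [round(p, 3) for p in probs])
```

In this parameterization, DIF for an item would show up as group-specific values of the discrimination or step parameters, which is why the abstract examines item parameters across country, age, and sex.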


Subject(s)
Arthritis, Rheumatoid/diagnosis , Arthritis, Rheumatoid/physiopathology , Surveys and Questionnaires , Calibration , Cross-Cultural Comparison , Female , Humans , Male , Middle Aged , Netherlands , United States
20.
Int J Methods Psychiatr Res ; 23(2): 131-41, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24436035

ABSTRACT

This article explores the generalizability of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) diagnostic criteria for post-traumatic stress disorder (PTSD) to various subpopulations. Besides identifying the differential symptom functioning (also referred to as differential item functioning [DIF]) related to various background variables such as gender, marital status and educational level, this study emphasizes the importance of evaluating the impact of DIF on population inferences as made in health surveys and clinical trials, and on the diagnosis of individual patients. Using a sample from the National Comorbidity Study-Replication (NCS-R), four symptoms for gender, one symptom for marital status, and three symptoms for educational level were flagged as showing significant DIF, but their impact on diagnosis was fairly small. We conclude that the DSM-IV diagnostic criteria for PTSD do not produce substantially biased results in the investigated subpopulations, and there should be few reservations regarding their use. Further, although the impact of DIF (i.e. the influence of differential symptom functioning on diagnostic results) was found to be quite small in the current study, we recommend that diagnosticians always perform a DIF analysis of various subpopulations using the methodology presented here to ensure the diagnostic criteria are valid in their own studies.


Subject(s)
Stress Disorders, Post-Traumatic/diagnosis , Stress Disorders, Post-Traumatic/physiopathology , Adult , Comorbidity , Diagnostic and Statistical Manual of Mental Disorders , Educational Status , Female , Humans , Male , Marital Status , Middle Aged , Sex Factors , Stress Disorders, Post-Traumatic/psychology