Results 1 - 20 of 53
1.
Appl Psychol Meas ; 46(7): 551-570, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36131841

ABSTRACT

Adaptive classification testing (ACT) is a variation of computerized adaptive testing (CAT) developed to efficiently classify examinees into multiple groups based on predetermined cutoffs. In multidimensional multiclassification (i.e., when more than two categories exist along each dimension), grid classification is proposed to classify each examinee into one of the grids bounded by cutoffs (lines/surfaces) along different dimensions, so as to provide clearer information about an examinee's relative standing along each dimension and to facilitate subsequent treatment and intervention. In this article, the sequential probability ratio test (SPRT) and the confidence interval method were implemented in grid multiclassification ACT. In addition, two new termination criteria, the grid classification generalized likelihood ratio (GGLR) and the simplified grid classification generalized likelihood ratio, were proposed for grid multiclassification ACT. Simulation studies using a simulated item bank and a real item bank with polytomous multidimensional items showed that grid multiclassification ACT is more efficient than classification based on measurement CAT, which focuses on the precision of trait estimates. With a high-quality bank, GGLR was found to terminate grid multiclassification ACT and classify examinees most efficiently.
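To make the SPRT termination idea concrete, here is a minimal sketch of Wald's SPRT at a single cutoff. It assumes dichotomous 2PL items, hypothetical item parameters, and a hypothetical indifference region of plus or minus `delta` around the cutoff. The article itself works with polytomous multidimensional items and grid cutoffs, and its GGLR criteria are not reproduced here.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def sprt_decision(responses, items, cutoff, delta=0.2, alpha=0.05, beta=0.05):
    """Wald's SPRT at one cutoff: H0 theta = cutoff - delta vs H1 theta = cutoff + delta.

    responses: list of 0/1 scores; items: list of (a, b) tuples (hypothetical values).
    Returns "above", "below", or "continue".
    """
    theta0, theta1 = cutoff - delta, cutoff + delta
    log_lr = 0.0
    for x, (a, b) in zip(responses, items):
        p0, p1 = p_2pl(theta0, a, b), p_2pl(theta1, a, b)
        log_lr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
    upper = math.log((1 - beta) / alpha)  # crossing it accepts H1: classify above
    lower = math.log(beta / (1 - alpha))  # crossing it accepts H0: classify below
    if log_lr >= upper:
        return "above"
    if log_lr <= lower:
        return "below"
    return "continue"
```

With 20 items of discrimination 1.5 located at the cutoff, 20 correct answers push the log likelihood ratio to 6.0, well past the upper bound of log 19, so the test stops with "above"; a single correct answer leaves it at "continue".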

2.
Educ Psychol Meas ; 82(4): 643-677, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35754618

ABSTRACT

Adaptive measurement of change (AMC) is a psychometric method for measuring intra-individual change on one or more latent traits across testing occasions. Three hypothesis tests (a Z test, a likelihood ratio test, and a score ratio index) have demonstrated desirable statistical properties in this context, including low false positive rates and high true positive rates. However, existing AMC research has assumed that the item parameter values in the simulated item banks were free of estimation error. This assumption is unrealistic for applied testing settings, where item parameters are estimated from a calibration sample before test administration. Using Monte Carlo simulation, this study evaluated the robustness of the common AMC hypothesis tests to the presence of item parameter estimation error when measuring omnibus change across four testing occasions. Results indicated that item parameter estimation error had at most a small effect on false positive rates and latent trait change recovery, and these effects were largely explained by the information functions of the computerized adaptive testing item bank. Differences in AMC performance as a function of item parameter estimation error and choice of hypothesis test were generally limited to simulees with particularly low or high latent trait values, where the item bank provided relatively less information. These simulations highlight how AMC can accurately measure intra-individual change in the presence of item parameter estimation error when paired with an informative item bank. Limitations and future directions for AMC research are discussed.
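One common form of the AMC Z test divides the difference between two trait estimates by the square root of the sum of their squared standard errors. The sketch below assumes a single trait, only two occasions, and a two-sided test at the 5% level; the study itself evaluated omnibus change across four occasions, which this does not cover.

```python
import math

def amc_z_test(theta1, se1, theta2, se2, critical=1.96):
    """Z test for change in one latent trait between two testing occasions.

    Returns the Z statistic and whether |Z| exceeds the (assumed) critical value.
    """
    z = (theta2 - theta1) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, abs(z) > critical
```

For example, estimates of 0.0 (SE 0.3) and 1.0 (SE 0.4) give Z = 1.0 / 0.5 = 2.0, which would be flagged as significant change at the assumed level.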

3.
Arch Phys Med Rehabil ; 103(5S): S3-S14, 2022 05.
Article in English | MEDLINE | ID: mdl-35090886

ABSTRACT

OBJECTIVE: To develop and evaluate an efficient and precise variable-length functional assessment of applied cognition, daily activity, and mobility to inform mobility preservation and rehabilitation service delivery among hospitalized patients. DESIGN: A multidimensional item bank tapping into these dimensions was developed, with all items calibrated using a multidimensional graded response model. The items were adaptively selected from the item banks to maximize the test information, and the test ended when a joint stopping rule was satisfied. A simulation study was conducted based on the completed instrument, the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT), to compare its measurement precision and efficiency with those of conventional unidimensional computerized adaptive testing. Precision was measured by the bias and root mean squared error between the estimated and true (ie, simulated) θ values, whereas efficiency was measured by average test length. Data were collected by an interviewer reading questions from a tablet computer and entering patients' responses. SETTING: A large Midwestern hospital. PARTICIPANTS: A total of 4143 patients hospitalized with medical diagnosis and/or surgical complications, with 2060 in the calibration sample and 2083 in the validation cohort. INTERVENTION: Not applicable. RESULTS: Among the 2083 patients in the validation sample, FAMCAT administration required an average of 6 (SD=3.11) minutes. Ninety-six percent had their tests terminated by the standard error rule after responding to an average of 22.05 (SD=7.98) items, whereas 15 were terminated by the change in θ rule, with an average test length of 45.27 (SD=11.49) items. The remaining 76 responded until reaching the maximum test length of 60 items.
CONCLUSIONS: The FAMCAT has the potential to satisfy the need for structured, frequent, and precise assessment of functional domains among hospitalized patients with medical diagnosis and/or surgical complications. The results are promising and may be informative for others who wish to develop similar instruments when concurrent assessment of correlated domains is required.
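The joint stopping rule described above combines a standard error rule, a change in θ rule, and a 60-item maximum. A minimal sketch of how such rules might be checked after each item follows; the thresholds `se_target` and `min_change` and the order of the checks are hypothetical, and only the 60-item maximum comes from the abstract.

```python
def stopping_reason(ses, theta_prev, theta_curr, n_items,
                    se_target=0.3, min_change=0.01, max_items=60):
    """Return the rule that ends the test, or None to keep administering items.

    ses: current standard error on each dimension.
    theta_prev, theta_curr: trait-estimate vectors before and after the last item
    (theta_prev is None before any update).
    se_target and min_change are hypothetical thresholds.
    """
    # Standard error rule: every dimension is estimated precisely enough.
    if all(se <= se_target for se in ses):
        return "standard error"
    # Change in theta rule: the last item barely moved any estimate.
    if theta_prev is not None:
        largest_shift = max(abs(c - p) for p, c in zip(theta_prev, theta_curr))
        if largest_shift < min_change:
            return "change in theta"
    # Maximum test length.
    if n_items >= max_items:
        return "maximum length"
    return None
```

Checking the precision rule first mirrors the abstract's finding that most tests ended by the standard error rule, but a different ordering would be an equally reasonable design.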


Subject(s)
Activities of Daily Living , Cognition , Bias , Computer Simulation , Humans , Psychometrics/methods , Surveys and Questionnaires
4.
Arch Phys Med Rehabil ; 103(5S): S34-S42.e4, 2022 05.
Article in English | MEDLINE | ID: mdl-34678294

ABSTRACT

OBJECTIVE: To (1) characterize the agreement between patient and proxy responses on a multidimensional computerized adaptive testing measure of function, and (2) determine whether patient, proxy, or multidimensional computerized adaptive testing score characteristics identify when a proxy report can be used as a substitute for patient report in clinical decision making. DESIGN: A psychometric study of the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Testing (FAMCAT) and its 3 scales (Applied Cognition, Daily Activity, and Basic Mobility). SETTING: An upper midwestern quaternary academic medical center. PARTICIPANTS: A total of 300 pairs of patients (average age 60.9 years; range, 19-89) hospitalized on general medical services or readmitted to surgical services for postoperative complications and their proxies (average age 60.5 years; range, 20-88). INTERVENTION: Not applicable. MAIN OUTCOME MEASURES: There were 3 outcomes: (1) agreement between patient and proxy scores on the FAMCAT domains, as well as age and sex, analyzed with univariate and multivariate analysis of variance (MANOVA); (2) associations of patient-proxy relationship and FAMCAT score characteristics with patient-proxy score agreement; and (3) presence of psychometrically significant intra-dyad differences in FAMCAT scores. RESULTS: The results of the MANOVA and follow-up ANOVAs indicated no statistically significant differences in FAMCAT scale scores between patient and proxy estimates for either the Daily Activity or Basic Mobility scales. There were significant differences for the Applied Cognition scale (P<.005) between mean patient and proxy scores, with proxies rating patients as functioning at a higher level (mean=0.42) than patients did themselves (mean=0.00). However, psychometrically significant intra-dyadic Applied Cognition score differences occurred in only 14% of dyads, compared with 25% on the other 2 scales.
Sex and age were associated with patient-proxy agreement, but the patterns were not sufficiently consistent to permit generalizations regarding the likely validity of a proxy's scores. CONCLUSIONS: Patient and proxy FAMCAT Daily Activity and Basic Mobility scores did not differ significantly, and proxy reporting offers a credible surrogate for patient report on these domains. Low rates of psychometrically significant intra-dyadic score differences suggest that proxy report may serve as a low-resolution screen for functional deficits in all FAMCAT domains. Approximately half the proxies provided multi-domain profile ratings on the 3 scales that did not differ significantly from those of the associated patients, but more research is needed to identify situations in which proxy profiles could be used in place of those provided by patients.


Subject(s)
Proxy , Quality of Life , Activities of Daily Living , Humans , Middle Aged , Patients , Psychometrics
5.
Arch Phys Med Rehabil ; 103(5S): S59-S66.e3, 2022 05.
Article in English | MEDLINE | ID: mdl-34606758

ABSTRACT

OBJECTIVE: To determine whether a multidimensional computerized adaptive test, the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT), could be administered to hospitalized patients via a tablet computer rather than orally by an interviewer. DESIGN: A randomized comparison of the responses of hospitalized patients to interviewer vs tablet delivery of the FAMCAT and its assessment of applied cognition, daily activity, and basic mobility. SETTING: Two quaternary teaching hospitals in the Upper Midwest. PARTICIPANTS: A total of 300 patients (127 men, 165 women), average age 61.2 (range, 18-97), hospitalized on medical services or rehospitalized on surgical services were randomly assigned to either a tablet (150) or an interview (150) group. INTERVENTION: Electronic tablet vs interview. MAIN OUTCOME MEASURES: Item response theory point estimates of the FAMCAT latent scales, their psychometric standard errors, number of items administered per domain, the determinant (an indicator of overall precision of the latent trait vector), and the time that patients required to complete their FAMCAT sessions. RESULTS: Of the 300 patients, 292 completed their assessments. The assessments of 4 individuals in each group were interrupted by clinical care and were not included in the analyses. A significant (P=.009) mode effect (ie, interview vs tablet) was identified when all outcome variables were considered simultaneously. However, the only outcome affected by the administration mode was test duration: tablet administration reduced the roughly 6-minute test time required by both approaches by only 20 seconds, which, though statistically significant, was clinically insignificant. CONCLUSIONS: The results of a FAMCAT assessment, at least for this cohort of hospitalized patients, are independent of administration via tablet computer or interview.


Subject(s)
Activities of Daily Living , Computers, Handheld , Cohort Studies , Female , Humans , Male , Middle Aged , Patient Reported Outcome Measures , Psychometrics
6.
Arch Phys Med Rehabil ; 103(5S): S43-S52, 2022 05.
Article in English | MEDLINE | ID: mdl-34606759

ABSTRACT

OBJECTIVE: To describe the adaptive measurement of change (AMC) as a means to identify psychometrically significant change in reported function of hospitalized patients and to reduce respondent burden on follow-up assessments. DESIGN: The AMC method uses multivariate computerized adaptive testing (CAT) and psychometric hypothesis tests based in item response theory to more efficiently measure intra-individual change using the responses of a single patient over 2 or more testing occasions. Illustrations of the utility of AMC in clinical care and estimates of AMC-based item reduction are provided using the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT), a newly developed functional multidimensional CAT-based measurement of basic mobility, daily activities, and applied cognition. SETTING: Two quaternary hospitals in the Upper Midwest. PARTICIPANTS: Four hundred ninety-five hospitalized patients who completed the FAMCAT on 2 to 4 occasions during their hospital stay. INTERVENTION: N/A. RESULTS: Of the 495 patients who completed more than 1 FAMCAT, 72% completed 2 sessions, 13% completed 3, and 15% completed 4, with 22.1%, 23.4%, and 23.0%, respectively, exhibiting significant multivariate change. Using the AMC in conjunction with the FAMCAT reduced respondent burden on follow-up assessments relative to the FAMCAT alone. On average, when used without the AMC, 22.7 items (range, 20.4-24.4) were administered during FAMCAT sessions. Post hoc analyses determined that using the AMC with the FAMCAT would yield mean (SD) reductions in the number of FAMCAT items of 13.6 (11.1), 13.1 (9.8), and 18.1 (10.8) during the second, third, and fourth sessions, respectively, corresponding to reductions in test duration of 3.0 (2.4), 3.0 (2.8), and 4.7 (2.6) minutes. Analysis showed that the AMC requires no assumptions about the nature of change and provides data that are potentially actionable for patient care.
Various patterns of significant univariate and multivariate change are illustrated. CONCLUSIONS: The AMC method is an effective and parsimonious approach to identifying significant change in patients' measured CAT scores. The AMC approach reduced FAMCAT sessions by an average of 12.6 items (55%) and 2.9 minutes (53%) among patients with psychometrically significant score changes.


Subject(s)
Health Services , Patient Reported Outcome Measures , Humans , Psychometrics , Research Design , Surveys and Questionnaires
7.
Arch Phys Med Rehabil ; 103(5S): S53-S58, 2022 05.
Article in English | MEDLINE | ID: mdl-34670134

ABSTRACT

OBJECTIVE: To characterize the ability of the patient-reported Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT) domains to predict discharge disposition when administered during acute care stays. DESIGN: Cohort study. Logistic regression models were estimated to identify the ability of FAMCAT domains to predict discharge to an institution for postacute care (PAC). SETTING: Academic medical center. PARTICIPANTS: Patients admitted to general medicine services from June 2016 to June 2019 (n = 4240). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURE(S): Discharge to an institution. RESULTS: In this sample, 10.5% of patients were discharged to an institution for rehabilitation rather than home. FAMCAT domain scores were highly predictive of discharge to institutional PAC. Daily Activity and Basic Mobility domains had excellent discriminative ability for discharge to an institution (c-statistic, 0.83 and 0.87, respectively). In best-fit models accounting for additional characteristics, discrimination was outstanding for Daily Activity (c-statistic, 0.91; 95% confidence interval, 0.89-0.94) and Basic Mobility (c-statistic, 0.92; 95% confidence interval, 0.89-0.94). CONCLUSIONS: The FAMCAT Daily Activity and Basic Mobility domains demonstrated excellent discrimination for identifying patients who were discharged to an institutional setting for rehabilitation and outstanding discrimination when adjusted for salient patient factors associated with discharge disposition. Estimates obtained in this investigation are comparable to the best discrimination achieved with clinician-rated measures to identify patients who would require institutional PAC.
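The c-statistic reported above is the concordance probability, equivalent to the area under the ROC curve: the chance that a randomly chosen patient who had the outcome received a higher predicted score than one who did not. A minimal pure-Python sketch follows, assuming higher scores mean higher predicted risk. Because lower FAMCAT function would presumably predict institutional discharge, raw domain scores would need their sign flipped, or a logistic model's predicted probabilities used instead.

```python
def c_statistic(scores, outcomes):
    """Concordance (c) statistic for a binary outcome.

    scores: predicted risk (higher = more likely to have the outcome).
    outcomes: 1 if the outcome occurred (e.g., institutional discharge), else 0.
    Tied score pairs count as one half.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is fine for illustration; for a cohort of several thousand patients a rank-based computation would be faster.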


Subject(s)
Patient Discharge , Subacute Care , Activities of Daily Living , Cohort Studies , Humans , Outcome Assessment, Health Care/methods , Retrospective Studies
8.
Arch Phys Med Rehabil ; 103(5S): S84-S107.e38, 2022 05.
Article in English | MEDLINE | ID: mdl-34146534

ABSTRACT

OBJECTIVE: To assess differential item functioning (DIF) in an item pool measuring the mobility of hospitalized patients across educational, age, and sex groups. DESIGN: Measurement evaluation cohort study. Content experts generated DIF hypotheses to guide the interpretation. The graded response item response theory (IRT) model was used. Primary DIF tests were Wald statistics; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude and impact were evaluated by examining group differences in expected item and scale score functions. SETTING: Hospital-based rehabilitation. PARTICIPANTS: Hospitalized patients (N=2216). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: A total of 111 self-reported mobility items. RESULTS: Two linking items among those used to set the metric across forms evidenced DIF for sex and age: "difficulty climbing stairs step-over-step without a handrail (alternating feet)" and "difficulty climbing 3-5 steps without a handrail." Conditional on the mobility state, the items were more difficult for women and older people (aged ≥65y). An additional 18 items were identified with DIF. Items with both high DIF magnitude and hypotheses related to age were difficulty "crossing road at a 4-lane traffic light with curbs," "jumping/landing on one leg," "strenuous activities," and "descending 3-5 steps with no handrail." Although DIF of higher magnitude was observed for several items, the scale-level effect was relatively small and the exposure rate for the most problematic items was low (0.35, 0.27, and 0.20). CONCLUSIONS: This was the first study to evaluate measurement equivalence of the hospital-based rehabilitation mobility item bank. Although 20 items evidenced high-magnitude DIF, 5 of which were related to stairs, the scale-level effect was minimal; however, it is recommended that such items be avoided in the development of short-form measures.
No items with salient DIF were removed from calibrations, supporting the use of the item bank across groups differing in education, age, and sex. The bank may thus be useful to assist clinical assessment and decision-making regarding risk for specific mobility restrictions at discharge as well as identifying mobility-related functions targeted for postdischarge interventions. Additionally, with the goal of avoiding long and burdensome assessments for patients and clinical staff, these results could be informative for those using the item bank to construct short forms.
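The magnitude evaluation described above compares group-specific expected item score functions. A minimal sketch, assuming a unidimensional graded response model with hypothetical reference-group and focal-group parameters, and summarizing the gap as the largest absolute difference over a θ grid (one possible summary, not necessarily the index used in the study):

```python
import math

def grm_expected_score(theta, a, thresholds):
    """Expected item score (categories 0..K-1) under Samejima's graded response model.

    Uses E[X] = sum over k of P(X >= k), with P(X >= k) = logistic(a * (theta - b_k)).
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)

def expected_score_gap(params_ref, params_focal, grid):
    """Largest absolute reference-minus-focal difference in expected item score.

    params_ref, params_focal: (a, [b_1, ..., b_{K-1}]) tuples (hypothetical values).
    grid: theta values at which the two expected score curves are compared.
    """
    return max(abs(grm_expected_score(t, *params_ref) - grm_expected_score(t, *params_focal))
               for t in grid)
```

For instance, shifting every threshold of a three-threshold item up by 0.5 for the focal group makes the item harder for that group at the same θ, and the gap near θ = 0 is roughly a third of a score point.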


Subject(s)
Aftercare , Patient Discharge , Aged , Cohort Studies , Female , Humans , Physical Therapy Modalities , Psychometrics/methods , Self Report , Surveys and Questionnaires
9.
Stud Hist Philos Sci ; 90: 10-14, 2021 12.
Article in English | MEDLINE | ID: mdl-34508955

ABSTRACT

We have each spent more than 50 years doing research that has had little impact. Even more lamentable is that our field, judgment and decision making (JDM), has on the whole had little impact during that span. We attribute that failure to the use of methodologies that emphasize testing models rather than looking for differences in behavior. The "cognitive revolution" led the field astray, toward the goal of studying model fit rather than comparing observable results. With modeling as the goal, experimentation was stultified. Simple tasks became dominant. Although a poor metaphor for real decision making, the gambling paradigm has lasted forever because the inputs to the decision are known to the researcher and thus easily modeled.


Subject(s)
Decision Making , Gambling , Gambling/psychology , Humans , Judgment , Medical Futility , Motivation
10.
Psychometrika ; 86(3): 674-711, 2021 09.
Article in English | MEDLINE | ID: mdl-34251615

ABSTRACT

Several methods used to examine differential item functioning (DIF) in Patient-Reported Outcomes Measurement Information System (PROMIS®) measures are presented, including effect size estimation. A summary of factors that may affect DIF detection and challenges encountered in PROMIS DIF analyses, e.g., anchor item selection, is provided. An issue in PROMIS was the potential for inadequately modeled multidimensionality to result in false DIF detection. Section 1 is a presentation of the unidimensional models used by most PROMIS investigators for DIF detection, as well as their multidimensional expansions. Section 2 is an illustration that builds on previous unidimensional analyses of depression and anxiety short-forms to examine DIF detection using a multidimensional item response theory (MIRT) model. The Item Response Theory-Log-likelihood Ratio Test (IRT-LRT) method was used for a real data illustration with gender as the grouping variable. The IRT-LRT DIF detection method is a flexible approach to handle group differences in trait distributions, known as impact in the DIF literature, and was studied with both real data and simulations to compare the performance of the IRT-LRT method within the unidimensional IRT (UIRT) and MIRT contexts. Additionally, different effect size measures were compared for the data presented in Section 2. A finding from the real data illustration was that using the IRT-LRT method within a MIRT context resulted in more flagged items than using the IRT-LRT method within a UIRT context. The simulations provided some evidence that while unidimensional and multidimensional approaches were similar in terms of Type I error rates, power for DIF detection was greater for the multidimensional approach. Effect size measures presented in Section 1 and applied in Section 2 varied in terms of estimation methods, choice of density function, methods of equating, and anchor item selection.
Despite these differences, there was considerable consistency in results, especially for the items showing the largest values. Future work is needed to examine DIF detection in the context of polytomous, multidimensional data. PROMIS standards included incorporation of effect size measures in determining salient DIF. Integrated methods for examining effect size measures in the context of IRT-based DIF detection procedures are still in early stages of development.


Subject(s)
Anxiety , Patient Reported Outcome Measures , Humans , Information Systems , Psychometrics
11.
Arch Rehabil Res Clin Transl ; 3(2): 100112, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34179750

ABSTRACT

OBJECTIVE: To (1) develop a patient-reported, multidomain functional assessment tool focused on medically ill patients in acute care settings; (2) characterize the measure's psychometric performance; and (3) establish clinically actionable score strata that link to easily implemented mobility preservation plans. DESIGN: This article describes the approach that our team pursued to develop and characterize this tool, the Functional Assessment in Acute Care Multidimensional Computer Adaptive Test (FAMCAT). Development involved a multistep process that included (1) expanding and refining existing item banks to optimize their salience for hospitalized patients; (2) administering candidate items to a calibration cohort; (3) estimating multidimensional item response theory models; (4) calibrating the item banks; (5) evaluating potential multidimensional computerized adaptive testing (MCAT) enhancements; (6) parameterizing the MCAT; (7) administering it to patients in a validation cohort; and (8) estimating its predictive and psychometric characteristics. SETTING: A large (2000-bed) Midwestern Medical Center. PARTICIPANTS: The overall sample included 4495 adults (2341 in a calibration cohort, 2154 in a validation cohort) who were admitted either to medical services with at least 1 chronic condition or to surgical/medical services if they required readmission after a hospitalization for surgery (N=4495). INTERVENTION: Not applicable. MAIN OUTCOME MEASURES: Not applicable. RESULTS: The FAMCAT is an instrument designed to permit the efficient, precise, low-burden, multidomain functional assessment of hospitalized patients. 
We tried to optimize the FAMCAT's efficiency and precision, as well as its ability to perform multiple assessments during a hospital stay, by applying cutting-edge methods such as the adaptive measurement of change (AMC), differential item functioning computerized adaptive testing, and integration of collateral test-taking information, particularly item response times. Evaluation of these candidate methods suggested that all may enhance MCAT performance, but none were integrated into the initial MCAT parameterization. CONCLUSIONS: The FAMCAT has the potential to address a longstanding need for structured, frequent, and accurate functional assessment among patients hospitalized with medical diagnoses and complications of surgery.

12.
Public Health Genomics ; 24(5-6): 291-303, 2021.
Article in English | MEDLINE | ID: mdl-34058740

ABSTRACT

BACKGROUND: Genomic testing is increasingly employed in clinical, research, educational, and commercial contexts. Genomic literacy is a prerequisite for the effective application of genomic testing, creating a corresponding need for validated tools to assess genomics knowledge. We sought to develop a reliable measure of genomics knowledge that incorporates modern genomic technologies and is informative for individuals with diverse backgrounds, including those with clinical/life sciences training. METHODS: We developed the GKnowM Genomics Knowledge Scale to assess the knowledge needed to make an informed decision about genomic testing, appropriately apply genomic technologies, and participate in civic decision-making. We administered the 30-item draft measure to a calibration cohort (n = 1,234) and subsequent participants to create a combined validation cohort (n = 2,405). We performed a multistage psychometric calibration and validation using classical test theory and item response theory (IRT) and conducted a post-hoc simulation study to evaluate the suitability of a computerized adaptive testing (CAT) implementation. RESULTS: Based on exploratory factor analysis, we removed 4 of the 30 draft items. The resulting 26-item GKnowM measure has a single dominant factor. The scale's internal consistency is α = 0.85, and the IRT 3PL model demonstrated good overall and item fit. Validity is demonstrated by a significant correlation (r = 0.61) with an existing genomics knowledge measure and by significantly higher scores for individuals with adequate health literacy and for healthcare providers (HCPs), including HCPs who work with genomic testing. The item bank is well suited to CAT, achieving high accuracy (r = 0.97 with the full measure) while administering a mean of 13.5 items.
CONCLUSION: GKnowM is an updated, broadly relevant, rigorously validated 26-item measure for assessing genomics knowledge that we anticipate will be useful for assessing population genomic literacy and evaluating the effectiveness of genomics educational interventions.
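The internal consistency reported above (α = 0.85) is Cronbach's coefficient alpha, α = k/(k − 1) × (1 − Σ item variances / total-score variance) for k items. A minimal sketch using only the standard library; population variances are used for both terms, so the n versus n − 1 choice cancels in the ratio. The inputs in any example are illustrative, not study data.

```python
from statistics import pvariance

def cronbach_alpha(data):
    """Cronbach's alpha for a respondents-by-items score matrix (list of rows)."""
    k = len(data[0])
    # Variance of each item's scores across respondents.
    item_vars = [pvariance([row[j] for row in data]) for j in range(k)]
    # Variance of respondents' total scores.
    total_var = pvariance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

For two perfectly parallel items such as `[[0, 0], [1, 1], [2, 2]]`, the function returns 1.0, the upper bound.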


Subject(s)
Health Literacy , Factor Analysis, Statistical , Genomics , Humans , Psychometrics/methods , Reproducibility of Results , Surveys and Questionnaires
13.
Educ Psychol Meas ; 81(3): 491-522, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33994561

ABSTRACT

S-χ² is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of S-χ² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of S-χ² under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using S-χ² within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. S-χ² performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of S-χ² were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to increasing TPRs. There was also a suggestion that the performance of S-χ² was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them.
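For readers unfamiliar with the MGRM, the sketch below computes one item's category probabilities under one common compensatory, slope-intercept parameterization: P(X ≥ k | θ) = 1 / (1 + exp(−(a·θ + d_k))), with category probabilities obtained as differences of adjacent cumulative probabilities. The parameter values in any example are hypothetical, and the S-χ² statistic itself is not implemented here.

```python
import math

def mgrm_category_probs(theta, a, d):
    """Category probabilities for one item under a compensatory multidimensional GRM.

    theta: latent trait vector; a: slope vector of the same length;
    d: intercepts for P(X >= k), k = 1..K-1, in decreasing order.
    Returns [P(X = 0), ..., P(X = K-1)].
    """
    z = sum(ai * ti for ai, ti in zip(a, theta))
    # Cumulative probabilities, with the conventions P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-(z + dk))) for dk in d] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(d) + 1)]
```

A four-option Likert item has three intercepts, so `mgrm_category_probs([0.5, -0.2], [1.2, 0.8], [1.0, 0.0, -1.0])` returns four probabilities that sum to 1.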

14.
Multivariate Behav Res ; 56(5): 703-723, 2021.
Article in English | MEDLINE | ID: mdl-32598188

ABSTRACT

Normality of latent traits is a common assumption made when estimating parameters for item response theory (IRT) models, but this assumption may be violated. The purpose of this research was to present a new Markov chain Monte Carlo (MCMC) method for ordinal items with flexible latent trait distributions (i.e., skewed and bimodal). Specifically, the Davidian curve (DC) was used to approximate the distribution of latent traits. The performance of the proposed MCMC algorithm with DCs was evaluated via a simulation study and compared with an EM method using DCs that is available in the "mirt" package (Chalmers, 2012). The manipulated factors included the number of response categories, sample size, and the shape of the latent trait distribution. The Hannan-Quinn (HQ) criterion was used to choose the best DC order. Results indicated that when informative priors were used, the MCMC algorithm with DCs could fit a flexible distribution well and the method provided good parameter estimates which, under some circumstances, had lower bias and RMSE than the EM method.
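A Davidian curve is commonly constructed by multiplying a squared polynomial by the standard normal density, h(z) = P(z)² φ(z), with the polynomial coefficients rescaled so that h integrates to 1. The sketch below evaluates such a density for given coefficients and adds the usual Hannan-Quinn formula, HQ = −2 log L + 2p ln(ln n). It illustrates the density family only; the article's MCMC sampler and the mirt EM implementation are not reproduced, and the coefficient values in any example are hypothetical.

```python
import math

def normal_moment(n):
    """E[Z^n] for Z ~ N(0, 1): zero for odd n, (n - 1)!! for even n."""
    if n % 2:
        return 0.0
    result = 1.0
    for j in range(n - 1, 0, -2):
        result *= j
    return result

def davidian_density(z, coefs):
    """Davidian-curve density h(z) = P(z)^2 * phi(z), with P(z) = sum_k m_k z^k.

    coefs = [m_0, ..., m_K]; they are rescaled so that E[P(Z)^2] = 1 for a
    standard normal Z, which makes h integrate to 1.
    """
    norm = sum(mi * mj * normal_moment(i + j)
               for i, mi in enumerate(coefs) for j, mj in enumerate(coefs))
    poly = sum(m * z ** k for k, m in enumerate(coefs)) / math.sqrt(norm)
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return poly * poly * phi

def hannan_quinn(loglik, n_params, n_obs):
    """Hannan-Quinn criterion; smaller values indicate a preferred model order."""
    return -2.0 * loglik + 2.0 * n_params * math.log(math.log(n_obs))
```

With `coefs = [1.0]` the curve reduces to the standard normal density; higher-order coefficients let it become skewed or bimodal, and HQ would then be compared across candidate orders.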


Subject(s)
Algorithms , Bayes Theorem , Computer Simulation , Markov Chains , Monte Carlo Method
15.
Multivariate Behav Res ; 56(3): 459-475, 2021.
Article in English | MEDLINE | ID: mdl-32124648

ABSTRACT

In psychological and educational measurement, it is often of interest to assess change in an individual. The current study expanded on previous research by introducing methods that can evaluate individual change on multiple latent traits measured on multiple occasions. The four methods considered are the likelihood ratio test (LRT), the multivariate Wald test (MWT), the modified multivariate Wald test (MMWT), and the score test (ST). Simulation studies were conducted to examine the true positive rate (TPR) and the false positive rate (FPR) of the new methods under a conventional fixed-form test and a computerized adaptive test (CAT). Manipulated variables included the number of occasions, change magnitudes, patterns of change, and correlations between latent traits. Results revealed that, in terms of FPR, all methods except MWT adhered closely to the nominal significance level. Among these three methods, the LRT is recommended because it balanced FPR and TPR. Larger change magnitudes yielded higher TPR, regardless of the remaining factors. With the same test length, a CAT yielded higher TPR than a conventional test. Real-data examples of identifying psychometrically significant change across two to four occasions are provided, using a multivariate adaptive self-report medical outcomes measure from hospitalized patients. The three methods agreed closely in detecting significant change, and the patients identified as having significant change exhibited large profile differences, which supports the valid performance of the proposed methods.
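As a rough illustration of a multivariate Wald test for change, the sketch below handles only two occasions: with d = θ₂ − θ₁ and pooled covariance S = Σ₁ + Σ₂, it computes W = dᵀS⁻¹d and compares W with a chi-square critical value whose degrees of freedom equal the number of traits. It assumes independent estimation errors across occasions and available covariance matrices of the trait estimates (for example, inverse test information). The study's multi-occasion MWT and its modified version may be formulated differently; the solver and constants here are illustrative helpers.

```python
def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

# Chi-square critical values at the 5% level for 1 to 3 degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}

def multivariate_wald(theta1, cov1, theta2, cov2):
    """Two-occasion Wald statistic W = d' (cov1 + cov2)^-1 d and its 5%-level decision."""
    d = [b - a for a, b in zip(theta1, theta2)]
    pooled = [[c1 + c2 for c1, c2 in zip(r1, r2)] for r1, r2 in zip(cov1, cov2)]
    w = sum(di * xi for di, xi in zip(d, solve(pooled, d)))
    return w, w > CHI2_CRIT_05[len(d)]
```

With diagonal covariances of 0.09 and 0.16 on two traits and a change of (1.5, 1.0), W = 9 + 4 = 13, which exceeds the 2-df critical value of 5.991.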


Subject(s)
Educational Measurement , Research Design , Computer Simulation , Humans
16.
J Eval Clin Pract ; 26(5): 1347-1351, 2020 10.
Article in English | MEDLINE | ID: mdl-32794332

ABSTRACT

RATIONALE, AIMS AND OBJECTIVES: In the United States, the reluctance of the federal government to impose a national stay-at-home policy in the wake of the COVID-19 pandemic has left the decision of how to achieve social distancing to individual state governors. We hypothesized that, in the absence of formal guidelines, the decision to close a state reflects the classic Weber-Fechner law of psychophysics: the amount by which a stimulus (such as the number of cases or deaths) must increase in order to be noticed is a constant fraction of the intensity of that stimulus. METHODS: On 12 April 2020, we downloaded data from the New York Times database for all 50 states and the District of Columbia; by that time all but 7 states had issued stay-at-home orders. We fitted the Weber-Fechner logarithmic function by regressing the log2 of cases and deaths, respectively, against the daily counts. We also conducted Cox regression analysis to determine whether the probability of issuing the stay-at-home order increases proportionally as the number of cases or deaths increases. RESULTS: We found that the decision to issue the stay-at-home order reflects the Weber-Fechner law. Both the number of infections (P < .0001; R2 = .79) and deaths (P < .0001; R2 = .63) were significantly associated with the decision to issue stay-at-home orders. The results indicate that for each doubling of infections or deaths, an additional four to six states will issue stay-at-home orders. Cox regression showed that when the number of deaths reached 256 and the number of infected people exceeded 16 000, the probability of issuing a "stay-at-home" order was close to 100%. We found no difference in decision-making according to political affiliation; the results remained unchanged on 16 July 2020. CONCLUSIONS: When there are no clearly articulated rules to follow, decision-makers resort to simple heuristics, in this case one consistent with the Weber-Fechner law.
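The "four to six states per doubling" result corresponds to the slope of a linear fit on a log2 scale. A minimal sketch, assuming the model is y = a + b·log2(count), where y is the cumulative number of states with orders; this is an interpretation of the abstract, not the study's exact specification. On a log2 scale, b is the change in y per doubling of the count. Any numbers used with it here are synthetic, not the study's data.

```python
import math

def fit_log2_linear(counts, states):
    """Ordinary least squares fit of states = a + b * log2(counts).

    Returns (a, b, r_squared); b is the change in states per doubling of counts.
    """
    xs = [math.log2(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(states) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, states))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, states))
    ss_tot = sum((y - my) ** 2 for y in states)
    return a, b, 1 - ss_res / ss_tot
```

With synthetic counts 1, 2, 4, 8, 16 and 0, 5, 10, 15, 20 states, the fit returns a slope of exactly 5 states per doubling and R² = 1.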


Subject(s)
Coronavirus Infections/epidemiology , Pneumonia, Viral/epidemiology , State Government , Betacoronavirus , COVID-19 , Coronavirus Infections/mortality , Decision Making , Humans , Models, Statistical , Pandemics , Pneumonia, Viral/mortality , Quinazolines , SARS-CoV-2 , United States/epidemiology
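The logarithmic fit described in METHODS can be sketched as an ordinary least-squares regression of the number of states that have issued orders on log2 of the cumulative count, so that the slope reads as "additional states per doubling." This is a minimal sketch under assumptions: it adopts that reading of the model, uses made-up illustrative numbers rather than the New York Times data, and does not reproduce the Cox regression. The function name `fit_log2_linear` is hypothetical.

```python
import math


def fit_log2_linear(counts, states):
    """OLS fit of states = intercept + slope * log2(counts).

    Hypothetical helper: counts are cumulative cases (or deaths) on each day,
    states is the number of states with an order in place on that day.
    Returns (intercept, slope, r_squared).
    """
    xs = [math.log2(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(states) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, states))
    slope = sxy / sxx  # additional states per doubling of the count
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in states)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, states))
    return intercept, slope, 1.0 - ss_res / ss_tot
```

For example, `fit_log2_linear([100, 200, 400, 800, 1600], [3, 8, 13, 18, 23])` gives a slope of 5 (up to floating-point error), which under this reading means five more states per doubling, the same kind of quantity as the abstract's "four to six states."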
17.
Front Psychol ; 10: 51, 2019.
Article in English | MEDLINE | ID: mdl-30761036

ABSTRACT

This study explored calibrating a large item bank for use in multidimensional health measurement with computerized adaptive testing, using both item responses and response time (RT) information. The Activity Measure for Post-Acute Care is a patient-reported outcomes measure composed of three correlated scales (Applied Cognition, Daily Activities, and Mobility). All items on each scale are Likert-type, so that a respondent chooses a response from an ordered set of four response options. The most appropriate item response theory model for analyzing and scoring these items is the multidimensional graded response model (MGRM). During field testing of the items, an interviewer read each item to a patient and recorded the patient's responses on a tablet computer, while the software recorded RTs. Because the item bank contained over 300 items, data collection was conducted in four batches with a common set of anchor items to link the scales. van der Linden's (2007) hierarchical modeling framework was adopted. Several models, with or without interviewer as a covariate and with or without an interaction between interviewer and items, were compared for each batch of data. The model with the interaction between interviewer and item, with the interaction effect constrained to be proportional, fit the data best. Therefore, the final hierarchical model, with a lognormal model for RT and the MGRM for response data, was fitted to all batches of data via concurrent calibration. Evaluation of parameter estimates revealed that (1) adding response time information did not significantly affect the item parameter estimates or their standard errors; and (2) adding response time information helped reduce the standard errors of patients' multidimensional latent trait estimates, but adding interviewer as a covariate did not result in further improvement. Implications of the findings for follow-up adaptive test delivery design are discussed.
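The two measurement components named in this abstract, the MGRM for four-option Likert responses and a lognormal model for RT, can be sketched as item-level likelihoods combined under conditional independence. This is a minimal sketch under assumptions: it uses the slope-intercept MGRM form P(X >= k) = logistic(a·θ + d_k) with decreasing intercepts, and the lognormal RT form log t ~ Normal(β − τ, 1/α²) with time intensity β, time discrimination α, and person speed τ. The second-level joint distribution of person and item parameters and the interviewer covariate are not shown, and all function names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def mgrm_category_probs(theta, a, d):
    """Category probabilities under a multidimensional graded response model.

    theta: latent trait vector; a: discrimination vector of the same length;
    d: strictly decreasing intercepts, one per boundary (K - 1 for K categories).
    Returns the K probabilities of categories 0..K-1.
    """
    z = sum(ai * ti for ai, ti in zip(a, theta))
    # Cumulative probabilities P(X >= k), padded with P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = [1.0] + [logistic(z + dk) for dk in d] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(d) + 1)]


def lognormal_rt_density(t, tau, alpha, beta):
    """Density of response time t when log t ~ Normal(beta - tau, 1 / alpha**2)."""
    z = alpha * (math.log(t) - (beta - tau))
    return alpha / (t * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)


def joint_item_likelihood(x, t, theta, tau, a, d, alpha, beta):
    """Likelihood of category x and time t for one item, assuming the response
    and the RT are conditionally independent given theta and tau."""
    return mgrm_category_probs(theta, a, d)[x] * lognormal_rt_density(t, tau, alpha, beta)
```

With four response options there are three intercepts, for example `mgrm_category_probs([0.5, -0.2, 1.0], [1.2, 0.8, 0.5], [1.5, 0.0, -1.5])` returns four positive probabilities that sum to 1. Summing the log of `joint_item_likelihood` over items is one way such RT information can enter trait estimation.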

18.
Psychometrika ; 84(3): 749-771, 2019 09.
Article in English | MEDLINE | ID: mdl-30511327

ABSTRACT

In computerized adaptive testing (CAT), a variable-length stopping rule ends item administration once a pre-specified measurement precision standard has been satisfied. The goal is to provide equal measurement precision for all examinees regardless of their true latent trait level. Several stopping rules have been proposed in unidimensional CAT, such as the minimum information rule and the maximum standard error rule. These rules have also been extended to multidimensional CAT and cognitive diagnostic CAT, and they all share the idea of monitoring measurement error. Recently, Babcock and Weiss (J Comput Adapt Test 2012. https://doi.org/10.7333/1212-0101001) proposed an "absolute change in theta" (CT) rule, which is useful when an item bank has been exhausted of good items for one or more ranges of the trait continuum. Choi, Grady and Dodd (Educ Psychol Meas 70:1-17, 2010) also argued that a CAT should stop when the standard error no longer changes, implying that the item bank is likely exhausted. Although these stopping rules have been evaluated and compared in different simulation studies, the relationships among them remain unclear, and there is therefore no clear guideline on when to use which rule. This paper presents analytic results showing the connections among the various stopping rules in both unidimensional and multidimensional CAT. In particular, it argues that the CT rule alone can be unstable and can end the test prematurely, but that it can be a useful secondary rule for monitoring the point of diminishing returns. To provide further empirical evidence, three simulation studies are reported using both the 2PL model and the multidimensional graded response model.


Subject(s)
Cognition/physiology , Computer Simulation/statistics & numerical data , Psychometrics/methods , Algorithms , Bias , Dimensional Measurement Accuracy , Humans
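The combination the abstract recommends, a precision-based rule as the primary criterion and the CT rule only as a secondary check, can be sketched for a unidimensional 2PL CAT. This is a minimal sketch under assumptions: the standard error is taken as 1/sqrt(test information) at the current estimate, the thresholds (`se_max=0.3`, `ct_min=0.01`, `min_items=10`, `max_items=40`) are illustrative values not taken from the paper, item selection and theta estimation are omitted, and `stop_reason` is a hypothetical name.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def standard_error(theta, items):
    """SE of theta as 1 / sqrt(test information) over administered 2PL items (a, b)."""
    info = 0.0
    for a, b in items:
        p = p_2pl(theta, a, b)
        info += a * a * p * (1.0 - p)
    return math.inf if info == 0.0 else 1.0 / math.sqrt(info)


def stop_reason(theta_history, items, se_max=0.3, ct_min=0.01, min_items=10, max_items=40):
    """Return why the CAT should stop ("se", "max_length", "ct"), or None to continue.

    Primary rule: SE at the current estimate is at or below se_max.
    Secondary CT rule: the most recent absolute change in theta is below ct_min,
    checked only after min_items so that it cannot end the test too early on its own.
    """
    theta = theta_history[-1]
    if standard_error(theta, items) <= se_max:
        return "se"
    if len(items) >= max_items:
        return "max_length"
    if (len(items) >= min_items and len(theta_history) >= 2
            and abs(theta_history[-1] - theta_history[-2]) < ct_min):
        return "ct"
    return None
```

Guarding the CT check with a minimum test length reflects the abstract's point that a small change in theta early in a test is weak evidence that precision has been reached.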
19.
Appl Psychol Meas ; 42(3): 221-239, 2018 May.
Article in English | MEDLINE | ID: mdl-29881123

ABSTRACT

The measurement of individual change has been an important topic in both education and psychology. For instance, teachers are interested in whether students have significantly improved (e.g., learned) from instruction, and counselors are interested in whether particular behaviors have changed significantly after certain interventions. Although classical test methods have been unable to adequately resolve the problems in measuring change, recent approaches have begun to use item response theory (IRT). However, prior methods have focused mainly on testing whether growth is significant at the group level. The present research targets a key question: Is the "change" in latent trait estimates for each individual significant across occasions? Many researchers have addressed this question assuming that the latent trait is unidimensional. This research generalizes their earlier work and proposes four hypothesis testing methods for evaluating individual change on multiple latent traits: a multivariate Z-test, a multivariate likelihood ratio test, a multivariate score test, and a Kullback-Leibler test. Simulation results show that these tests hold promise for detecting individual change with low Type I error and high power. A real-data example from an educational assessment illustrates the application of the proposed methods.
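A multivariate Z-test of individual change can be sketched as a Wald-type quadratic form in the difference between one person's trait estimates on two occasions, referred to a chi-square distribution with as many degrees of freedom as there are traits. This is a minimal sketch under assumptions: the covariance of the difference is taken as the sum of the two occasions' estimated error covariance matrices (occasions treated as independent), which may differ from the paper's exact formulation. The likelihood ratio, score, and Kullback-Leibler tests are not shown, and all function names are hypothetical.

```python
import math


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    m = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df, via closed forms."""
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def multivariate_z_test(theta1, cov1, theta2, cov2):
    """Return (statistic, p_value) for H0: no change between occasions 1 and 2."""
    diff = [b - a for a, b in zip(theta1, theta2)]
    cov = [[c1 + c2 for c1, c2 in zip(r1, r2)] for r1, r2 in zip(cov1, cov2)]
    weights = solve(cov, diff)  # cov^{-1} @ diff without forming the inverse
    stat = sum(d * w for d, w in zip(diff, weights))
    return stat, chi2_sf(stat, len(diff))
```

For two traits with estimates (0, 0) and (1, 1) and error covariance diag(0.25, 0.25) on each occasion, the statistic is 4 and the p-value is exp(−2), about 0.135, so this change would not be flagged at the .05 level.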

20.
Multivariate Behav Res ; 53(3): 403-418, 2018.
Article in English | MEDLINE | ID: mdl-29624093

ABSTRACT

A central assumption that is implicit in estimating item parameters in item response theory (IRT) models is the normality of the latent trait distribution, whereas a similar assumption made in categorical confirmatory factor analysis (CCFA) models is the multivariate normality of the latent response variables. Violation of the normality assumption can lead to biased parameter estimates. Although previous studies have focused primarily on unidimensional IRT models, this study extended the literature by considering a multidimensional IRT model for polytomous responses, namely the multidimensional graded response model. Moreover, this study is one of the few that specifically compared the performance of full-information maximum likelihood (FIML) estimation with robust weighted least squares (WLS) estimation when the normality assumption is violated. The research also manipulated the number of nonnormal latent trait dimensions. Results showed that FIML consistently outperformed WLS when one or more latent trait distributions were skewed. Notably, the bias of the discrimination parameters was non-ignorable only when the corresponding factor was skewed. Having other skewed factors did not further exacerbate this bias, whereas the bias of the boundary parameters increased as more nonnormal factors were added. The item parameter standard errors were recovered well by both estimation algorithms regardless of the number of nonnormal dimensions.


Subject(s)
Models, Statistical , Multivariate Analysis , Algorithms , Computer Simulation , Data Interpretation, Statistical , Factor Analysis, Statistical , Least-Squares Analysis
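The key manipulation in this design, making some latent trait dimensions skewed while leaving others normal, can be sketched as a data-generation step. This is a minimal sketch under assumptions: skew is produced with a standardized chi-square distribution (mean 0, variance 1, skewness sqrt(8/df)), which is one common choice but not necessarily the study's; the dimensions are drawn independently, whereas realistic multidimensional designs usually correlate the factors; and FIML/WLS estimation is not shown. The names `simulate_traits` and `sample_skewness` are hypothetical.

```python
import math
import random


def simulate_traits(n, n_dims, n_skewed, df=4, seed=0):
    """Draw n examinees on n_dims independent latent traits.

    The first n_skewed dimensions are standardized chi-square(df) draws
    (mean 0, variance 1, skewness sqrt(8 / df)); the rest are standard normal.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * df)
    people = []
    for _ in range(n):
        row = []
        for k in range(n_dims):
            if k < n_skewed:
                # chi-square(df) is Gamma(shape=df/2, scale=2); standardize it.
                row.append((rng.gammavariate(df / 2.0, 2.0) - df) / scale)
            else:
                row.append(rng.gauss(0.0, 1.0))
        people.append(row)
    return people


def sample_skewness(values):
    """Moment-based sample skewness m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

Varying `n_skewed` from 1 up to `n_dims` mirrors the abstract's manipulation of the number of nonnormal dimensions; the simulated traits would then feed a graded response data generator before calibration.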