1.
Psychometrika ; 88(4): 1228-1248, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37752345

ABSTRACT

Categorical marginal models (CMMs) are flexible tools for modelling dependent or clustered categorical data, when the dependencies themselves are not of interest. A major limitation of maximum likelihood (ML) estimation of CMMs is that the size of the contingency table increases exponentially with the number of variables, so even for a moderate number of variables, say between 10 and 20, ML estimation can become computationally infeasible. An alternative method, which retains the optimal asymptotic efficiency of ML, is maximum empirical likelihood (MEL) estimation. However, we show that MEL tends to break down for large, sparse contingency tables. As a solution, we propose a new method, which we call maximum augmented empirical likelihood (MAEL) estimation and which involves augmentation of the empirical likelihood support with a number of well-chosen cells. Simulation results show good finite sample performance for very large contingency tables.


Subjects
Likelihood Functions, Psychometrics, Computer Simulation
2.
PLoS One ; 18(8): e0289337, 2023.
Article in English | MEDLINE | ID: mdl-37535634

ABSTRACT

BACKGROUND: The health action process approach (HAPA) model is a promising basis for interventions that increase how often parents brush their children's teeth and thereby improve their children's oral health. A validated HAPA questionnaire is needed as one of the measures of the effects of such an intervention. OBJECTIVES: The aim of this study was to evaluate whether our data, based on a translated and adapted version of the Health Action Process Approach (HAPA)-based questionnaire on dental flossing, supported the constructs of the HAPA model. If so, a next aim was to assess whether these constructs could be measured reliably. METHODS: In this cross-sectional study, 269 questionnaires filled out in dental offices by parents of children 1-10 years old were analysed. Scale validation was performed according to the 6-step protocol of Dima, including Mokken scale analyses (MSA), graded response model (GRM), factor analyses and reliability measures. Pearson correlation coefficients were calculated to identify divergent validity and test-retest reliability. RESULTS: MSA showed a unidimensional, medium total scale. Three items were removed based on this analysis. The total scale with the remaining 26 items did not fit the GRM. Factor analysis extracted five factors and two components for the total scale. The separate subscales, except the 'intention' construct, fitted the MSA and did not fit the GRM. The data fitted a seven-factor model better than a one-factor model. Reliability measures varied from acceptable to excellent, but were poor for 'action control'. Test-retest reliability (r's 0.60-0.83) was questionable to good. CONCLUSION: Our results did not fully support the constructs of the HAPA model. To support the HAPA constructs, modifications to the subscales risk perceptions, intention, action planning, action control and self-reported behaviour are suggested. With these adjustments, the reliability and validity of the questionnaire could be significantly improved.


Subjects
Cognition, Parents, Humans, Child, Infant, Preschool Child, Netherlands, Reproducibility of Results, Cross-Sectional Studies, Parents/psychology, Surveys and Questionnaires
3.
Psychol Methods ; 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36048052

ABSTRACT

Several intraclass correlation coefficients (ICCs) are available to assess the interrater reliability (IRR) of observational measurements. Selecting an ICC is complicated, and existing guidelines have three major limitations. First, they do not discuss incomplete designs, in which raters partially vary across subjects. Second, they provide no coherent perspective on the error variance in an ICC, clouding the choice between the available coefficients. Third, the distinction between fixed or random raters is often misunderstood. Based on generalizability theory (GT), we provide updated guidelines on selecting an ICC for IRR, which are applicable to both complete and incomplete observational designs. We challenge conventional wisdom about ICCs for IRR by claiming that raters should seldom (if ever) be considered fixed. Also, we clarify how to interpret ICCs in the case of unbalanced and incomplete designs. We explain four choices a researcher needs to make when selecting an ICC for IRR, and guide researchers through these choices by means of a flowchart, which we apply to three empirical examples from clinical and developmental domains. In the Discussion, we provide guidance in reporting, interpreting, and estimating ICCs, and propose future directions for research into the ICCs for IRR. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

4.
Qual Life Res ; 31(1): 1-9, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34751897

ABSTRACT

We introduce the special section on nonparametric item response theory (IRT) in Quality of Life Research. Starting from the well-known Rasch model, we provide a brief overview of nonparametric IRT models and discuss the assumptions, the properties, and the investigation of goodness of fit. We provide references to more detailed texts to help readers get acquainted with nonparametric IRT models. In addition, we show how the rather diverse papers in the special section fit into the nonparametric IRT framework. Finally, we illustrate the application of nonparametric IRT models using data from a questionnaire measuring activity limitations in walking. The real-data example shows the quality of the scale and its constituent items with respect to dimensionality, local independence, monotonicity, and invariant item ordering.


Subjects
Quality of Life, Humans, Psychometrics, Quality of Life/psychology, Surveys and Questionnaires
5.
Psychol Methods ; 27(4): 650-666, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33818118

ABSTRACT

Current interrater reliability (IRR) coefficients ignore the nested structure of multilevel observational data, resulting in biased estimates of both subject- and cluster-level IRR. We used generalizability theory to provide a conceptualization and estimation method for IRR of continuous multilevel observational data. We explain how generalizability theory decomposes the variance of multilevel observational data into subject-, cluster-, and rater-related components, which can be estimated using Markov chain Monte Carlo (MCMC) estimation. We explain how IRR coefficients for each level can be derived from these variance components, and how they can be estimated as intraclass correlation coefficients (ICC). We assessed the quality of MCMC point and interval estimates with a simulation study, and showed that small numbers of raters were the main source of bias and inefficiency of the ICCs. In a follow-up simulation, we showed that a planned missing data design can diminish most estimation difficulties in these conditions, yielding a useful approach to estimating multilevel interrater reliability for most social and behavioral research. We illustrated the method using data on student-teacher relationships. All software code and data used for this article are available on the Open Science Framework: https://osf.io/bwk5t/. (PsycInfo Database Record (c) 2022 APA, all rights reserved).


Subjects
Behavioral Research, Research Design, Bias, Humans, Monte Carlo Method, Reproducibility of Results
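
The abstract above derives IRR coefficients from variance components. A minimal single-level sketch of that idea follows; it is not the article's multilevel estimator (which adds cluster-level components and uses MCMC), and the function name, argument names, and numbers are hypothetical.

```python
def icc_from_components(var_subject: float, var_rater: float,
                        var_residual: float, n_raters: int = 1) -> float:
    """Subject variance divided by subject variance plus error variance.

    The error variance (rater + residual) is divided by n_raters, so
    n_raters=1 describes a single rating and n_raters=k describes the
    mean of k ratings. All inputs are assumed to be already estimated.
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    if min(var_subject, var_rater, var_residual) < 0:
        raise ValueError("variance components must be non-negative")
    error = (var_rater + var_residual) / n_raters
    total = var_subject + error
    if total == 0:
        raise ValueError("all variance components are zero")
    return var_subject / total


# Hypothetical variance components, for illustration only.
single = icc_from_components(4.0, 1.0, 3.0)               # one rating
average = icc_from_components(4.0, 1.0, 3.0, n_raters=4)  # mean of 4 ratings
print(single, average)
```

Averaging over more raters shrinks the error term, which is one way to see why small numbers of raters make such coefficients hard to estimate well.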
6.
Qual Life Res ; 31(1): 25-36, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33983619

ABSTRACT

PURPOSE: Mokken scale analysis (MSA) is an attractive scaling procedure for ordinal data. MSA is frequently used in health-related quality of life research. Two of MSA's prime features are the scalability coefficients and the automated item selection procedure (AISP). The AISP partitions a (large) set of items into scales based on the observed item scores; the resulting scales can be used as measurement instruments. There exist two issues in MSA: First, point estimates, standard errors, and test statistics for scalability coefficients are inappropriate for clustered item scores, which are omnipresent in quality of life research data. Second, the AISP insufficiently takes sampling fluctuation of Mokken's scalability coefficients into account. METHODS: We solved both issues by providing point estimates and standard errors for the scalability coefficients for clustered data and by implementing a Wald-based significance test in the AISP algorithm, resulting in a test-guided AISP (T-AISP) that is available for both nonclustered and clustered test scores. RESULTS: We integrated the T-AISP into a two-step, test-guided MSA for scale construction, to guide the analysis for nonclustered and clustered data. The first step is to perform a T-AISP and select the final scale(s). For clustered data, within-group dependency is investigated on the final scale(s). In the second step, the strength of the scale(s) is determined and further analyses are performed. The procedure was demonstrated on clustered item scores obtained from administering a questionnaire on quality of life in schools to 639 students nested in 30 classrooms. CONCLUSIONS: We developed a two-step, test-guided MSA for scale construction that takes into account sampling fluctuation of all scalability coefficients and that can be applied to item scores obtained by a nonclustered or clustered sampling design.


Subjects
Quality of Life, Research Design, Algorithms, Humans, Psychometrics, Quality of Life/psychology, Reproducibility of Results, Surveys and Questionnaires
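
The scalability coefficients mentioned in the abstract above can be illustrated for the simplest case: one pair of dichotomous (0/1) items, where the pairwise coefficient is the observed covariance divided by the maximum covariance the item marginals allow. The sketch below covers only that point estimate. It does not implement the clustered-data standard errors, the Wald test, or the T-AISP, and the function name and data are hypothetical.

```python
def pairwise_h(item_i, item_j):
    """Pairwise scalability coefficient for two 0/1 item-score vectors.

    H_ij = cov(X_i, X_j) / cov_max, where for dichotomous items
    cov_max = min(p_i, p_j) - p_i * p_j given the observed marginals.
    """
    if len(item_i) != len(item_j) or not item_i:
        raise ValueError("score vectors must be non-empty and equally long")
    n = len(item_i)
    p_i = sum(item_i) / n
    p_j = sum(item_j) / n
    p_ij = sum(a * b for a, b in zip(item_i, item_j)) / n
    cov = p_ij - p_i * p_j
    cov_max = min(p_i, p_j) - p_i * p_j
    if cov_max <= 0:
        raise ValueError("an item has no variance; H is undefined")
    return cov / cov_max


# Hypothetical responses of 8 persons to an easier and a harder item.
easy = [1, 1, 1, 1, 0, 1, 0, 0]
hard = [1, 1, 0, 0, 1, 0, 0, 0]
h_weak = pairwise_h(easy, hard)

# A perfect Guttman pattern (nobody passes the hard item but fails the easy one).
h_perfect = pairwise_h([1, 1, 1, 0], [1, 1, 0, 0])
print(h_weak, h_perfect)
```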
7.
Appl Psychol Meas ; 44(3): 197-214, 2020 May.
Article in English | MEDLINE | ID: mdl-32341607

ABSTRACT

Two-level Mokken scale analysis is a generalization of Mokken scale analysis for multi-rater data. The bias of estimated scalability coefficients for two-level Mokken scale analysis, the bias of their estimated standard errors, and the coverage of the confidence intervals were investigated under various testing conditions. It was found that the estimated scalability coefficients were unbiased in all tested conditions. For estimating standard errors, the delta method and the cluster bootstrap were compared. The cluster bootstrap structurally underestimated the standard errors of the scalability coefficients, with low coverage values. Except for unequal numbers of raters across subjects and small sets of items, the delta method standard error estimates had negligible bias and good coverage. Post hoc simulations showed that the cluster bootstrap does not correctly reproduce the sampling distribution of the scalability coefficients, and an adapted procedure was suggested. In addition, the delta method standard errors can be slightly improved if the harmonic mean is used for unequal numbers of raters per subject rather than the arithmetic mean.
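
To make the last sentence of the abstract concrete, the sketch below only contrasts the two averages for a hypothetical set of rater counts. It does not reproduce the delta-method standard errors themselves, and the numbers are invented for illustration.

```python
from statistics import harmonic_mean, mean

# Hypothetical numbers of raters for three subjects.
raters_per_subject = [2, 3, 6]

arithmetic = mean(raters_per_subject)
harmonic = harmonic_mean(raters_per_subject)

# The harmonic mean never exceeds the arithmetic mean and is pulled
# toward the subjects with the fewest raters.
print(arithmetic, harmonic)
```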

8.
Br J Math Stat Psychol ; 73(2): 213-236, 2020 May.
Article in English | MEDLINE | ID: mdl-31231795

ABSTRACT

For the construction of tests and questionnaires that require multiple raters (e.g., a child behaviour checklist completed by both parents) a novel ordinal scaling technique is currently being further developed, called two-level Mokken scale analysis. The technique uses within-rater and between-rater coefficients to assess the scalability of the test. These coefficients are generalizations of Mokken's scalability coefficients. In this paper we derived standard errors for the two-level coefficients and for their ratios. The coefficients, the estimates, the estimated standard errors and the software implementation are discussed and illustrated using a real-data example, and a small-scale simulation study demonstrates the accuracy of the estimates.


Subjects
Statistical Models, Psychometrics/methods, Child, Child Behavior, Computer Simulation, Humans, Probability, Software, Nonparametric Statistics, Surveys and Questionnaires/statistics & numerical data
9.
Assessment ; 27(1): 178-193, 2020 Jan.
Article in English | MEDLINE | ID: mdl-28703008

ABSTRACT

Respondents may use satisficing (i.e., nonoptimal) strategies when responding to self-report questionnaires. These satisficing strategies become more likely with decreasing motivation and/or cognitive ability (Krosnick, 1991). Considering that cognitive deficits are characteristic of depressive and anxiety disorders, depressed and anxious patients may be prone to satisficing. Using data from the Netherlands Study of Depression and Anxiety (N = 2,945), we studied the relationship between depression and anxiety, cognitive symptoms, and satisficing strategies on the NEO Five-Factor Inventory. Results showed that respondents with either an anxiety disorder or a comorbid anxiety and depression disorder used satisficing strategies substantially more often than healthy respondents. Cognitive symptom severity partly mediated the effect of anxiety disorder and comorbid anxiety disorder on satisficing. The results suggest that depressed and anxious patients produce relatively low-quality self-report data, partly due to cognitive symptoms. Future research should investigate the degree of satisficing across different mental health care assessment contexts.


Subjects
Anxiety/psychology, Cognition Disorders/psychology, Depression/psychology, Psychiatric Status Rating Scales/statistics & numerical data, Self Report/statistics & numerical data, Adolescent, Adult, Aged, Cognition, Data Accuracy, Female, Humans, Male, Mental Health, Middle Aged, Netherlands, Young Adult
10.
Muscle Nerve ; 60(5): 520-527, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31281987

ABSTRACT

INTRODUCTION: Loss of sensation due to diabetes-related neuropathy often leads to diabetic foot ulceration. Several test instruments are used to assess sensation, such as static and moving 2-point discrimination (S2PD, M2PD), monofilaments, and tuning forks. METHODS: Mokken scale analysis was applied to the Rotterdam Diabetic Foot Study data to select hierarchies of tests to construct measurement scales. RESULTS: We developed 39-item and 31-item scales to measure loss of sensation for research purposes and a 13-item scale for clinical practice. All instruments were strongly scalable and reliable. The 39 items can be classified into 5 hierarchically ordered core clusters: S2PD, M2PD, vibration sense, monofilaments, and prior ulcer or amputation. DISCUSSION: Guided by the presented scales, clinicians may better classify the grade of sensory loss in diabetic patients' feet. Thus, a more personalized approach concerning individual recommendations, intervention strategies, and patient information may be applied.


Subjects
Diabetic Foot/diagnosis, Sensory Thresholds, Adult, Aged, Case-Control Studies, Cohort Studies, Diabetic Foot/physiopathology, Diabetic Neuropathies/diagnosis, Diabetic Neuropathies/physiopathology, Female, Humans, Male, Middle Aged, Netherlands, Prospective Studies, Severity of Illness Index, Vibration
11.
Assessment ; 26(7): 1207-1216, 2019 Oct.
Article in English | MEDLINE | ID: mdl-29084436

ABSTRACT

Test authors report sample reliability values but rarely consider the sampling error and related confidence intervals. This study investigated the truth of this conjecture for 116 tests with 1,024 reliability estimates (105 pertaining to test batteries and 919 to tests measuring a single attribute) obtained from an online database. Based on 90% confidence intervals, approximately 20% of the initial quality assessments had to be downgraded. For 95% confidence intervals, the percentage was approximately 23%. The results demonstrated that reported reliability values cannot be trusted without considering their estimation precision.


Subjects
Confidence Intervals, Psychological Tests/standards, Reproducibility of Results, Belgium, Factual Databases, Humans, Netherlands
12.
Educ Psychol Meas ; 78(6): 998-1020, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30542214

ABSTRACT

Reliability is usually estimated for a total score, but it can also be estimated for item scores. Item-score reliability can be useful to assess the repeatability of an individual item score in a group. Three methods to estimate item-score reliability are discussed, known as method MS, method λ6, and method CA. The item-score reliability methods are compared with four well-known and widely accepted item indices, which are the item-rest correlation, the item-factor loading, the item scalability, and the item discrimination. Realistic values for item-score reliability in empirical-data sets are monitored to obtain an impression of the values to be expected in other empirical-data sets. The relations between the three item-score reliability methods and the four well-known item indices are investigated. Tentatively, a minimum value for the item-score reliability methods to be used in item analysis is recommended.

13.
Appl Psychol Meas ; 42(7): 553-570, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30237646

ABSTRACT

Reliability is usually estimated for a test score, but it can also be estimated for item scores. Item-score reliability can be useful to assess the item's contribution to the test score's reliability, for identifying unreliable scores in aberrant item-score patterns in person-fit analysis, and for selecting the most reliable item from a test to use as a single-item measure. Four methods were discussed for estimating item-score reliability: the Molenaar-Sijtsma method (method MS), Guttman's method λ6, the latent class reliability coefficient (method LCRC), and the correction for attenuation (method CA). A simulation study was used to compare the methods with respect to median bias, variability (interquartile range [IQR]), and percentage of outliers. The simulation study consisted of six conditions: standard, polytomous items, unequal α parameters, two-dimensional data, long test, and small sample size. Methods MS and CA were the most accurate. Method LCRC showed almost unbiased results, but large variability. Method λ6 consistently underestimated item-score reliability, but showed a smaller IQR than the other methods.

14.
Front Psychol ; 9: 2298, 2018.
Article in English | MEDLINE | ID: mdl-30687144

ABSTRACT

This study investigates the usefulness of item-score reliability as a criterion for item selection in test construction. Methods MS, λ6, and CA were investigated as item-assessment methods in item selection and compared to the corrected item-total correlation, which was used as a benchmark. An ideal ordering to add items to the test (bottom-up procedure) or omit items from the test (top-down procedure) was defined based on the population test-score reliability. The orderings the four item-assessment methods produced in samples were compared to the ideal ordering, and the degree of resemblance was expressed by means of Kendall's τ. To investigate the concordance of the orderings across 1,000 replicated samples, Kendall's W was computed for each item-assessment method. The results showed that for both the bottom-up and the top-down procedures, item-assessment method CA and the corrected item-total correlation most closely resembled the ideal ordering. Generally, all item assessment methods resembled the ideal ordering better, and concordance of the orderings was greater, for larger sample sizes, and greater variance of the item discrimination parameters.
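
The abstract expresses the resemblance between a sample ordering and the ideal ordering by Kendall's τ. A minimal sketch of that statistic for two tie-free orderings follows (the tau-a variant). The function name and the example orderings are hypothetical, and the study's own handling of ties, if any, is not stated in the abstract.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / all pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical item orderings: position of each of five items in the
# ideal ordering and in the ordering obtained from one sample.
ideal_rank = [1, 2, 3, 4, 5]
sample_rank = [1, 3, 2, 4, 5]   # items 2 and 3 swapped
tau = kendall_tau_a(ideal_rank, sample_rank)
print(tau)
```

One swapped pair out of ten leaves nine concordant pairs and one discordant pair, so the two orderings still agree closely.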

15.
Qual Life Res ; 27(7): 1673-1682, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29098607

ABSTRACT

BACKGROUND: Two important goals when using questionnaires are (a) measurement: the questionnaire is constructed to assign numerical values that accurately represent the test taker's attribute, and (b) prediction: the questionnaire is constructed to give an accurate forecast of an external criterion. Construction methods aimed at measurement prescribe that items should be reliable. In practice, this leads to questionnaires with high inter-item correlations. By contrast, construction methods aimed at prediction typically prescribe that items have a high correlation with the criterion and low inter-item correlations. The latter approach has often been said to produce a paradox concerning the relation between reliability and validity [1-3], because it is often assumed that good measurement is a prerequisite of good prediction. OBJECTIVE: To answer four questions: (1) Why are measurement-based methods suboptimal for questionnaires that are used for prediction? (2) How should one construct a questionnaire that is used for prediction? (3) Do questionnaire-construction methods that optimize measurement and prediction lead to the selection of different items in the questionnaire? (4) Is it possible to construct a questionnaire that can be used for both measurement and prediction? ILLUSTRATIVE EXAMPLE: An empirical data set consisting of scores of 242 respondents on questionnaire items measuring mental health is used to select items by means of two methods: a method that optimizes the predictive value of the scale (i.e., forecast a clinical diagnosis), and a method that optimizes the reliability of the scale. We show that for the two scales different sets of items are selected and that a scale constructed to meet the one goal does not show optimal performance with reference to the other goal. DISCUSSION: The answers are as follows: (1) Because measurement-based methods tend to maximize inter-item correlations, which reduces predictive validity. (2) By selecting items that correlate highly with the criterion and weakly with the remaining items. (3) Yes, these methods may lead to different item selections. (4) For a single questionnaire: Yes, but it is problematic because reliability cannot be estimated accurately. For a test battery: Yes, but it is very costly. Implications for the construction of patient-reported outcome questionnaires are discussed.


Subjects
Patient Reported Outcome Measures, Surveys and Questionnaires, Female, Humans, Male, Psychometrics, Quality of Life, Reproducibility of Results
16.
Eat Disord ; 26(3): 263-269, 2018.
Article in English | MEDLINE | ID: mdl-29125797

ABSTRACT

In a sample of 38 eating disorder (ED) patients who received psychotherapeutic treatment, changes in attachment security and mentalization in relation to symptom reduction were investigated. Attachment security improved in 1 year but was unrelated to improvement of ED or comorbid symptoms. Mentalization did not change significantly in 1 year. Pretreatment mentalization was negatively related to the severity of ED symptoms, trait anxiety, psycho-neuroticism, and self-injurious behavior after 1 year of treatment. We conclude that for ED patients, improving mentalization might increase the effect of treatment on core and comorbid symptoms.


Subjects
Feeding and Eating Disorders/therapy, Object Attachment, Theory of Mind/physiology, Anxiety/psychology, Comorbidity, Feeding and Eating Disorders/psychology, Female, Humans, Self-Injurious Behavior, Time Factors, Treatment Outcome, Young Adult
18.
Eat Weight Disord ; 22(3): 535-547, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28643289

ABSTRACT

PURPOSE: To investigate whether recovery from an eating disorder is related to pre-treatment attachment and mentalization and/or to improvement of attachment and mentalization during treatment. METHOD: For a sample of 38 anorexia nervosa (AN) and bulimia nervosa (BN) patients receiving treatment the relations between attachment security, mentalization, comorbidity and recovery status after 12 months (not recovered or recovered), and after 18 months (persistently ill, relapsed, newly recovered, or persistently recovered) were investigated. Attachment security and mentalization were assessed by the Adult Attachment Interview at the start of the treatment and after 12 months. Besides assessing co-morbidity (for its effect on treatment outcome), we measured psycho-neuroticism and autonomy because of their established relations to both eating disorder symptoms and to attachment security. RESULTS: Recovery both at 12 months and at 18 months was related to higher levels of mentalization; for attachment, no significant differences were found between recovered and unrecovered patients. Patients who recovered from AN or BN also improved on co-morbid symptoms: whereas pre-treatment symptom severity was similar, at 12 months recovered patients scored lower on co-morbid personality disorders, anxiety, depression, self-injurious behaviour and psycho-neuroticism than unrecovered patients. Improvement on autonomy (reduced sensitivity to others; greater capacity to manage new situations) in 1 year of treatment was significantly higher in recovered than in unrecovered patients. CONCLUSION: A focus on enhancing mentalization in eating disorder treatment might be useful to increase the chances of successful treatment. Improvement of autonomy might be the mechanism of change in recovering from AN or BN. LEVEL OF EVIDENCE: Level III cohort study.


Subjects
Feeding and Eating Disorders/therapy, Object Attachment, Theory of Mind/physiology, Adult, Anxiety/psychology, Depression/psychology, Feeding and Eating Disorders/psychology, Female, Humans, Neuropsychological Tests, Treatment Outcome, Young Adult
19.
Br J Math Stat Psychol ; 70(1): 137-158, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27958642

ABSTRACT

Over the past decade, Mokken scale analysis (MSA) has rapidly grown in popularity among researchers from many different research areas. This tutorial provides researchers with a set of techniques and a procedure for their application, such that the construction of scales that have superior measurement properties is further optimized, taking full advantage of the properties of MSA. First, we define the conceptual context of MSA, discuss the two item response theory (IRT) models that constitute the basis of MSA, and discuss how these models differ from other IRT models. Second, we discuss dos and don'ts for MSA; the don'ts include misunderstandings we have frequently encountered with researchers in our three decades of experience with real-data MSA. Third, we discuss a methodology for MSA on real data that consist of a sample of persons who have provided scores on a set of items that, depending on the composition of the item set, constitute the basis for one or more scales, and we use the methodology to analyse an example real-data set.


Subjects
Statistical Data Interpretation, Educational Measurement/methods, Statistical Models, Health Care Outcome Assessment/methods, Psychometrics/methods, Surveys and Questionnaires, Algorithms, Computer Simulation
20.
Psychometrika ; 2016 Nov 14.
Article in English | MEDLINE | ID: mdl-27844269

ABSTRACT

Norm statistics allow for the interpretation of scores on psychological and educational tests, by relating the test score of an individual test taker to the test scores of individuals belonging to the same gender, age, or education groups, et cetera. Given the uncertainty due to sampling error, one would expect researchers to report standard errors for norm statistics. In practice, standard errors are seldom reported; they are either unavailable or derived under strong distributional assumptions that may not be realistic for test scores. We derived standard errors for four norm statistics (standard deviation, percentile ranks, stanine boundaries and Z-scores) under the mild assumption that the test scores are multinomially distributed. A simulation study showed that the standard errors were unbiased and that corresponding Wald-based confidence intervals had good coverage. Finally, we discuss the possibilities for applying the standard errors in practical test use in education and psychology. The procedure is provided via the R function check.norms, which is available in the mokken package.
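
As a rough illustration of a standard error under a multinomial assumption, the sketch below treats a percentile rank as a weighted sum of estimated score proportions and uses the usual variance of a linear combination of multinomial proportions. It is not the `check.norms` implementation from the mokken package and may differ from the article's derivation. The mid-point definition of the percentile rank, the function name, and the frequencies are assumptions made for this example.

```python
import math


def percentile_rank_with_se(counts, score_index):
    """Percentile rank (mid-point definition) and a plug-in standard error.

    counts[k] is the observed frequency of the k-th ordered test score.
    The rank is 100 * (P(below) + 0.5 * P(at score)); under a multinomial
    model a weighted sum of proportions sum(w_k * p_k) has variance
    (sum(w_k**2 * p_k) - (sum(w_k * p_k))**2) / n.
    """
    n = sum(counts)
    if n <= 0 or not 0 <= score_index < len(counts):
        raise ValueError("need positive counts and a valid score index")
    weights = [1.0 if k < score_index else 0.5 if k == score_index else 0.0
               for k in range(len(counts))]
    props = [c / n for c in counts]
    estimate = sum(w * p for w, p in zip(weights, props))
    second_moment = sum(w * w * p for w, p in zip(weights, props))
    variance = max(second_moment - estimate ** 2, 0.0) / n
    return 100 * estimate, 100 * math.sqrt(variance)


# Hypothetical frequencies of five ordered test scores (n = 100).
counts = [10, 20, 40, 20, 10]
pr, se = percentile_rank_with_se(counts, score_index=2)
print(pr, se)
```

A Wald-type interval would then be the estimate plus or minus a normal quantile times the standard error, subject to the usual caveats near 0 and 100.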
