Results 1 - 16 of 16
1.
Entropy (Basel) ; 25(3)2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36981350

ABSTRACT

Dempster-Shafer evidence theory is widely used to handle uncertain information through evidence modeling and evidence reasoning. However, when different pieces of evidence are highly contradictory, the Dempster combination rule may produce a counterintuitive fusion result. Many methods have been proposed to address conflicting-evidence fusion, but it remains an open issue. This paper proposes a new reliability coefficient, based on a betting-commitment evidence distance in Dempster-Shafer evidence theory, for fusing conflicting and uncertain information. A single belief function for belief assignment on the initial frame of discernment is defined. After preprocessing the evidence with the proposed reliability coefficient and single belief function, the fusion result can be calculated with the Dempster combination rule. To evaluate the effectiveness of the proposed uncertainty measure, a new method of uncertain-information fusion based on the new evidence reliability coefficient is proposed. Experimental results on UCI machine learning data sets show the applicability and effectiveness of the new reliability coefficient for uncertain-information processing.
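For orientation, the Dempster combination rule referenced above merges two basic probability assignments and renormalizes by the conflicting mass. The minimal Python sketch below illustrates only the classical rule and its well-known counterintuitive behavior under high conflict; it does not reproduce the reliability-coefficient preprocessing proposed in the paper, and the mass values are hypothetical.

from itertools import product

def dempster_combine(m1, m2):
    # m1, m2: mass functions as dicts mapping frozenset hypotheses to masses summing to 1
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on empty intersections (conflicting evidence)
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

# Zadeh-style highly conflicting example (hypothetical masses):
m1 = {frozenset({"A"}): 0.9, frozenset({"B"}): 0.1}
m2 = {frozenset({"C"}): 0.9, frozenset({"B"}): 0.1}
print(dempster_combine(m1, m2))  # all mass collapses onto {"B"}, the counterintuitive result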

2.
Front Psychol ; 13: 1074430, 2022.
Article in English | MEDLINE | ID: mdl-36619096

ABSTRACT

Critiques of coefficient alpha as an estimate of scale reliability are widespread in the literature. However, the continued overuse of this statistic in mathematics education research suggests a disconnect between theory and practice. This article therefore argues, in a non-technical way, about the limited usefulness of coefficient alpha, its overuse, and the alternatives for estimating scale reliability. Coefficient alpha provides information only about the degree of interrelatedness of a set of items that measure a construct. Contrary to the misconceptions widely circulated in mathematics education research, a high coefficient alpha value does not mean the instrument is reliable, nor does it imply that the instrument measures a single construct. Coefficient alpha is dependable as an estimate of reliability only under verifiable and restrictive conditions. I set out these conditions and present steps for verifying them in empirical studies. I discuss some alternatives to coefficient alpha, with references to non-technical articles where worked examples and programming code are available. I hope this exposition will influence the practices of mathematics education researchers regarding the estimation of scale reliability.
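For readers who want the computation behind the discussion, coefficient alpha is a simple function of the item variances and the total-score variance. The sketch below uses the standard formula under the usual assumptions (complete, numerically scored data); the data set is purely illustrative.

import numpy as np

def cronbach_alpha(scores):
    # scores: 2-D array, rows = respondents, columns = items
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)
    total_variance = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical 6-respondent, 5-item data set:
data = [[3, 4, 3, 4, 3],
        [2, 2, 3, 2, 2],
        [4, 5, 4, 5, 4],
        [1, 2, 1, 2, 2],
        [3, 3, 4, 3, 3],
        [5, 4, 5, 4, 5]]
print(round(cronbach_alpha(data), 3))

Note that a high value from this computation reflects only the interrelatedness of the items; as the article stresses, it does not by itself establish reliability or unidimensionality.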

3.
Educ Psychol Meas ; 81(4): 791-810, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34267401

ABSTRACT

The population discrepancy between unstandardized and standardized reliability of homogeneous multicomponent measuring instruments is examined. Within a latent variable modeling framework, it is shown that the standardized reliability coefficient for unidimensional scales can be markedly higher, or substantially lower, than the corresponding unstandardized reliability coefficient. Based on these findings, it is recommended that scholars avoid estimating, reporting, interpreting, or using standardized scale reliability coefficients in empirical research unless they have strong reasons for standardizing the original components of the scales used.

4.
Res Synth Methods ; 12(4): 516-536, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33742752

ABSTRACT

Reliability generalization (RG) is a meta-analytic approach that aims to characterize how reliability estimates from the same test vary across different applications of the instrument. To this end, RG meta-analyses typically focus on a particular test, aiming to obtain an overall reliability of test scores and to investigate how the composition and variability of the samples affect reliability. Although several guidelines have been proposed in the meta-analytic literature to help authors improve the reporting quality of meta-analyses, none were devised for RG meta-analyses. The purpose of this investigation was to develop REGEMA (REliability GEneralization Meta-Analysis), a 30-item checklist (plus a flow chart) adapted to the specific issues that the reporting of an RG meta-analysis must take into account. Based on previous checklists and guidelines proposed in the meta-analytic arena, a first version was elaborated by applying the nominal-group methodology. The resulting instrument was submitted to a panel of independent meta-analysis experts and, after discussion, the final version of the REGEMA checklist was reached. In a pilot study, four pairs of coders applied REGEMA to a random sample of 40 RG meta-analyses in Psychology, and the results showed satisfactory inter-coder reliability. REGEMA can be used by (a) meta-analysts conducting or reporting an RG meta-analysis who aim to improve its reporting quality; (b) consumers of RG meta-analyses who want to make informed critical appraisals of their reporting quality; and (c) reviewers and editors of journals considering submissions that report an RG meta-analysis for potential publication.


Subjects
Checklist, Research Report, Pilot Projects, Reproducibility of Results, Research Design
5.
Stat Med ; 40(4): 1034-1058, 2021 02 20.
Article in English | MEDLINE | ID: mdl-33247458

ABSTRACT

This article concerns evaluating the effectiveness of a continuous diagnostic biomarker against a continuous gold standard that is measured with error. Extending the work of Obuchowski (2005, 2016), Wu et al (2016) suggested an accuracy index and proposed an estimator of the index under an error-prone standard when the reliability coefficient is known. By combining these data with additional error-free measurements of the continuous gold standard collected from a subset of subjects, this article proposes two adaptive estimators of the accuracy index for the case where the reliability coefficient is unknown, and further establishes the consistency and asymptotic normality of these estimators. Simulation studies are conducted to compare the various estimators. Data from an intervention trial on glycemic control among children with type 1 diabetes are used to illustrate the proposed methods.


Subjects
Reproducibility of Results, Biomarkers, Child, Computer Simulation, Data Interpretation, Statistical, Humans
6.
Multivariate Behav Res ; 54(6): 856-881, 2019.
Article in English | MEDLINE | ID: mdl-31215245

ABSTRACT

This paper evaluated multilevel reliability measures in two-level nested designs (e.g., students nested within teachers) within an item response theory framework. A simulation study was implemented to investigate the behavior of the multilevel reliability measures, and the uncertainty associated with them, across multilevel designs varying in the number of clusters, cluster size, and intraclass correlation (ICC), and across test lengths, for two parameterizations of multilevel item response models with either separate item discriminations or the same item discrimination across levels. Marginal maximum likelihood estimation (MMLE) with multiple imputation and Bayesian analysis were employed to evaluate the accuracy of the multilevel reliability measures and the empirical coverage rates of Monte Carlo (MC) confidence or credible intervals. Considering the accuracy of the multilevel reliability measures and the empirical coverage rate of the intervals, the results lead us to generally recommend MMLE-multiple imputation. In the model with separate item discriminations across levels, marginally acceptable accuracy of the multilevel reliability measures and empirical coverage of the MC confidence intervals were found only under a limited condition (200 clusters, a cluster size of 30, an ICC of .2, and 40 items) with MMLE-multiple imputation. In the model with the same item discrimination across levels, the accuracy of the multilevel reliability measures and the empirical coverage rate of the MC confidence intervals were acceptable in all multilevel designs considered with 40 items under MMLE-multiple imputation. We discuss these findings and provide guidelines for reporting multilevel reliability measures.


Subjects
Likelihood Functions, Multilevel Analysis, Reproducibility of Results, Bayes Theorem, Humans, Monte Carlo Method, Psychological Theory, Surveys and Questionnaires
7.
Appl Psychol Meas ; 42(7): 553-570, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30237646

ABSTRACT

Reliability is usually estimated for a test score, but it can also be estimated for item scores. Item-score reliability can be useful for assessing an item's contribution to the test score's reliability, for identifying unreliable scores in aberrant item-score patterns in person-fit analysis, and for selecting the most reliable item from a test to use as a single-item measure. Four methods for estimating item-score reliability are discussed: the Molenaar-Sijtsma method (method MS), Guttman's method λ6, the latent class reliability coefficient (method LCRC), and the correction for attenuation (method CA). A simulation study was used to compare the methods with respect to median bias, variability (interquartile range [IQR]), and percentage of outliers. The simulation study consisted of six conditions: standard, polytomous items, unequal α parameters, two-dimensional data, long test, and small sample size. Methods MS and CA were the most accurate. Method LCRC showed almost unbiased results but large variability. Method λ6 consistently underestimated item-score reliability but showed a smaller IQR than the other methods.

8.
Article in English | WPRIM (Western Pacific) | ID: wpr-973095

ABSTRACT

Introduction: Clinical skills training at medical schools gives future doctors the opportunity to learn to care for patients properly, diagnose disease, provide first aid, treatment, nursing, and counseling, solve complex problems, and develop an ethical professional attitude. To achieve this objective, it is necessary to assess the level of knowledge, skills, and attitudes students have acquired. Goal: To analyze the basic clinical skills assessment tasks and to identify the level of knowledge and skills of students who completed the second year of the medical program at "Ach" Medical University during the 2016-2017 academic year. Materials and Methods: A descriptive design was used to measure the reliability of the assessment, the difficulty factor of the tasks, and the Hoffsten's scores based on the tasks and performance at each station, and the indicators were compared. Results: Based on examinees' success rates at the 5 stations, the Hoffsten's score was 68 percent for the clinical examination station, 64 percent for the physical examination station, 71 percent for the diagnostic station, 70 percent for the laboratory station, and 70 percent for the nursing station. Conclusion: The differential diagnosis, clinical interview, nursing, and visual diagnostic stations assessed students at a high difficulty factor (DF > 95), the laboratory and physical examination stations at a somewhat lower one (DF > 80), and the Hoffsten's score for the basic clinical skills exam was set at 70 percent.

9.
Article in English | WPRIM (Western Pacific) | ID: wpr-973093

ABSTRACT

Introduction: One measure of quality assurance for medical schools is the achievement of graduating students on assessments of the knowledge, skills, and attitudes in which they were trained. Goal: To analyze the theoretical and practical examinations and to identify the level of knowledge of students who graduated from "Ach" Medical University during the 2015-2016 academic year. Materials and Methods: A cross-sectional, descriptive study analyzed the theoretical and practical examination performance of 261 graduating students in the bachelor programs in Medicine, Dentistry, Traditional Medicine, and Nursing of "Ach" Medical University of Mongolia (AMU), identifying the reliability coefficient, difficulty factor, discrimination index, and Hoffsten's score. Results: The reliability coefficient of the graduation exam met the requirement at 0.94-0.96. In the analysis of the 300 test items for each class of graduates, 70 percent (n=202) had a weak discrimination index, and the difficulty factor exceeded 50 percent, indicating items that were too easy. The Hoffsten's passing score was 70 percent for medical graduates, 87 percent for traditional medicine, 79 percent for dentistry, and 80 percent for nursing. Conclusions: The reliability coefficient of the theoretical examination of graduates' knowledge was adequate for all fields, while the examinations in every field showed a weak discrimination index (DI ≤ 0). The Hoffsten's score was 70% or above in the medical field. The graduation assessments could not discriminate among graduates' levels of knowledge and skills, and the difficulty factor indicated that the graduation examination was very easy.

10.
J Biopharm Stat ; 26(6): 1111-1117, 2016.
Article in English | MEDLINE | ID: mdl-27548574

ABSTRACT

New biomarkers continue to be developed for the purpose of diagnosis, and their diagnostic performance is typically compared with that of an existing reference biomarker used for the same purpose. Considerable research has focused on receiver operating characteristic curve analysis when the reference biomarker is dichotomous. In the situation where the reference biomarker is measured on a continuous scale and dichotomization is not practically appealing, an index has been proposed in the literature to measure the accuracy of a continuous biomarker; it is essentially a linear function of the popular Kendall's tau. We consider the issue of estimating this accuracy index when the continuous reference biomarker is measured with error. We first investigate the impact of measurement errors on the accuracy index and then propose methods to correct for the resulting bias. Simulation results show the effectiveness of the proposed estimator in reducing bias. The methods are exemplified with hemoglobin A1c measurements obtained from both a central lab and a local lab, evaluating the accuracy of mean readings from metered blood glucose monitoring against the centrally measured hemoglobin A1c, using data from a behavioral intervention study for families of youth with type 1 diabetes.
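Because the accuracy index described here is essentially a linear function of Kendall's tau, a small simulation makes the attenuation problem concrete. The sketch below only illustrates how measurement error in the reference biomarker shrinks the observed tau; it does not reproduce the paper's index or its bias correction, and all variable names and noise levels are hypothetical.

import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)
true_reference = rng.normal(size=200)                               # error-free gold standard
biomarker = true_reference + rng.normal(scale=0.5, size=200)        # new biomarker
noisy_reference = true_reference + rng.normal(scale=0.8, size=200)  # reference measured with error

tau_true, _ = kendalltau(biomarker, true_reference)
tau_noisy, _ = kendalltau(biomarker, noisy_reference)
print(round(tau_true, 2), round(tau_noisy, 2))  # the second value is attenuated by measurement error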


Subjects
Biomarkers/analysis, Data Interpretation, Statistical, Diagnostic Tests, Routine/statistics & numerical data, Data Accuracy, Diabetes Mellitus, Type 1/diagnosis, Glycated Hemoglobin/analysis, Humans, ROC Curve, Reference Standards
11.
Article in English | WPRIM (Western Pacific) | ID: wpr-626841

ABSTRACT

Selection methods that use an interview as one of the selection criteria are used in many countries globally; however, interview procedures and their reliability vary widely. The Faculty of Medicine at Universiti Sultan Zainal Abidin developed a semi-structured interview procedure for the final selection of shortlisted candidates seeking to study medicine as part of the new intake for the 2015-2016 session of the MBBS program. Multiple panels, each comprising two members, interviewed and independently rated the candidates, and the inter-rater reliability of their quality assessments was investigated. The present article examines the inter-rater reliability of interviewers in the quality assessment of candidates seeking to join the Faculty of Medicine at Universiti Sultan Zainal Abidin, Malaysia. An observational study was conducted across all candidates shortlisted on merit for formal selection through the interview procedure. Data reflecting candidates' characteristics and qualities were collected as quantitative scores, and inter-rater reliability was calculated using the intraclass correlation coefficient. A moderate difference in means (SD) among the interviewers, ranging from 37.61 (3.48) to 42.12 (0.60), was observed. The reliability of scores varied between 0.50 and 0.65 (p < 0.05) for the majority of assessors, although among the 4 panels of assessors the intraclass correlation coefficient was between 0.70 and 0.90 (p < 0.001). Overall, assessment of candidates' performance based on observation did not achieve a satisfactory intraclass correlation coefficient (ICC ≥ 0.70). Given the higher discrepancy in inter-rater scores in some cases, continuing faculty development in interviewing skills and calibration workshops are recommended to improve the reliability and validity of candidate selection through interviews in the future.

12.
Article in English | WPRIM (Western Pacific) | ID: wpr-626840

ABSTRACT

The one-best-answer (OBA) multiple-choice question is considered a more effective tool for testing higher-order thinking, in terms of reliability and validity, than objective (multiple true/false) test items. Determining the quality of OBA questions, however, requires item analysis of the difficulty index (PI), the discrimination index (DI), and distractor efficiency (DE), the latter distinguishing functional distractors (FD) from non-functional distractors (NFD). At the same time, flaws in item structuring should not be allowed to affect students' results through error of measurement; the standard error of measurement (SEM) can be used to calculate a band of scores and thereby reduce the impact of measurement error in the assessment. The present study evaluates the quality of a 30-item OBA paper administered in the Professional II examination, in order to apply corrective measures and produce quality items for the question bank. The mean (SD) of the 30-item OBA paper was 61.11 (7.495), and the reliability (internal consistency) as Cronbach's alpha was 0.447. Of the 30 OBA items, 11 (36.66%) with PI = 0.31-0.60 and 12 items (40.00%) with DI ≥ 0.19 fell into the category of items to retain in the question bank, 6 items (20.00%) with DI ≤ 0.19 fell into the category of items to revise, and the remaining 12 items (40.00%) fell into the category of items to discard for having either a poor or a negative DI. Of the 120 distractors in total, 63 (52.5%) were non-functional and 57 (47.5%) were functional. Twenty-eight items (93.33%) contained 1-4 NFDs, and only 2 items (6.66%) had no NFD. The distractor efficiency of these items was as follows: 7 items each with 1 NFD (75% DE) and with 4 NFDs (0% DE), 10 items with 2 NFDs (50% DE), and 4 items with 3 NFDs (25% DE). The SEM calculated for the OBA paper was ±5.51, and with the borderline cut-off point set at ≥45%, a band of scores within 1 SD (68%) was generated for the OBA. The high frequency of overly difficult or easy items and the moderate to poor discrimination suggest the need for corrective measures on the items, and the large number of NFDs and low DE indicate the difficulty teaching faculty have in developing plausible distractors for OBA questions. The SEM should be used to calculate a band of scores when making pass/fail decisions for borderline students.
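The item statistics discussed in this abstract are straightforward to compute. The sketch below uses common conventions (upper/lower 27% groups for the discrimination index, a <5% response rate as the non-functional-distractor threshold, and SEM = SD * sqrt(1 - reliability)) that may differ from the exact procedures used in the study; all input values are hypothetical.

import numpy as np

def item_statistics(responses, key, total_scores):
    # responses: chosen options for one item (one entry per examinee)
    # key: the correct option; total_scores: each examinee's total test score
    responses = np.asarray(responses)
    correct = (responses == key).astype(float)
    difficulty = correct.mean()                              # PI: proportion answering correctly
    order = np.argsort(total_scores)
    g = max(1, int(round(0.27 * len(responses))))            # upper/lower 27% groups
    discrimination = correct[order[-g:]].mean() - correct[order[:g]].mean()  # DI
    choice_rates = {opt: float((responses == opt).mean()) for opt in np.unique(responses)}
    nfd = [opt for opt, rate in choice_rates.items() if opt != key and rate < 0.05]
    return difficulty, discrimination, nfd

def sem_band(observed_score, sd, reliability):
    # standard error of measurement and a +/- 1 SEM band around an observed score
    sem = sd * np.sqrt(1.0 - reliability)
    return observed_score - sem, observed_score + sem

print(sem_band(45.0, 7.5, 0.45))  # hypothetical borderline score, SD, and alpha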

13.
Article in English | WPRIM (Western Pacific) | ID: wpr-975603

ABSTRACT

Background: Health professional licensing was introduced in Mongolia in 1999, and medical school graduates must pass the health professional licensing exam (HPLE) to be registered. The HPLE success rate has reportedly decreased over the last few years among graduates who passed the final theoretical exam (FTE), and no research had been conducted to explain this trend. This research conducts a comparative assessment of the MCQs used for the HPLE and the FTE. Goal: To analyze the examinations and tests used to assess the medical knowledge of students who graduated as medical doctors from "Ach" Medical University during 2011-2015. Materials and Methods: This is a cross-sectional descriptive study. It employed a statistical analysis of 2,950 MCQs (24 versions) used for the HPLE by the Health Development Center of the MOH (N=16) and for the FTE by "Ach" Medical University (N=8) between 2011 and 2015. Test sheets of the HPLE (N=728) and the FTE (N=686) were assessed to identify test reliability, difficulty index, and discrimination index using the QuickSCORE II program of a Scantron ES-2010 test-reading machine. Results: The success rate was much higher on the FTE than on the HPLE between 2011 and 2015. The HPLE success rate decreased dramatically from 2013 (87%) to 2014 (4%) and 2015 (24%), while the FTE success rate remained stable at almost 100%. The FTE reliability coefficient for 2011-2015 met the requirement at 0.92-0.96, whereas the HPLE reliability coefficients for 2013 and 2014 did not. Of all the MCQs used, 97% of FTE items and 80% of HPLE items had a positive discrimination index, meaning they could differentiate medical school graduates' knowledge. Conclusion: Our findings confirmed that HPLE success rates among medical school graduates are quite low. The reliability coefficients of the HPLE tests were lower (KR-20 = 0.66-0.86) than those of the FTE (KR-20 = 0.92-0.96), and the 2014 and 2015 tests in particular were more difficult and contained a high percentage of negatively discriminating items. Test scores on the HPLE and the FTE for 2011-2015 show a direct linear correlation.

14.
Neuroimage Clin ; 5: 309-21, 2014.
Article in English | MEDLINE | ID: mdl-25161897

ABSTRACT

As the practice of conducting longitudinal fMRI studies to assess mechanisms of pain-reducing interventions becomes more common, there is a great need to assess the test-retest reliability of the pain-related BOLD fMRI signal across repeated sessions. This study quantitatively evaluated the reliability of heat pain-related BOLD fMRI brain responses in healthy volunteers across 3 sessions conducted on separate days using two measures: (1) intraclass correlation coefficients (ICC) calculated based on signal amplitude and (2) spatial overlap. The ICC analysis of pain-related BOLD fMRI responses showed fair-to-moderate intersession reliability in brain areas regarded as part of the cortical pain network. Areas with the highest intersession reliability based on the ICC analysis included the anterior midcingulate cortex, anterior insula, and second somatosensory cortex. Areas with the lowest intersession reliability based on the ICC analysis also showed low spatial reliability; these regions included pregenual anterior cingulate cortex, primary somatosensory cortex, and posterior insula. Thus, this study found regional differences in pain-related BOLD fMRI response reliability, which may provide useful information to guide longitudinal pain studies. A simple motor task (finger-thumb opposition) was performed by the same subjects in the same sessions as the painful heat stimuli were delivered. Intersession reliability of fMRI activation in cortical motor areas was comparable to previously published findings for both spatial overlap and ICC measures, providing support for the validity of the analytical approach used to assess intersession reliability of pain-related fMRI activation. A secondary finding of this study is that the use of standard ICC alone as a measure of reliability may not be sufficient, as the underlying variance structure of an fMRI dataset can result in inappropriately high ICC values; a method to eliminate these false positive results was used in this study and is recommended for future studies of test-retest reliability.
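For context, one common way to quantify the amplitude-based test-retest reliability discussed above is ICC(3,1) from a two-way subjects-by-sessions decomposition. The sketch below shows only that generic computation on simulated data; it does not reproduce the spatial-overlap measure or the variance-structure correction used in the study.

import numpy as np

def icc_3_1(data):
    # data: 2-D array, rows = subjects, columns = repeated sessions
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between-subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between-sessions
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Simulated amplitudes for 20 subjects over 3 sessions (stable subject effects plus session noise):
rng = np.random.default_rng(0)
subject_effect = rng.normal(0.0, 1.0, size=(20, 1))
sessions = subject_effect + rng.normal(0.0, 0.5, size=(20, 3))
print(round(icc_3_1(sessions), 2))  # high value: most variance is between subjects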


Subjects
Brain/physiology, Magnetic Resonance Imaging, Motor Activity/physiology, Pain/physiopathology, Adult, Aged, Brain Mapping/methods, Female, Hot Temperature, Humans, Image Processing, Computer-Assisted, Longitudinal Studies, Male, Middle Aged, Reproducibility of Results, Young Adult
15.
Univ. psychol ; 13(1): 217-226, Jan.-Mar. 2014. ilus, tab
Article in Spanish | LILACS | ID: lil-726972

ABSTRACT

The reliability of test scores is one of the most important psychometric properties of a psychological test. However, tests are often used to make dichotomous classifications of people, as in psychopathological screening or personnel selection. In such cases, conventional reliability coefficients are not suitable for estimating the accuracy of the classifications. This paper introduces Livingston's (1972, 1973) K² coefficient and demonstrates its use through two empirical examples for estimating the reliability of a classification made on the basis of a psychological test.
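Livingston's K² has a closed form that needs only a conventional reliability estimate, the score mean and variance, and the cut score. The sketch below uses the coefficient in its usual textbook form with hypothetical numbers; it does not reproduce the paper's worked examples.

def livingston_k2(reliability, mean, variance, cut_score):
    # K^2 = (r * var + (mean - cut)^2) / (var + (mean - cut)^2)
    d2 = (mean - cut_score) ** 2
    return (reliability * variance + d2) / (variance + d2)

# Hypothetical test: alpha = .80, mean = 70, SD = 10, classification cut score = 60
print(round(livingston_k2(0.80, 70.0, 10.0 ** 2, 60.0), 3))  # 0.9

As the formula shows, K² exceeds the conventional reliability coefficient whenever the score mean lies away from the cut score, which is why conventional coefficients can misrepresent the precision of dichotomous classifications.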


Subjects
Psychological Tests, Data Accuracy
16.
Rev. medica electron ; 34(1): 1-6, Jan.-Feb. 2012.
Article in Spanish | LILACS | ID: lil-629890

ABSTRACT

An evaluative study was carried out to determine the precision of a written examination by means of a reliability analysis. The examination, consisting of 30 questions exploring a certain type of professional knowledge, was administered to 45 people, and a database of the scores obtained was created in SPSS for Windows, version 16. Cronbach's alpha coefficient was calculated for different parts of the examination, along with discrimination coefficients, to determine the variables most relevant to the examination's reliability. The principal findings are presented in tables. A negative value was obtained for the reliability of the initial examination; by removing questions and rescaling the scores, an examination of acceptable reliability was obtained. It was concluded that reliability analysis is an effective procedure for increasing the precision of an examination.


Subjects
Humans, Statistics as Topic/methods, Data Interpretation, Statistical