ABSTRACT
When studying the degree of overall agreement between the nominal responses of two raters, it is customary to use the kappa coefficient. A more detailed analysis requires evaluating agreement category by category, which is usually done in one of two ways: computing kappa on the 2 x 2 table obtained by collapsing the full table for each category, or computing the agreement index for each category (the observed proportion of agreements). Both indices have disadvantages: the former is sensitive to the marginal totals; the latter is not chance corrected; and neither distinguishes the case where one of the two raters is a gold standard (an expert) from the case where neither rater is a gold standard. This article proposes five chance-corrected indices that are not sensitive to the marginal totals and that differ depending on whether there is a standard rater. The article also explains why kappa performs poorly when the two marginal totals are unbalanced (especially when they are unbalanced in opposite directions) and why it performs well when applied to the various 2 x 2 tables obtained by collapsing a larger table.
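The two category-by-category procedures described above can be sketched in Python with NumPy. This is a minimal illustration, not the five indices proposed in the article: it assumes the standard Cohen's kappa (chance agreement equal to the sum of products of the marginal proportions), builds the collapsed 2 x 2 table for each category (category i versus all other categories pooled), and takes the per-category agreement index to be the proportion of specific agreement 2 n_ii / (n_i. + n_.i), which is one common reading of "observed proportion of agreements". The function names `cohen_kappa`, `collapse` and `specific_agreement` and the example table are hypothetical.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater A, columns: rater B)."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_o = np.trace(t) / n                            # observed proportion of agreement
    p_e = (t.sum(axis=1) @ t.sum(axis=0)) / n**2     # chance agreement assuming independent raters
    return (p_o - p_e) / (1 - p_e)

def collapse(table, i):
    """2 x 2 table of category i versus all other categories pooled."""
    t = np.asarray(table, dtype=float)
    a = t[i, i]                  # both raters choose i
    b = t[i, :].sum() - a        # A chooses i, B chooses another category
    c = t[:, i].sum() - a        # B chooses i, A chooses another category
    d = t.sum() - a - b - c      # neither rater chooses i
    return np.array([[a, b], [c, d]])

def specific_agreement(table, i):
    """Observed agreement on category i, 2 n_ii / (n_i. + n_.i); not chance corrected."""
    t = np.asarray(table, dtype=float)
    return 2 * t[i, i] / (t[i, :].sum() + t[:, i].sum())

# Hypothetical 3 x 3 table of counts, for illustration only.
table = [[20, 5, 0],
         [3, 30, 2],
         [2, 3, 35]]

print(f"overall kappa = {cohen_kappa(table):.3f}")
for i in range(len(table)):
    print(f"category {i}: collapsed kappa = {cohen_kappa(collapse(table, i)):.3f}, "
          f"specific agreement = {specific_agreement(table, i):.3f}")
```

The collapsed kappa is chance corrected but depends on the marginal totals of each collapsed table, whereas the specific agreement uses only observed counts; this is the trade-off the abstract describes. The sketch does not guard against a chance agreement of exactly 1, where kappa is undefined.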
Subject(s)
Observer Variation; Psychology/statistics & numerical data; Reproducibility of Results; Data Interpretation, Statistical; Humans; Models, Statistical; Spain
ABSTRACT
The most common measure of agreement for categorical data is the kappa coefficient. However, kappa performs poorly when the marginal distributions are very asymmetric, it is not easy to interpret, and its definition is based on the hypothesis that the responses are independent (which is more restrictive than the hypothesis that kappa equals zero). This paper defines a new measure of agreement, delta, 'the proportion of agreements that are not due to chance', which is derived from a model of multiple-choice tests and does not share these limitations. The paper shows that kappa and delta generally take very similar values, except when the marginal distributions are strongly unbalanced. The case of 2 x 2 tables (which admits very simple solutions) is considered in detail.
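The sensitivity of kappa to unbalanced marginal distributions can be seen with two 2 x 2 tables that have the same observed agreement. The sketch below computes only kappa, from its definition under the independence hypothesis and from the equivalent closed form 2(ad - bc) / [(a+b)(b+d) + (a+c)(c+d)]; delta is not implemented, since its definition is not reproduced in this abstract. The function names and both tables are hypothetical.

```python
def kappa_2x2(a, b, c, d):
    """Cohen's kappa for the 2 x 2 table [[a, b], [c, d]] (rows: rater A, columns: rater B)."""
    n = a + b + c + d
    p_o = (a + d) / n                                       # observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2    # chance agreement under independence
    return (p_o - p_e) / (1 - p_e)

def kappa_2x2_closed(a, b, c, d):
    """Equivalent closed form: 2(ad - bc) / [(a+b)(b+d) + (a+c)(c+d)]."""
    return 2 * (a * d - b * c) / ((a + b) * (b + d) + (a + c) * (c + d))

# Two hypothetical tables with the same observed agreement (85 of 100).
tables = {
    "balanced marginals": (40, 8, 7, 45),
    "unbalanced marginals": (80, 10, 5, 5),
}
for name, (a, b, c, d) in tables.items():
    p_o = (a + d) / (a + b + c + d)
    print(f"{name}: p_o = {p_o:.2f}, kappa = {kappa_2x2(a, b, c, d):.3f}")
```

Both tables agree on 85% of cases, yet kappa drops sharply for the unbalanced table because the chance agreement implied by its marginals is much higher.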