
Intraclass Correlations Notes

Working through the intraclass correlation coefficients (ICC) by reading:

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428. http://dx.doi.org/10.1037/0033-2909.86.2.420

And using the irr package along with its documentation:

Gamer, Matthias, Lemon, Jim, Fellows, Ian, & Singh, Puspendra. (2012). irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84. http://CRAN.R-project.org/package=irr

Toy Data

The toy data from the Shrout & Fleiss article, Table 2, p. 423:

       Judge
Target    1    2    3    4
1         9    2    5    8
2         6    1    3    2
3         8    4    6    8
4         7    1    2    6
5        10    5    6    9
6         6    2    4    7

Add the above data to R and structure it for running an ANOVA:

# Scores entered judge by judge: the first six values are judge 1's
# ratings of targets 1 through 6, the next six are judge 2's, and so on
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- rep(c("target1", "target2", "target3", "target4", "target5", "target6"), 4)
judges  <- c(rep("judge1", 6), rep("judge2", 6), rep("judge3", 6), rep("judge4", 6))
stj_df  <- data.frame(scores, targets, judges)

Resulting data frame:

   scores targets judges
1       9 target1 judge1
2       6 target2 judge1
3       8 target3 judge1
4       7 target4 judge1
5      10 target5 judge1
6       6 target6 judge1
7       2 target1 judge2
8       1 target2 judge2
9       4 target3 judge2
10      1 target4 judge2
11      5 target5 judge2
12      2 target6 judge2
13      5 target1 judge3
14      3 target2 judge3
15      6 target3 judge3
16      2 target4 judge3
17      6 target5 judge3
18      4 target6 judge3
19      8 target1 judge4
20      2 target2 judge4
21      8 target3 judge4
22      6 target4 judge4
23      9 target5 judge4
24      7 target6 judge4

Relevant summary statistics:

Group      N   Mean    Var
Target 1   4   6.00  10.00
Target 2   4   3.00   4.67
Target 3   4   6.50   3.67
Target 4   4   4.00   8.67
Target 5   4   7.50   5.67
Target 6   4   4.75   4.92
Total     24   5.29   7.35

Group      N   Mean    Var
Judge 1    6   7.67   2.67
Judge 2    6   2.50   2.70
Judge 3    6   4.33   2.67
Judge 4    6   6.67   6.27
Total     24   5.29   7.35
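
These summaries can be reproduced from stj_df with base R; a quick sketch using aggregate:

# Per-target and per-judge means and variances
aggregate(scores ~ targets, data = stj_df,
          FUN = function(x) c(mean = mean(x), var = var(x)))
aggregate(scores ~ judges, data = stj_df,
          FUN = function(x) c(mean = mean(x), var = var(x)))
# Overall mean and variance
mean(stj_df$scores)
var(stj_df$scores)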

ICC Versions

Shrout & Fleiss document six versions of the intraclass correlation coefficient (ICC). In deciding which version to use, they state:

The guidelines for choosing the appropriate form of the ICC call for three decisions: (a) Is a one-way or two-way analysis of variance (ANOVA) appropriate for the analysis of the reliability study? (b) Are differences between the judges' mean ratings relevant to the reliability study? (c) Is the unit of analysis an individual rating or the mean of several ratings? (p. 420)

This results in the following six forms:

  1. If (a) one-way, then (b) is always no (consistency) and (c) can be either an individual rating (single) or the mean of several ratings (average).
  2. If (a) two-way, then (b) may either be yes (agreement) or no (consistency) and (c) can be either an individual rating (single) or the mean of several ratings (average).

More specifically:

Case 1

1. Each target is rated by a different set of k judges, randomly selected from a larger population of judges (p. 421).
  • If Case 1, then (a) is always one-way, (b) is always no (consistency), but (c) may be either single or average.
    • ICC(1,1): one-way, consistency, single
    • ICC(1,4): one-way, consistency, average
  • The one-way ANOVA model:
fit.1 <- aov(scores ~ targets, data = stj_df)
summary(fit.1)
           Df Sum Sq Mean Sq F value Pr(>F)
targets      5  56.21  11.242   1.795  0.165
Residuals   18 112.75   6.264               

The ICC(1,1) estimate (one-way, consistency, single):

$$ ICC(1,1) = \frac{BMS - WMS}{BMS + (k - 1)WMS} $$

Where:

  • BMS is the between-targets mean square (11.24)
  • WMS is the within-target mean square (6.26)
  • k is the number of judges (4)

Therefore:

\begin{align*} ICC(1,1) & = \frac{11.24 - 6.26}{11.24 + (4 - 1)6.26} \\ & = 0.17 \end{align*}
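
As a check, the same estimate can be computed in R from the one-way ANOVA table; a sketch, assuming fit.1 from above (the names ms, bms, wms, and k are mine):

ms  <- summary(fit.1)[[1]][["Mean Sq"]]  # mean squares from the ANOVA table
bms <- ms[1]  # between-targets mean square
wms <- ms[2]  # within-target mean square
k   <- 4      # number of judges
(bms - wms) / (bms + (k - 1) * wms)  # ~0.166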

The ICC(1,4) estimate (one-way, consistency, average):

$$ ICC(1,4) = \frac{BMS - WMS}{BMS} $$

Therefore:

\begin{align*} ICC(1,4) & = \frac{11.24 - 6.26}{11.24} \\ & = 0.44 \end{align*}
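
The corresponding check, reusing bms and wms from the sketch above:

(bms - wms) / bms  # ~0.443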

Case 2

2. A random sample of k judges is selected from a larger population, and each judge rates each target, that is, each judge rates n targets altogether (p. 421).
  • If Case 2, then (a) is always two-way, (b) is always yes (agreement), and (c) may be either single or average.
    • ICC(2,1): two-way, agreement, single
    • ICC(2,4): two-way, agreement, average
  • The two-way ANOVA model:
fit.2 <- aov(scores ~ targets + judges, data = stj_df)
summary(fit.2)
            Df Sum Sq Mean Sq F value   Pr(>F)    
targets      5  56.21   11.24   11.03 0.000135 ***
judges       3  97.46   32.49   31.87 9.45e-07 ***
Residuals   15  15.29    1.02   

The ICC(2,1) estimate (two-way, agreement, single):

$$ ICC(2,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS + \frac{k(JMS - EMS)}{n}} $$

Where:

  • BMS is the between-targets mean square (11.24)
  • JMS is the between-judges mean square (32.49)
  • EMS is the residual (error) mean square (1.02)
  • k is the number of judges (4)
  • n is the number of targets (6)

Therefore:

\begin{align*} ICC(2,1) & = \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02 + \frac{4(32.49 - 1.02)}{6}} \\ & = 0.29 \end{align*}
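
Again, a check in R from the two-way ANOVA table; a sketch assuming fit.2 from above:

ms2 <- summary(fit.2)[[1]][["Mean Sq"]]  # mean squares from the ANOVA table
bms <- ms2[1]  # between-targets mean square
jms <- ms2[2]  # between-judges mean square
ems <- ms2[3]  # residual (error) mean square
k   <- 4       # number of judges
n   <- 6       # number of targets
(bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)  # ~0.290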

The ICC(2,4) estimate (two-way, agreement, average):

$$ ICC(2,4) = \frac{BMS - EMS}{BMS + \frac{(JMS - EMS)}{n}} $$

Therefore:

\begin{align*} ICC(2,4) & = \frac{11.24 - 1.02}{11.24 + \frac{(32.49 - 1.02)}{6}} \\ & = 0.62 \end{align*}
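
The corresponding check, reusing the values from the sketch above:

(bms - ems) / (bms + (jms - ems) / n)  # ~0.620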

Case 3

3. Each target is rated by each of the same k judges, who are the only judges of interest (p. 421).
  • If Case 3, then (a) is always two-way, (b) is always no (consistency) and (c) may be either single or average.
    • ICC(3,1): two-way, consistency, single
    • ICC(3,4): two-way, consistency, average
  • The two-way ANOVA model (same as for Case 2):
fit.2 <- aov(scores ~ targets + judges, data = stj_df)
summary(fit.2)
            Df Sum Sq Mean Sq F value   Pr(>F)    
targets      5  56.21   11.24   11.03 0.000135 ***
judges       3  97.46   32.49   31.87 9.45e-07 ***
Residuals   15  15.29    1.02   

The ICC(3,1) estimate (two-way, consistency, single):

$$ ICC(3,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS} $$

Where:

  • BMS is the between-targets mean square (11.24)
  • EMS is the residual (error) mean square (1.02)
  • k is the number of judges (4)

Therefore:

\begin{align*} ICC(3,1) & = \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02} \\ & = 0.71 \end{align*}
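
A check, reusing bms, ems, and k from the Case 2 sketch:

(bms - ems) / (bms + (k - 1) * ems)  # ~0.715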

The ICC(3,4) estimate (two-way, consistency, average):

$$ ICC(3,4) = \frac{BMS - EMS}{BMS} $$

Therefore:

\begin{align*} ICC(3,4) & = \frac{11.24 - 1.02}{11.24} \\ & = 0.91 \end{align*}
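
And the corresponding check:

(bms - ems) / bms  # ~0.909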

These agree with the ICC estimates in Table 4 of Shrout & Fleiss (p. 424):

Version   Estimate   Model     Type          Unit of Analysis
ICC(1,1)  0.17       One-way   Consistency   Single
ICC(1,4)  0.44       One-way   Consistency   Average
ICC(2,1)  0.29       Two-way   Agreement     Single
ICC(2,4)  0.62       Two-way   Agreement     Average
ICC(3,1)  0.71       Two-way   Consistency   Single
ICC(3,4)  0.91       Two-way   Consistency   Average

ICC in R

Using the irr package, the data have to be reshaped into a targets-by-judges matrix (here simply re-entered into R):

library("irr")
score_1 <- c(9,6,8,7,10,6)
score_2 <- c(2,1,4,1,5,2)
score_3 <- c(5,3,6,2,6,4)
score_4 <- c(8,2,8,6,9,7)

Viewing the data (irr uses the data as it appears in the table at the top of this page):

cbind(score_1, score_2, score_3, score_4)
     score_1 score_2 score_3 score_4
[1,]       9       2       5       8
[2,]       6       1       3       2
[3,]       8       4       6       8
[4,]       7       1       2       6
[5,]      10       5       6       9
[6,]       6       2       4       7
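
Rather than re-entering the scores, the same matrix could also be recovered from stj_df. Since the scores were entered judge by judge, filling a 6 x 4 matrix column-wise works; a sketch (the name ratings is mine):

# Columns fill judge by judge, matching the order the scores were entered
ratings <- matrix(stj_df$scores, nrow = 6,
                  dimnames = list(paste0("target", 1:6), paste0("judge", 1:4)))

Passing ratings to icc() then gives the same results as the cbind() calls below.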

Then:

Case 1

ICC(1,1) (one-way, consistency, single):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "oneway",
    type  = "consistency",
    unit  = "single")
 Single Score Intraclass Correlation
 
   Model: oneway 
   Type : consistency 
 
   Subjects = 6 
     Raters = 4 
     ICC(1) = 0.166
 
 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,18) = 1.79 , p = 0.165 
 
 95%-Confidence Interval for ICC Population Values:
  -0.133 < ICC < 0.723

ICC(1,4) (one-way, consistency, average):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "oneway",
    type  = "consistency",
    unit  = "average")
 Average Score Intraclass Correlation
 
   Model: oneway 
   Type : consistency 
 
   Subjects = 6 
     Raters = 4 
     ICC(4) = 0.443
 
 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,18) = 1.79 , p = 0.165 
 
 95%-Confidence Interval for ICC Population Values:
  -0.884 < ICC < 0.912

Case 2

ICC(2,1) (two-way, agreement, single):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "agreement",
    unit  = "single")
 Single Score Intraclass Correlation
 
   Model: twoway 
   Type : agreement 
 
   Subjects = 6 
     Raters = 4 
   ICC(A,1) = 0.29
 
 F-Test, H0: r0 = 0 ; H1: r0 > 0 
  F(5,4.79) = 11 , p = 0.0113 
 
 95%-Confidence Interval for ICC Population Values:
  0.019 < ICC < 0.761

ICC(2,4) (two-way, agreement, average):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "agreement",
    unit  = "average")
 Average Score Intraclass Correlation
 
   Model: twoway 
   Type : agreement 
 
   Subjects = 6 
     Raters = 4 
   ICC(A,4) = 0.62
 
 F-Test, H0: r0 = 0 ; H1: r0 > 0 
  F(5,4.19) = 11 , p = 0.0165 
 
 95%-Confidence Interval for ICC Population Values:
  0.039 < ICC < 0.929

Case 3

ICC(3,1) (two-way, consistency, single):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "consistency",
    unit  = "single")
 Single Score Intraclass Correlation
 
   Model: twoway 
   Type : consistency 
 
   Subjects = 6 
     Raters = 4 
   ICC(C,1) = 0.715
 
 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,15) = 11 , p = 0.000135 
 
 95%-Confidence Interval for ICC Population Values:
  0.342 < ICC < 0.946

ICC(3,4) (two-way, consistency, average):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "consistency",
    unit  = "average")
 Average Score Intraclass Correlation
 
   Model: twoway 
   Type : consistency 
 
   Subjects = 6 
     Raters = 4 
   ICC(C,4) = 0.909
 
 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,15) = 11 , p = 0.000135 
 
 95%-Confidence Interval for ICC Population Values:
  0.676 < ICC < 0.986