Intraclass Correlation Examples in R

C. Sean Burns


Intraclass Correlations Notes

These notes work through the intraclass correlation coefficients (ICCs) described in:

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428.

The computations use the irr package and its documentation:

Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2012). irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84.

Toy Data

The toy data from the Shrout & Fleiss article, Table 2, p. 423:

|          |          Judge         |
|  Target  |  1   |  2  |  3  |  4  |
|  1       |  9   |  2  |  5  |  8  |
|  2       |  6   |  1  |  3  |  2  |
|  3       |  8   |  4  |  6  |  8  |
|  4       |  7   |  1  |  2  |  6  |
|  5       |  10  |  5  |  6  |  9  |
|  6       |  6   |  2  |  4  |  7  |

Enter the data in R in long format (one row per rating), as needed for running an ANOVA:

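# Scores entered judge by judge (down each column of Table 2)
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- rep(c("target1", "target2", "target3", "target4", "target5", "target6"), 4)
judges  <- c(rep("judge1", 6), rep("judge2", 6), rep("judge3", 6), rep("judge4", 6))
# Long format: one row per rating; aov() treats the character columns as factors
stj_df  <- data.frame(scores, judges, targets)
stj_df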

Resulting data frame:

   scores judges targets
1       9 judge1 target1
2       6 judge1 target2
3       8 judge1 target3
4       7 judge1 target4
5      10 judge1 target5
6       6 judge1 target6
7       2 judge2 target1
8       1 judge2 target2
9       4 judge2 target3
10      1 judge2 target4
11      5 judge2 target5
12      2 judge2 target6
13      5 judge3 target1
14      3 judge3 target2
15      6 judge3 target3
16      2 judge3 target4
17      6 judge3 target5
18      4 judge3 target6
19      8 judge4 target1
20      2 judge4 target2
21      8 judge4 target3
22      6 judge4 target4
23      9 judge4 target5
24      7 judge4 target6

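Summary statistics by target and by judge (Var is the sample variance, with an n - 1 denominator):

|  Group     |  N   |  Mean  |  Var   |
|  Target 1  |  4   |  6.00  |  10.00 |
|  Target 2  |  4   |  3.00  |  4.67  |
|  Target 3  |  4   |  6.50  |  3.67  |
|  Target 4  |  4   |  4.00  |  8.67  |
|  Target 5  |  4   |  7.50  |  5.67  |
|  Target 6  |  4   |  4.75  |  4.92  |
|  Total     |  24  |  5.29  |  7.35  |

|  Group     |  N   |  Mean  |  Var   |
|  Judge 1   |  6   |  7.67  |  2.67  |
|  Judge 2   |  6   |  2.50  |  2.70  |
|  Judge 3   |  6   |  4.33  |  2.67  |
|  Judge 4   |  6   |  6.67  |  6.27  |
|  Total     |  24  |  5.29  |  7.35  |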
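The summary statistics above can be reproduced in a few lines of base R. This is a minimal sketch that rebuilds the same long-format vectors used for stj_df; `group_stats()` is a hypothetical helper name, and `var()` uses the n - 1 denominator, which is what the tables report.

```r
# Long-format ratings from Table 2, entered judge by judge
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), 4)
judges  <- rep(paste0("judge", 1:4), each = 6)

# Hypothetical helper: N, mean, and sample variance for each level of g
group_stats <- function(x, g) {
  t(sapply(split(x, g), function(v) c(N = length(v), Mean = mean(v), Var = var(v))))
}

target_stats <- group_stats(scores, targets)
judge_stats  <- group_stats(scores, judges)
round(target_stats, 2)
round(judge_stats, 2)

# The "Total" rows: all 24 ratings pooled
round(c(N = length(scores), Mean = mean(scores), Var = var(scores)), 2)
```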

ICC Versions

Shrout & Fleiss document six versions of the intraclass correlation coefficient (ICC). In deciding which version to use, they state:

The guidelines for choosing the appropriate form of the ICC call for three decisions: (a) Is a one-way or two-way analysis of variance (ANOVA) appropriate for the analysis of the reliability study? (b) Are differences between the judges’ mean ratings relevant to the reliability study? (c) Is the unit of analysis an individual rating or the mean of several ratings? (p. 420)

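This results in six forms, written ICC(1,1), ICC(1,k), ICC(2,1), ICC(2,k), ICC(3,1), and ICC(3,k). The first index is the case (how the judges are chosen) and the second is the unit of analysis (a single rating, or the mean of k ratings; here k = 4). The three cases are: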

  1. Each target is rated by a different set of k judges, randomly selected from a larger population of judges (p. 421).
fit.1 <- aov(scores ~ targets, data = stj_df)
summary(fit.1)
           Df Sum Sq Mean Sq F value Pr(>F)
targets      5  56.21  11.242   1.795  0.165
Residuals   18 112.75   6.264               
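The two mean squares in this table are the BMS (targets row) and WMS (Residuals row) used in the Case 1 formulas. As a sketch of where they come from, they can be computed straight from their definitions. This assumes Table 2 is stored as a 6 × 4 matrix named `ratings` (rows = targets, columns = judges), a name introduced here for illustration; the results should match the aov() table.

```r
# Table 2 as a 6 x 4 matrix: rows = targets, columns = judges
ratings <- matrix(c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7), nrow = 6)
n <- nrow(ratings)  # number of targets
k <- ncol(ratings)  # number of judges

grand_mean   <- mean(ratings)
target_means <- rowMeans(ratings)

# Between targets: k times the squared deviations of the target means
ss_between <- k * sum((target_means - grand_mean)^2)
# Within targets: each rating's deviation from its own target's mean
# (target_means is recycled down the columns, so row i uses target_means[i])
ss_within  <- sum((ratings - target_means)^2)

BMS <- ss_between / (n - 1)        # df = n - 1 = 5
WMS <- ss_within  / (n * (k - 1))  # df = n(k - 1) = 18
round(c(BMS = BMS, WMS = WMS, F = BMS / WMS), 3)
```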

Case 1: One-way, Consistency, Single or Average

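The ICC(1,1) estimate (one-way, consistency, single), where BMS is the between-targets mean square (11.24) and WMS the within-target mean square (6.26) from the one-way ANOVA above, and k = 4 is the number of judges: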

$$ICC(1,1) = \frac{BMS - WMS}{BMS + (k - 1)WMS}$$

$$
\begin{aligned}
ICC(1,1) &= \frac{11.24 - 6.26}{11.24 + (4 - 1)6.26} \\
         &= 0.17
\end{aligned}
$$

The ICC(1,4) estimate (one-way, consistency, average):

$$ICC(1,4) = \frac{BMS - WMS}{BMS}$$

$$
\begin{aligned}
ICC(1,4) &= \frac{11.24 - 6.26}{11.24} \\
         &= 0.44
\end{aligned}
$$
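Both Case 1 estimates can also be computed from the fitted ANOVA instead of by hand. A minimal sketch, assuming the long-format data frame is rebuilt as in the toy-data section; it reads the "Mean Sq" column of summary() by position (targets first, Residuals second).

```r
# Long-format data, rebuilt as in the toy-data section
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- factor(rep(paste0("target", 1:6), 4))
judges  <- factor(rep(paste0("judge", 1:4), each = 6))
stj_df  <- data.frame(scores, judges, targets)
k <- 4  # judges per target

fit.1 <- aov(scores ~ targets, data = stj_df)
# summary() returns a list holding one ANOVA table; take its "Mean Sq" column
ms  <- summary(fit.1)[[1]][["Mean Sq"]]
BMS <- ms[1]  # targets row
WMS <- ms[2]  # Residuals row

icc_1_1 <- (BMS - WMS) / (BMS + (k - 1) * WMS)
icc_1_k <- (BMS - WMS) / BMS
round(c(ICC_1_1 = icc_1_1, ICC_1_4 = icc_1_k), 2)
```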

Case 2: Two-way, Agreement, Single or Average

  2. A random sample of k judges is selected from a larger population, and each judge rates each target, that is, each judge rates n targets altogether (p. 421).
fit.2 <- aov(scores ~ targets + judges, data = stj_df)
summary(fit.2)
            Df Sum Sq Mean Sq F value   Pr(>F)    
targets      5  56.21   11.24   11.03 0.000135 ***
judges       3  97.46   32.49   31.87 9.45e-07 ***
Residuals   15  15.29    1.02   
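In the two-way model the between-judges variation is separated out of the residual, which is why EMS (1.02) is much smaller than the one-way WMS (6.26). A sketch of that decomposition from the definitions, again assuming Table 2 is stored in a matrix named `ratings` (a hypothetical name); the mean squares should match the table above.

```r
# Table 2 as a 6 x 4 matrix: rows = targets, columns = judges
ratings <- matrix(c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7), nrow = 6)
n <- nrow(ratings)
k <- ncol(ratings)
g <- mean(ratings)

ss_total   <- sum((ratings - g)^2)
ss_targets <- k * sum((rowMeans(ratings) - g)^2)  # same as in the one-way model
ss_judges  <- n * sum((colMeans(ratings) - g)^2)  # judge main effect
ss_error   <- ss_total - ss_targets - ss_judges    # what is left over

BMS <- ss_targets / (n - 1)
JMS <- ss_judges  / (k - 1)
EMS <- ss_error   / ((n - 1) * (k - 1))
round(c(BMS = BMS, JMS = JMS, EMS = EMS), 2)
```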
Case 2a: Two-way, Agreement, Single

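The ICC(2,1) estimate (two-way, agreement, single), where BMS is the between-targets mean square (11.24), JMS the between-judges mean square (32.49), and EMS the residual mean square (1.02) from the two-way ANOVA above, with k = 4 judges and n = 6 targets: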

$$ICC(2,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS + \frac{k(JMS - EMS)}{n}}$$

$$
\begin{aligned}
ICC(2,1) &= \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02 + \frac{4(32.49 - 1.02)}{6}} \\
         &= 0.29
\end{aligned}
$$

Case 2b: Two-way, Agreement, Average

The ICC(2,4) estimate (two-way, agreement, average):

$$ICC(2,4) = \frac{BMS - EMS}{BMS + \frac{JMS - EMS}{n}}$$

$$
\begin{aligned}
ICC(2,4) &= \frac{11.24 - 1.02}{11.24 + \frac{32.49 - 1.02}{6}} \\
         &= 0.62
\end{aligned}
$$
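The Case 2 estimates follow the same pattern as Case 1, now reading three mean squares from the two-way fit. A minimal sketch, assuming the long-format data frame is rebuilt as in the toy-data section; the row order of the ANOVA table follows the model formula (targets, judges, Residuals).

```r
# Long-format data, rebuilt as in the toy-data section
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- factor(rep(paste0("target", 1:6), 4))
judges  <- factor(rep(paste0("judge", 1:4), each = 6))
stj_df  <- data.frame(scores, judges, targets)
n <- 6  # targets
k <- 4  # judges

fit.2 <- aov(scores ~ targets + judges, data = stj_df)
# Rows of the ANOVA table follow the formula: targets, judges, Residuals
ms  <- summary(fit.2)[[1]][["Mean Sq"]]
BMS <- ms[1]
JMS <- ms[2]
EMS <- ms[3]

icc_2_1 <- (BMS - EMS) / (BMS + (k - 1) * EMS + k * (JMS - EMS) / n)
icc_2_k <- (BMS - EMS) / (BMS + (JMS - EMS) / n)
round(c(ICC_2_1 = icc_2_1, ICC_2_4 = icc_2_k), 2)
```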

Case 3: Two-way, Consistency, Single or Average

  3. Each target is rated by each of the same k judges, who are the only judges of interest (p. 421).
fit.2 <- aov(scores ~ targets + judges, data = stj_df)
summary(fit.2)
            Df Sum Sq Mean Sq F value   Pr(>F)    
targets      5  56.21   11.24   11.03 0.000135 ***
judges       3  97.46   32.49   31.87 9.45e-07 ***
Residuals   15  15.29    1.02   
Case 3a: Two-way, Consistency, Single

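The ICC(3,1) estimate (two-way, consistency, single), which uses the same BMS and EMS as Case 2 but drops the JMS term, since differences between the judges' means do not count against consistency: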

$$ICC(3,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS}$$

$$
\begin{aligned}
ICC(3,1) &= \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02} \\
         &= 0.71
\end{aligned}
$$

Case 3b: Two-way, Consistency, Average

The ICC(3,4) estimate (two-way, consistency, average):

$$ICC(3,4) = \frac{BMS - EMS}{BMS}$$

$$
\begin{aligned}
ICC(3,4) &= \frac{11.24 - 1.02}{11.24} \\
         &= 0.91
\end{aligned}
$$
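Each average-rating form is the single-rating form stepped up with the Spearman-Brown formula, k·r / (1 + (k - 1)·r): substituting ICC(3,1) and simplifying gives (BMS - EMS)/BMS, and the same algebra works for Cases 1 and 2. The sketch below checks this numerically for Case 3, with the mean squares computed from a matrix named `ratings` and a helper named `spearman_brown()`, both hypothetical names.

```r
# Table 2 as a 6 x 4 matrix: rows = targets, columns = judges
ratings <- matrix(c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7), nrow = 6)
n <- nrow(ratings)
k <- ncol(ratings)
g <- mean(ratings)

# Two-way mean squares from their definitions
BMS <- k * sum((rowMeans(ratings) - g)^2) / (n - 1)
JMS <- n * sum((colMeans(ratings) - g)^2) / (k - 1)
EMS <- (sum((ratings - g)^2) - (n - 1) * BMS - (k - 1) * JMS) / ((n - 1) * (k - 1))

icc_3_1 <- (BMS - EMS) / (BMS + (k - 1) * EMS)
icc_3_k <- (BMS - EMS) / BMS

# Spearman-Brown: reliability of the mean of k ratings, given single-rating reliability r
spearman_brown <- function(r, k) k * r / (1 + (k - 1) * r)
round(c(ICC_3_1 = icc_3_1, ICC_3_4 = icc_3_k,
        stepped_up = spearman_brown(icc_3_1, k)), 2)
```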

These agree with the ICC scores in Table 4 from Shrout & Fleiss (p. 424):

|  Version   |  Estimate  |  Model    |  Type         |  Unit of Analysis  |
|  ICC(1,1)  |  0.17      |  One-way  |  Consistency  |  Single            |
|  ICC(1,4)  |  0.44      |  One-way  |  Consistency  |  Average           |
|  ICC(2,1)  |  0.29      |  Two-way  |  Agreement    |  Single            |
|  ICC(2,4)  |  0.62      |  Two-way  |  Agreement    |  Average           |
|  ICC(3,1)  |  0.71      |  Two-way  |  Consistency  |  Single            |
|  ICC(3,4)  |  0.91      |  Two-way  |  Consistency  |  Average           |
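All six estimates need only four mean squares, so they can be collected in one function. A minimal sketch, assuming the ratings arrive as an n × k matrix (rows = targets, columns = judges); `shrout_fleiss_icc()` is a hypothetical name, not a function from irr or any other package. Rounded to two places, its output should match the table above.

```r
# Hypothetical helper: the six Shrout & Fleiss (1979) ICCs from an n x k matrix
shrout_fleiss_icc <- function(ratings) {
  ratings <- as.matrix(ratings)
  n <- nrow(ratings)  # targets
  k <- ncol(ratings)  # judges
  g <- mean(ratings)
  ss_total <- sum((ratings - g)^2)
  BMS <- k * sum((rowMeans(ratings) - g)^2) / (n - 1)  # between targets
  JMS <- n * sum((colMeans(ratings) - g)^2) / (k - 1)  # between judges
  WMS <- (ss_total - (n - 1) * BMS) / (n * (k - 1))    # one-way residual
  EMS <- (ss_total - (n - 1) * BMS - (k - 1) * JMS) / ((n - 1) * (k - 1))  # two-way residual
  c("ICC(1,1)" = (BMS - WMS) / (BMS + (k - 1) * WMS),
    "ICC(1,k)" = (BMS - WMS) / BMS,
    "ICC(2,1)" = (BMS - EMS) / (BMS + (k - 1) * EMS + k * (JMS - EMS) / n),
    "ICC(2,k)" = (BMS - EMS) / (BMS + (JMS - EMS) / n),
    "ICC(3,1)" = (BMS - EMS) / (BMS + (k - 1) * EMS),
    "ICC(3,k)" = (BMS - EMS) / BMS)
}

# Table 2: rows = targets, columns = judges
ratings <- matrix(c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7), nrow = 6)
round(shrout_fleiss_icc(ratings), 2)
```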

ICC in R

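The irr package's icc() function expects the ratings in wide format, an n × k matrix with one row per target and one column per judge, so the long data frame has to be reshaped. Here the scores are simply re-entered, one vector per judge: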

library(irr)  # provides icc()

# One vector per judge, i.e., one column of Table 2
score_1 <- c(9,6,8,7,10,6)
score_2 <- c(2,1,4,1,5,2)
score_3 <- c(5,3,6,2,6,4)
score_4 <- c(8,2,8,6,9,7)

Viewing the data, which now has the same layout as Table 2 at the top of this page (rows are targets, columns are judges):

cbind(score_1, score_2, score_3, score_4)
     score_1 score_2 score_3 score_4
[1,]       9       2       5       8
[2,]       6       1       3       2
[3,]       8       4       6       8
[4,]       7       1       2       6
[5,]      10       5       6       9
[6,]       6       2       4       7
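Rather than re-entering the scores, the long-format stj_df from the toy-data section could be reshaped into this wide layout. A minimal sketch with base R's tapply(), assuming exactly one rating per target-judge cell (true for Table 2, so sum() just returns that single rating); `wide` is a hypothetical name.

```r
# Long-format data, rebuilt as in the toy-data section
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, judges, targets)

# Rows = targets, columns = judges; each cell holds a single rating
wide <- tapply(stj_df$scores, list(stj_df$targets, stj_df$judges), sum)
wide
```

The resulting matrix could then be passed to icc() in place of the cbind() call.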


Case 1: One-way, Consistency, Single or Average

ICC(1,1) (one-way, consistency, single):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "oneway",
    type  = "consistency",
    unit  = "single")
 Single Score Intraclass Correlation

   Model: oneway 
   Type : consistency 

   Subjects = 6 
     Raters = 4 
     ICC(1) = 0.166

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,18) = 1.79 , p = 0.165 

 95%-Confidence Interval for ICC Population Values:
  -0.133 < ICC < 0.723

ICC(1,4) (one-way, consistency, average):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "oneway",
    type  = "consistency",
    unit  = "average")
 Average Score Intraclass Correlation

   Model: oneway 
   Type : consistency 

   Subjects = 6 
     Raters = 4 
     ICC(4) = 0.443

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,18) = 1.79 , p = 0.165 

 95%-Confidence Interval for ICC Population Values:
  -0.884 < ICC < 0.912

Case 2a: Two-way, Agreement, Single

ICC(2,1) (two-way, agreement, single):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "agreement",
    unit  = "single")
 Single Score Intraclass Correlation

   Model: twoway 
   Type : agreement 

   Subjects = 6 
     Raters = 4 
   ICC(A,1) = 0.29

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
  F(5,4.79) = 11 , p = 0.0113 

 95%-Confidence Interval for ICC Population Values:
  0.019 < ICC < 0.761

Case 2b: Two-way, Agreement, Average

ICC(2,4) (two-way, agreement, average):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "agreement",
    unit  = "average")
 Average Score Intraclass Correlation

   Model: twoway 
   Type : agreement 

   Subjects = 6 
     Raters = 4 
   ICC(A,4) = 0.62

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
  F(5,4.19) = 11 , p = 0.0165 

 95%-Confidence Interval for ICC Population Values:
  0.039 < ICC < 0.929

Case 3a: Two-way, Consistency, Single

ICC(3,1) (two-way, consistency, single):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "consistency",
    unit  = "single")
 Single Score Intraclass Correlation

   Model: twoway 
   Type : consistency 

   Subjects = 6 
     Raters = 4 
   ICC(C,1) = 0.715

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,15) = 11 , p = 0.000135 

 95%-Confidence Interval for ICC Population Values:
  0.342 < ICC < 0.946
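The confidence interval for the consistency forms can be reproduced from the F statistic (BMS/EMS = 11.03). A sketch using F-based limits: F_L and F_U are the observed F divided and multiplied by upper critical values of the F distribution, the single-rating bounds are (F - 1)/(F + k - 1), and the average-rating bounds are 1 - 1/F. These formulas are an assumption here rather than something taken from the irr documentation, but they should reproduce the interval above and the ICC(C,4) interval in the next case.

```r
# Table 2 as a 6 x 4 matrix: rows = targets, columns = judges
ratings <- matrix(c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7), nrow = 6)
n <- nrow(ratings)
k <- ncol(ratings)
g <- mean(ratings)
BMS <- k * sum((rowMeans(ratings) - g)^2) / (n - 1)
JMS <- n * sum((colMeans(ratings) - g)^2) / (k - 1)
EMS <- (sum((ratings - g)^2) - (n - 1) * BMS - (k - 1) * JMS) / ((n - 1) * (k - 1))

alpha <- 0.05
df1   <- n - 1              # targets
df2   <- (n - 1) * (k - 1)  # residual
F_obs <- BMS / EMS          # the F reported by icc() for the consistency forms
F_L   <- F_obs / qf(1 - alpha / 2, df1, df2)
F_U   <- F_obs * qf(1 - alpha / 2, df2, df1)

ci_single  <- c(lower = (F_L - 1) / (F_L + k - 1), upper = (F_U - 1) / (F_U + k - 1))
ci_average <- c(lower = 1 - 1 / F_L,              upper = 1 - 1 / F_U)
round(rbind(ci_single, ci_average), 3)
```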

Case 3b: Two-way, Consistency, Average

ICC(3,4) (two-way, consistency, average):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "consistency",
    unit  = "average")
 Average Score Intraclass Correlation

   Model: twoway 
   Type : consistency 

   Subjects = 6 
     Raters = 4 
   ICC(C,4) = 0.909

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,15) = 11 , p = 0.000135 

 95%-Confidence Interval for ICC Population Values:
  0.676 < ICC < 0.986
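The six icc() calls above differ only in their model, type, and unit arguments, so they can be run in one pass. A sketch that collects the estimate and confidence limits from each result. It relies on the value, lbound, and ubound components of the object icc() returns; `specs` and `results` are hypothetical names. The ICC column should match the six printed estimates.

```r
library(irr)  # icc() comes from the irr package

# Table 2 in wide format: rows = targets, columns = judges
ratings <- cbind(c(9,6,8,7,10,6), c(2,1,4,1,5,2), c(5,3,6,2,6,4), c(8,2,8,6,9,7))

# The six model/type/unit combinations used above
specs <- data.frame(
  model = rep(c("oneway", "twoway", "twoway"), each = 2),
  type  = rep(c("consistency", "agreement", "consistency"), each = 2),
  unit  = rep(c("single", "average"), times = 3),
  stringsAsFactors = FALSE
)

# Run icc() once per row and keep the estimate and its confidence limits
results <- do.call(rbind, lapply(seq_len(nrow(specs)), function(i) {
  fit <- icc(ratings, model = specs$model[i], type = specs$type[i], unit = specs$unit[i])
  data.frame(specs[i, ], ICC = fit$value, lower = fit$lbound, upper = fit$ubound)
}))

num_cols <- c("ICC", "lower", "upper")
results[num_cols] <- round(results[num_cols], 3)
results
```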