Intraclass Correlation Examples in R

C. Sean Burns

02-21-2024

Intraclass Correlations Notes

Working through the intraclass correlation coefficients (ICC) by reading:

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428. http://dx.doi.org/10.1037/0033-2909.86.2.420, http://www.ncbi.nlm.nih.gov/pubmed/18839484

And using the irr package along with its documentation:

Gamer, Matthias, Lemon, Jim, Fellows, Ian, & Singh, Puspendra. (2012). irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84. http://CRAN.R-project.org/package=irr

Toy Data

The toy data from the Shrout & Fleiss article, Table 2, p. 423:

|----------|------------------------|
|          |          Judge         |
|----------|------------------------|
|  Target  |  1   |  2  |  3  |  4  |
|----------|------------------------|
|  1       |  9   |  2  |  5  |  8  |
|  2       |  6   |  1  |  3  |  2  |
|  3       |  8   |  4  |  6  |  8  |
|  4       |  7   |  1  |  2  |  6  |
|  5       |  10  |  5  |  6  |  9  |
|  6       |  6   |  2  |  4  |  7  |
|----------|------------------------|

Add the above data to R and structure it for running an ANOVA:

scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- rep(c("target1", "target2", "target3", "target4", "target5", "target6"), 4)
judges  <- c(rep("judge1", 6), rep("judge2", 6), rep("judge3", 6), rep("judge4", 6))
stj_df  <- data.frame(scores, judges, targets)

Resulting data frame:

	scores	judges	targets
1	9	judge1	target1
2	6	judge1	target2
3	8	judge1	target3
4	7	judge1	target4
5	10	judge1	target5
6	6	judge1	target6
7	2	judge2	target1
8	1	judge2	target2
9	4	judge2	target3
10	1	judge2	target4
11	5	judge2	target5
12	2	judge2	target6
13	5	judge3	target1
14	3	judge3	target2
15	6	judge3	target3
16	2	judge3	target4
17	6	judge3	target5
18	4	judge3	target6
19	8	judge4	target1
20	2	judge4	target2
21	8	judge4	target3
22	6	judge4	target4
23	9	judge4	target5
24	7	judge4	target6

Relevant summary statistics:

Group	N	Mean	Var
Target 1	4	6.00	10.00
Target 2	4	3.00	4.67
Target 3	4	6.50	3.67
Target 4	4	4.00	8.67
Target 5	4	7.50	5.67
Target 6	4	4.75	4.92
Total	24	5.29	7.35

Group	N	Mean	Var
Judge 1	6	7.67	2.67
Judge 2	6	2.50	2.70
Judge 3	6	4.33	2.67
Judge 4	6	6.67	6.27
Total	24	5.29	7.35
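The group summaries above can be reproduced with a short base R sketch. The helper name `by_group` is hypothetical (it is not from the article or the irr package), and the data are simply re-entered so the block runs on its own:

```r
# Toy data in long format, matching the data frame built earlier.
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, judges, targets)

# Hypothetical helper: N, mean, and sample variance of x within each level of g.
by_group <- function(x, g) {
  parts <- split(x, g)
  data.frame(N    = sapply(parts, length),
             Mean = sapply(parts, mean),
             Var  = sapply(parts, var))
}

target_stats <- by_group(stj_df$scores, stj_df$targets)
judge_stats  <- by_group(stj_df$scores, stj_df$judges)
total_stats  <- data.frame(N    = length(scores),
                           Mean = mean(scores),
                           Var  = var(scores))

print(round(target_stats, 2))
print(round(judge_stats, 2))
print(round(total_stats, 2))
```

`var()` uses the n - 1 denominator, which is what the Var column above reports.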

ICC Versions

Shrout & Fleiss document six versions of the intraclass correlation coefficient (ICC). In deciding which version to use, they state:

The guidelines for choosing the appropriate form of the ICC call for three decisions: (a) Is a one-way or two-way analysis of variance (ANOVA) appropriate for the analysis of the reliability study? (b) Are differences between the judges’ mean ratings relevant to the reliability study? (c) Is the unit of analysis an individual rating or the mean of several ratings? (p. 420)

This results in the following six forms:

- If (a) is one-way, then (b) is always no (consistency) and (c) can be either an individual rating (single) or the mean of several ratings (average).
- If (a) is two-way, then (b) may be either yes (agreement) or no (consistency) and (c) can be either an individual rating (single) or the mean of several ratings (average).

More specifically, Shrout & Fleiss describe three cases.

Case 1: Each target is rated by a different set of k judges, randomly selected from a larger population of judges (p. 421).

If Case 1, then (a) is always one-way, (b) is always no (consistency), but (c) may be either single or average.
- ICC(1,1): one-way, consistency, single
- ICC(1,4): one-way, consistency, average
The one-way ANOVA model:

fit.1 <- aov(scores ~ targets, data = stj_df)
summary(fit.1)
           Df Sum Sq Mean Sq F value Pr(>F)
targets      5  56.21  11.242   1.795  0.165
Residuals   18 112.75   6.264

Case 1: One-way, Consistency, Single or Average

The ICC(1,1) estimate (one-way, consistency, single):

$ICC(1,1) = \frac{BMS - WMS}{BMS + (k - 1)WMS}$

Where:

BMS is the between-targets mean square (11.24)
WMS is the within-target mean square (6.26)
k is the number of judges (4)

Therefore:

$\begin{align*} ICC(1,1) & = \frac{11.24 - 6.26}{11.24 + (4 - 1)6.26} \\ & = 0.17 \end{align*}$

The ICC(1,4) estimate (one-way, consistency, average):

$ICC(1,4) = \frac{BMS - WMS}{BMS}$

Therefore:

$\begin{align*} ICC(1,4) & = \frac{11.24 - 6.26}{11.24} \\ & = 0.44 \end{align*}$
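As a check on the hand calculation, the two Case 1 estimates can be computed from the unrounded mean squares of the one-way fit. This is a minimal sketch: the variable names `icc_1_1` and `icc_1_4` are illustrative, and the mean squares are read by position from the `summary()` table (targets first, residuals second).

```r
# Toy data, re-entered so the block runs on its own.
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- factor(rep(paste0("target", 1:6), 4))
k       <- 4  # number of judges

# One-way ANOVA: targets are the only factor.
fit.1 <- aov(scores ~ targets)
ms    <- summary(fit.1)[[1]][["Mean Sq"]]
BMS   <- ms[1]  # between-targets mean square
WMS   <- ms[2]  # within-target (residual) mean square

icc_1_1 <- (BMS - WMS) / (BMS + (k - 1) * WMS)
icc_1_4 <- (BMS - WMS) / BMS

print(round(c("ICC(1,1)" = icc_1_1, "ICC(1,4)" = icc_1_4), 2))
```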

Case 2: Two-way, Agreement, Single or Average

A random sample of k judges is selected from a larger population, and each judge rates each target, that is, each judge rates n targets altogether (p. 421).

If Case 2, then (a) is always two-way, (b) is always yes (agreement), and (c) may be either single or average.
- ICC(2,1): two-way, agreement, single
- ICC(2,4): two-way, agreement, average
The two-way ANOVA model:

fit.2 <- aov(scores ~ targets + judges, data = stj_df)
summary(fit.2)
            Df Sum Sq Mean Sq F value   Pr(>F)    
targets      5  56.21   11.24   11.03 0.000135 ***
judges       3  97.46   32.49   31.87 9.45e-07 ***
Residuals   15  15.29    1.02

Case 2a: Two-way, Agreement, Single

The ICC(2,1) estimate (two-way, agreement, single):

$ICC(2,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS + \frac{k(JMS - EMS)}{n}}$

Where:

BMS is the between-targets mean square (11.24)
JMS is the between-judges mean square (32.49)
EMS is the residual (error) mean square (1.02)
k is the number of judges (4)
n is the number of targets (6)

Therefore:

$\begin{align*} ICC(2,1) & = \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02 + \frac{4(32.49 - 1.02)}{6}} \\ & = 0.29 \end{align*}$

Case 2b: Two-way, Agreement, Average

The ICC(2,4) estimate (two-way, agreement, average):

$ICC(2,4) = \frac{BMS - EMS}{BMS + \frac{(JMS - EMS)}{n}}$

Therefore:

$\begin{align*} ICC(2,4) & = \frac{11.24 - 1.02}{11.24 + \frac{(32.49 - 1.02)}{6}} \\ & = 0.62 \end{align*}$
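The Case 2 arithmetic can be sketched the same way from the two-way fit. The names `icc_2_1` and `icc_2_4` are illustrative. The mean squares are read by position, which relies on the formula listing targets before judges.

```r
# Toy data, re-entered so the block runs on its own.
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- factor(rep(paste0("target", 1:6), 4))
judges  <- factor(rep(paste0("judge", 1:4), each = 6))
n <- nlevels(targets)  # number of targets
k <- nlevels(judges)   # number of judges

# Two-way ANOVA without interaction: targets + judges.
fit.2 <- aov(scores ~ targets + judges)
ms    <- summary(fit.2)[[1]][["Mean Sq"]]
BMS   <- ms[1]  # between-targets mean square
JMS   <- ms[2]  # between-judges mean square
EMS   <- ms[3]  # residual mean square

icc_2_1 <- (BMS - EMS) / (BMS + (k - 1) * EMS + k * (JMS - EMS) / n)
icc_2_4 <- (BMS - EMS) / (BMS + (JMS - EMS) / n)

print(round(c("ICC(2,1)" = icc_2_1, "ICC(2,4)" = icc_2_4), 2))
```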

Case 3: Two-way, Consistency, Single or Average

Each target is rated by each of the same k judges, who are the only judges of interest (p. 421).

If Case 3, then (a) is always two-way, (b) is always no (consistency), and (c) may be either single or average.
- ICC(3,1): two-way, consistency, single
- ICC(3,4): two-way, consistency, average
The two-way ANOVA model (same as for Case 2):

fit.2 <- aov(scores ~ targets + judges, data = stj_df)
summary(fit.2)
            Df Sum Sq Mean Sq F value   Pr(>F)    
targets      5  56.21   11.24   11.03 0.000135 ***
judges       3  97.46   32.49   31.87 9.45e-07 ***
Residuals   15  15.29    1.02

Case 3a: Two-way, Consistency, Single

The ICC(3,1) estimate (two-way, consistency, single):

$ICC(3,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS}$

Where:

BMS is the between-targets mean square (11.24)
EMS is the residual (error) mean square (1.02)
k is the number of judges (4)

Therefore:

$\begin{align*} ICC(3,1) & = \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02} \\ & = 0.71 \end{align*}$

Case 3b: Two-way, Consistency, Average

The ICC(3,4) estimate (two-way, consistency, average):

$ICC(3,4) = \frac{BMS - EMS}{BMS}$

Therefore:

$\begin{align*} ICC(3,4) & = \frac{11.24 - 1.02}{11.24} \\ & = 0.91 \end{align*}$
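To show where BMS and EMS come from, the sketch below computes them directly from the sums of squares of the ratings matrix rather than from `aov()`. It assumes the toy data are arranged with one row per target and one column per judge. All variable names are illustrative.

```r
# Ratings matrix filled column by column: one column per judge, one row per target.
ratings <- matrix(c(9,6,8,7,10,6,  2,1,4,1,5,2,  5,3,6,2,6,4,  8,2,8,6,9,7),
                  nrow = 6)
n     <- nrow(ratings)  # number of targets
k     <- ncol(ratings)  # number of judges
grand <- mean(ratings)

# Sums of squares for the two-way layout without interaction.
SS_targets <- k * sum((rowMeans(ratings) - grand)^2)
SS_judges  <- n * sum((colMeans(ratings) - grand)^2)
SS_total   <- sum((ratings - grand)^2)
SS_error   <- SS_total - SS_targets - SS_judges

BMS <- SS_targets / (n - 1)
EMS <- SS_error / ((n - 1) * (k - 1))

icc_3_1 <- (BMS - EMS) / (BMS + (k - 1) * EMS)
icc_3_4 <- (BMS - EMS) / BMS

print(round(c("ICC(3,1)" = icc_3_1, "ICC(3,4)" = icc_3_4), 2))
```

The judges sum of squares is only needed here to isolate the residual. It does not enter the consistency formulas themselves.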

These estimates agree with the ICC values in Table 4 of Shrout & Fleiss (p. 424):

Version	Estimate	Model	Type	Unit of Analysis
ICC(1,1)	0.17	One-way	Consistency	Single
ICC(1,4)	0.44	One-way	Consistency	Average
ICC(2,1)	0.29	Two-way	Agreement	Single
ICC(2,4)	0.62	Two-way	Agreement	Average
ICC(3,1)	0.71	Two-way	Consistency	Single
ICC(3,4)	0.91	Two-way	Consistency	Average

ICC in R

The irr package expects the ratings in wide form, with one row per target and one column per judge, so the data has to be reshaped (here it is simply re-entered in R):

library("irr")
score_1 <- c(9,6,8,7,10,6)
score_2 <- c(2,1,4,1,5,2)
score_3 <- c(5,3,6,2,6,4)
score_4 <- c(8,2,8,6,9,7)

Viewing the data (irr uses the data as it appears in the table at the top of this page):

cbind(score_1, score_2, score_3, score_4)
     score_1 score_2 score_3 score_4
[1,]       9       2       5       8
[2,]       6       1       3       2
[3,]       8       4       6       8
[4,]       7       1       2       6
[5,]      10       5       6       9
[6,]       6       2       4       7
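Instead of re-entering the scores, the long data frame used for the ANOVA could be reshaped into the same wide layout. One possible base R sketch uses `tapply()`; the name `wide` is illustrative. Because every target-judge cell holds exactly one score, `mean` just passes that score through.

```r
# Long-format toy data, as used for the ANOVA models.
scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, judges, targets)

# Cross-tabulate scores by target (rows) and judge (columns).
wide <- with(stj_df, tapply(scores, list(targets, judges), mean))
print(wide)
```

The resulting 6 x 4 matrix has the same shape that `icc()` is given in this section.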

Then:

Case 1: One-way, Consistency, Single or Average

ICC(1,1) (one-way, consistency, single):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "oneway",
    type  = "consistency",
    unit  = "single")
 Single Score Intraclass Correlation

   Model: oneway 
   Type : consistency 

   Subjects = 6 
     Raters = 4 
     ICC(1) = 0.166

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,18) = 1.79 , p = 0.165 

 95%-Confidence Interval for ICC Population Values:
  -0.133 < ICC < 0.723

ICC(1,4) (one-way, consistency, average):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "oneway",
    type  = "consistency",
    unit  = "average")
 Average Score Intraclass Correlation

   Model: oneway 
   Type : consistency 

   Subjects = 6 
     Raters = 4 
     ICC(4) = 0.443

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,18) = 1.79 , p = 0.165 

 95%-Confidence Interval for ICC Population Values:
  -0.884 < ICC < 0.912

Case 2a: Two-way, Agreement, Single

ICC(2,1) (two-way, agreement, single):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "agreement",
    unit  = "single")
 Single Score Intraclass Correlation

   Model: twoway 
   Type : agreement 

   Subjects = 6 
     Raters = 4 
   ICC(A,1) = 0.29

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
  F(5,4.79) = 11 , p = 0.0113 

 95%-Confidence Interval for ICC Population Values:
  0.019 < ICC < 0.761

Case 2b: Two-way, Agreement, Average

ICC(2,4) (two-way, agreement, average):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "agreement",
    unit  = "average")
 Average Score Intraclass Correlation

   Model: twoway 
   Type : agreement 

   Subjects = 6 
     Raters = 4 
   ICC(A,4) = 0.62

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
  F(5,4.19) = 11 , p = 0.0165 

 95%-Confidence Interval for ICC Population Values:
  0.039 < ICC < 0.929

Case 3a: Two-way, Consistency, Single

ICC(3,1) (two-way, consistency, single):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "consistency",
    unit  = "single")
 Single Score Intraclass Correlation

   Model: twoway 
   Type : consistency 

   Subjects = 6 
     Raters = 4 
   ICC(C,1) = 0.715

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,15) = 11 , p = 0.000135 

 95%-Confidence Interval for ICC Population Values:
  0.342 < ICC < 0.946

Case 3b: Two-way, Consistency, Average

ICC(3,4) (two-way, consistency, average):

icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type  = "consistency",
    unit  = "average")
 Average Score Intraclass Correlation

   Model: twoway 
   Type : consistency 

   Subjects = 6 
     Raters = 4 
   ICC(C,4) = 0.909

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(5,15) = 11 , p = 0.000135 

 95%-Confidence Interval for ICC Population Values:
  0.676 < ICC < 0.986
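The six calls can also be collected into one table for comparison with the hand-calculated estimates. This is a sketch that assumes the irr package is installed and reads the point estimate from the `value` element of the object `icc()` returns. The `forms` data frame and its column names are illustrative.

```r
library("irr")

ratings <- cbind(score_1 = c(9,6,8,7,10,6),
                 score_2 = c(2,1,4,1,5,2),
                 score_3 = c(5,3,6,2,6,4),
                 score_4 = c(8,2,8,6,9,7))

# One row per ICC form, with the icc() arguments that select it.
forms <- data.frame(
  version = c("ICC(1,1)", "ICC(1,4)", "ICC(2,1)", "ICC(2,4)", "ICC(3,1)", "ICC(3,4)"),
  model   = c("oneway", "oneway", "twoway", "twoway", "twoway", "twoway"),
  type    = c("consistency", "consistency", "agreement", "agreement",
              "consistency", "consistency"),
  unit    = c("single", "average", "single", "average", "single", "average"),
  stringsAsFactors = FALSE
)

# Run icc() once per row and keep only the point estimate.
forms$estimate <- round(unname(mapply(
  function(model, type, unit) {
    icc(ratings, model = model, type = type, unit = unit)$value
  },
  forms$model, forms$type, forms$unit
)), 2)

print(forms)
```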