02-21-2024

Working through the **intraclass correlation coefficients
(ICC)** by reading:

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations:
Uses in assessing rater reliability. **Psychological Bulletin,
86**(2), 420-428. http://dx.doi.org/10.1037/0033-2909.86.2.420,
http://www.ncbi.nlm.nih.gov/pubmed/18839484

And using the **irr** package along with its
documentation:

Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2012). irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84. http://CRAN.R-project.org/package=irr

The toy data from the Shrout & Fleiss article, Table 2, p. 423:

```
|--------|-------------------|
|        |       Judge       |
|--------|-------------------|
| Target |  1 |  2 |  3 |  4 |
|--------|-------------------|
|      1 |  9 |  2 |  5 |  8 |
|      2 |  6 |  1 |  3 |  2 |
|      3 |  8 |  4 |  6 |  8 |
|      4 |  7 |  1 |  2 |  6 |
|      5 | 10 |  5 |  6 |  9 |
|      6 |  6 |  2 |  4 |  7 |
|--------|-------------------|
```

Add the above data to R in long format (one row per rating), structured for running an ANOVA:

```
scores <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
targets <- rep(c("target1", "target2", "target3", "target4", "target5", "target6"), 4)
judges <- c(rep("judge1", 6), rep("judge2", 6), rep("judge3", 6), rep("judge4", 6))
stj_df <- data.frame(scores, targets, judges)
```

Resulting data frame:

Row | scores | targets | judges |
---|---|---|---|
1 | 9 | target1 | judge1 |
2 | 6 | target2 | judge1 |
3 | 8 | target3 | judge1 |
4 | 7 | target4 | judge1 |
5 | 10 | target5 | judge1 |
6 | 6 | target6 | judge1 |
7 | 2 | target1 | judge2 |
8 | 1 | target2 | judge2 |
9 | 4 | target3 | judge2 |
10 | 1 | target4 | judge2 |
11 | 5 | target5 | judge2 |
12 | 2 | target6 | judge2 |
13 | 5 | target1 | judge3 |
14 | 3 | target2 | judge3 |
15 | 6 | target3 | judge3 |
16 | 2 | target4 | judge3 |
17 | 6 | target5 | judge3 |
18 | 4 | target6 | judge3 |
19 | 8 | target1 | judge4 |
20 | 2 | target2 | judge4 |
21 | 8 | target3 | judge4 |
22 | 6 | target4 | judge4 |
23 | 9 | target5 | judge4 |
24 | 7 | target6 | judge4 |

Relevant summary statistics, by target and by judge:

Group | N | Mean | Var |
---|---|---|---|
Target 1 | 4 | 6.00 | 10.00 |
Target 2 | 4 | 3.00 | 4.67 |
Target 3 | 4 | 6.50 | 3.67 |
Target 4 | 4 | 4.00 | 8.67 |
Target 5 | 4 | 7.50 | 5.67 |
Target 6 | 4 | 4.75 | 4.92 |
Total | 24 | 5.29 | 7.35 |

Group | N | Mean | Var |
---|---|---|---|
Judge 1 | 6 | 7.67 | 2.67 |
Judge 2 | 6 | 2.50 | 2.70 |
Judge 3 | 6 | 4.33 | 2.67 |
Judge 4 | 6 | 6.67 | 6.27 |
Total | 24 | 5.29 | 7.35 |
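The summary tables above can be reproduced in base R. What follows is a minimal sketch rather than the code originally used for these notes: `group_summary` is a hypothetical helper name introduced here for illustration, and the data frame is rebuilt so the block runs on its own.

```r
# Rebuild the long-format data so this block is self-contained.
scores  <- c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), times = 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, targets, judges)

# Hypothetical helper: N, mean, and sample variance of x within each level of g.
group_summary <- function(x, g) {
  n <- tapply(x, g, length)
  data.frame(
    N    = as.vector(n),
    Mean = round(as.vector(tapply(x, g, mean)), 2),
    Var  = round(as.vector(tapply(x, g, var)), 2),
    row.names = names(n)
  )
}

by_target <- group_summary(stj_df$scores, stj_df$targets)
by_judge  <- group_summary(stj_df$scores, stj_df$judges)

# Overall N, mean, and variance across all 24 ratings.
total <- round(c(N = length(scores), Mean = mean(scores), Var = var(scores)), 2)

by_target
by_judge
total
```

Note that `var()` uses the sample (n - 1) denominator, which is what the tables above report.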

Shrout & Fleiss document six versions of the intraclass
correlation coefficient (**ICC**). In deciding which
version to use, they state:

"The guidelines for choosing the appropriate form of the ICC call for three decisions: (a) Is a one-way or two-way analysis of variance (ANOVA) appropriate for the analysis of the reliability study? (b) Are differences between the judges’ mean ratings relevant to the reliability study? (c) Is the unit of analysis an individual rating or the mean of several ratings?" (p. 420)

This results in the following six forms:

- If (a) is one-way, then (b) is always no (consistency) and (c) can be either an individual rating (single) or the mean of several ratings (average).
- If (a) is two-way, then (b) may be either yes (agreement) or no (consistency) and (c) can be either an individual rating (single) or the mean of several ratings (average).

More specifically:

- **Case 1:** Each target is rated by a different set of **k** judges, randomly selected from a larger population of judges (p. 421).

- If Case 1, then (a) is always one-way, (b) is always no (consistency), but (c) may be either single or average.
  - ICC(1,1): one-way, consistency, single
  - ICC(1,4): one-way, consistency, average

- The one-way ANOVA model:

```
fit.1 <- aov(scores ~ targets, data = stj_df)
summary(fit.1)
            Df Sum Sq Mean Sq F value Pr(>F)
targets      5  56.21  11.242   1.795  0.165
Residuals   18 112.75   6.264
```

The ICC(1,1) estimate (one-way, consistency, single):

$ICC(1,1) = \frac{BMS - WMS}{BMS + (k - 1)WMS}$

Where:

- BMS is the between-targets mean square (11.24)
- WMS is the within-target mean square (6.26)
- **k** is the number of judges (4)

Therefore:

$\begin{align*} ICC(1,1) & = \frac{11.24 - 6.26}{11.24 + (4 - 1)6.26} \\ & = 0.17 \end{align*}$

The ICC(1,4) estimate (one-way, consistency, average):

$ICC(1,4) = \frac{BMS - WMS}{BMS}$

Therefore:

$\begin{align*} ICC(1,4) & = \frac{11.24 - 6.26}{11.24} \\ & = 0.44 \end{align*}$
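The two Case 1 estimates can also be computed directly from the fitted model instead of copying rounded mean squares by hand. This is a sketch under the assumption that the rows of the `summary()` table come out in the order targets, then residuals, as in the output shown earlier; the names `icc_1_1` and `icc_1_k` are introduced here for illustration only.

```r
# Rebuild the long-format data so this block is self-contained.
scores  <- c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), times = 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, targets, judges)

fit.1 <- aov(scores ~ targets, data = stj_df)

# Mean squares from the ANOVA table; row 1 = targets, row 2 = residuals.
ms  <- summary(fit.1)[[1]][["Mean Sq"]]
BMS <- ms[1]                              # between-targets mean square
WMS <- ms[2]                              # within-target mean square
k   <- length(unique(stj_df$judges))      # number of judges

icc_1_1 <- (BMS - WMS) / (BMS + (k - 1) * WMS)   # single rating
icc_1_k <- (BMS - WMS) / BMS                     # mean of k ratings

round(c("ICC(1,1)" = icc_1_1, "ICC(1,4)" = icc_1_k), 3)
```

Working from the unrounded mean squares avoids small rounding differences when comparing against package output.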

- **Case 2:** A random sample of **k** judges is selected from a larger population, and each judge rates each target, that is, each judge rates **n** targets altogether (p. 421).

- If Case 2, then (a) is always two-way, (b) is always yes (agreement), and (c) may be either single or average.
  - ICC(2,1): two-way, agreement, single
  - ICC(2,4): two-way, agreement, average

- The two-way ANOVA model:

```
fit.2 <- aov(scores ~ targets + judges, data = stj_df)
summary(fit.2)
            Df Sum Sq Mean Sq F value   Pr(>F)
targets      5  56.21   11.24   11.03 0.000135 ***
judges       3  97.46   32.49   31.87 9.45e-07 ***
Residuals   15  15.29    1.02
```

The ICC(2,1) estimate (two-way, agreement, single):

$ICC(2,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS + \frac{k(JMS - EMS)}{n}}$

Where:

- BMS is the between-targets mean square (11.24)
- JMS is the mean square for the between-judges sum of squares (32.49)
- EMS is the mean square for the residual sum of squares (1.02)
- **k** is the number of judges (4)
- **n** is the number of targets (6)

Therefore:

$\begin{align*} ICC(2,1) & = \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02 + \frac{4(32.49 - 1.02)}{6}} \\ & = 0.29 \end{align*}$

The ICC(2,4) estimate (two-way, agreement, average):

$ICC(2,4) = \frac{BMS - EMS}{BMS + \frac{(JMS - EMS)}{n}}$

Therefore:

$\begin{align*} ICC(2,4) & = \frac{11.24 - 1.02}{11.24 + \frac{(32.49 - 1.02)}{6}} \\ & = 0.62 \end{align*}$
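As a check on the arithmetic, the Case 2 estimates can be computed from the two-way fit. This is a minimal sketch that assumes the `summary()` rows are ordered targets, judges, residuals (as in the output above); `icc_2_1` and `icc_2_k` are illustrative names, not part of any package.

```r
# Rebuild the long-format data so this block is self-contained.
scores  <- c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), times = 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, targets, judges)

fit.2 <- aov(scores ~ targets + judges, data = stj_df)

# Mean squares; row 1 = targets, row 2 = judges, row 3 = residuals.
ms  <- summary(fit.2)[[1]][["Mean Sq"]]
BMS <- ms[1]
JMS <- ms[2]
EMS <- ms[3]
k   <- length(unique(stj_df$judges))    # number of judges
n   <- length(unique(stj_df$targets))   # number of targets

# Agreement forms: judge variance (JMS - EMS) enters the denominator.
icc_2_1 <- (BMS - EMS) / (BMS + (k - 1) * EMS + k * (JMS - EMS) / n)
icc_2_k <- (BMS - EMS) / (BMS + (JMS - EMS) / n)

round(c("ICC(2,1)" = icc_2_1, "ICC(2,4)" = icc_2_k), 3)
```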

- **Case 3:** Each target is rated by each of the same **k** judges, who are the only judges of interest (p. 421).

- If Case 3, then (a) is always two-way, (b) is always no (consistency) and (c) may be either single or average.
  - ICC(3,1): two-way, consistency, single
  - ICC(3,4): two-way, consistency, average

- The two-way ANOVA model (same as for Case 2):

```
fit.2 <- aov(scores ~ targets + judges, data = stj_df)
summary(fit.2)
            Df Sum Sq Mean Sq F value   Pr(>F)
targets      5  56.21   11.24   11.03 0.000135 ***
judges       3  97.46   32.49   31.87 9.45e-07 ***
Residuals   15  15.29    1.02
```

The ICC(3,1) estimate (two-way, consistency, single):

$ICC(3,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS}$

Where:

- BMS is the between-targets mean square (11.24)
- EMS is the mean square for the residual sum of squares (1.02)
- **k** is the number of judges (4)

Therefore:

$\begin{align*} ICC(3,1) & = \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02} \\ & = 0.71 \end{align*}$

The ICC(3,4) estimate (two-way, consistency, average):

$ICC(3,4) = \frac{BMS - EMS}{BMS}$

Therefore:

$\begin{align*} ICC(3,4) & = \frac{11.24 - 1.02}{11.24} \\ & = 0.91 \end{align*}$
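To see where BMS, JMS, and EMS come from without going through `aov()`, the sums of squares can be built from row and column means of the ratings matrix. The sketch below is one way to do it for a complete two-way layout with one rating per cell; all variable names (`SS_targets`, `icc_3_1`, and so on) are introduced here for illustration.

```r
# Ratings in wide form: one row per target, one column per judge.
ratings <- cbind(
  judge1 = c(9, 6, 8, 7, 10, 6),
  judge2 = c(2, 1, 4, 1, 5, 2),
  judge3 = c(5, 3, 6, 2, 6, 4),
  judge4 = c(8, 2, 8, 6, 9, 7)
)
n     <- nrow(ratings)   # number of targets
k     <- ncol(ratings)   # number of judges
grand <- mean(ratings)   # grand mean of all ratings

# Sums of squares for targets (rows), judges (columns), and the remainder.
SS_targets <- k * sum((rowMeans(ratings) - grand)^2)
SS_judges  <- n * sum((colMeans(ratings) - grand)^2)
SS_total   <- sum((ratings - grand)^2)
SS_error   <- SS_total - SS_targets - SS_judges

# Mean squares: divide each sum of squares by its degrees of freedom.
BMS <- SS_targets / (n - 1)
JMS <- SS_judges / (k - 1)
EMS <- SS_error / ((n - 1) * (k - 1))

# Consistency forms: judge mean differences (JMS) play no role.
icc_3_1 <- (BMS - EMS) / (BMS + (k - 1) * EMS)
icc_3_k <- (BMS - EMS) / BMS

round(c(BMS = BMS, JMS = JMS, EMS = EMS), 2)
round(c("ICC(3,1)" = icc_3_1, "ICC(3,4)" = icc_3_k), 3)
```

Because JMS is absent from both Case 3 formulas, the large differences between judges' means lower the Case 2 (agreement) estimates but leave the Case 3 (consistency) estimates untouched.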

These agree with the ICC scores in Table 4 from Shrout & Fleiss (p. 424):

Version | Estimate | Model | Type | Unit of Analysis |
---|---|---|---|---|
ICC(1,1) | 0.17 | One-way | Consistency | Single |
ICC(1,4) | 0.44 | One-way | Consistency | Average |
ICC(2,1) | 0.29 | Two-way | Agreement | Single |
ICC(2,4) | 0.62 | Two-way | Agreement | Average |
ICC(3,1) | 0.71 | Two-way | Consistency | Single |
ICC(3,4) | 0.91 | Two-way | Consistency | Average |

The **irr** package expects the ratings in wide form, with one row per
target and one column per judge, so the data has to be reshaped
(here it is simply re-entered in R):

```
library("irr")
score_1 <- c(9,6,8,7,10,6)
score_2 <- c(2,1,4,1,5,2)
score_3 <- c(5,3,6,2,6,4)
score_4 <- c(8,2,8,6,9,7)
```

Viewing the data (**irr** uses the data as it appears in
the table at the top of this page):

```
cbind(score_1, score_2, score_3, score_4)
     score_1 score_2 score_3 score_4
[1,]       9       2       5       8
[2,]       6       1       3       2
[3,]       8       4       6       8
[4,]       7       1       2       6
[5,]      10       5       6       9
[6,]       6       2       4       7
```
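Instead of re-entering the scores, the wide matrix could be derived from the long data frame built earlier. The following is a sketch using base R's `tapply()`; the object names `ratings` and `by_hand` are introduced here for illustration.

```r
# Rebuild the long-format data so this block is self-contained.
scores  <- c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), times = 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, targets, judges)

# Cross-tabulate scores by target (rows) and judge (columns). Each cell holds
# exactly one score, so taking the mean simply returns that score.
ratings <- with(stj_df, tapply(scores, list(targets, judges), mean))
ratings

# The manually entered matrix, for comparison.
by_hand <- cbind(
  score_1 = c(9, 6, 8, 7, 10, 6),
  score_2 = c(2, 1, 4, 1, 5, 2),
  score_3 = c(5, 3, 6, 2, 6, 4),
  score_4 = c(8, 2, 8, 6, 9, 7)
)

# Compare cell by cell, ignoring row and column names.
all(unname(ratings) == unname(by_hand))
```

Cross-tabulating by label rather than reshaping by position means the result does not depend on the row order of the long data frame.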

Then:

**ICC(1,1)** (one-way, consistency, single):

```
icc(cbind(score_1, score_2, score_3, score_4),
    model = "oneway",
    type = "consistency",
    unit = "single")
Single Score Intraclass Correlation
Model: oneway
Type : consistency
Subjects = 6
Raters = 4
ICC(1) = 0.166
F-Test, H0: r0 = 0 ; H1: r0 > 0
F(5,18) = 1.79 , p = 0.165
95%-Confidence Interval for ICC Population Values:
-0.133 < ICC < 0.723
```

**ICC(1,4)** (one-way, consistency, average):

```
icc(cbind(score_1, score_2, score_3, score_4),
    model = "oneway",
    type = "consistency",
    unit = "average")
Average Score Intraclass Correlation
Model: oneway
Type : consistency
Subjects = 6
Raters = 4
ICC(4) = 0.443
F-Test, H0: r0 = 0 ; H1: r0 > 0
F(5,18) = 1.79 , p = 0.165
95%-Confidence Interval for ICC Population Values:
-0.884 < ICC < 0.912
```

**ICC(2,1)** (two-way, agreement, single):

```
icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type = "agreement",
    unit = "single")
Single Score Intraclass Correlation
Model: twoway
Type : agreement
Subjects = 6
Raters = 4
ICC(A,1) = 0.29
F-Test, H0: r0 = 0 ; H1: r0 > 0
F(5,4.79) = 11 , p = 0.0113
95%-Confidence Interval for ICC Population Values:
0.019 < ICC < 0.761
```

**ICC(2,4)** (two-way, agreement, average):

```
icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type = "agreement",
    unit = "average")
Average Score Intraclass Correlation
Model: twoway
Type : agreement
Subjects = 6
Raters = 4
ICC(A,4) = 0.62
F-Test, H0: r0 = 0 ; H1: r0 > 0
F(5,4.19) = 11 , p = 0.0165
95%-Confidence Interval for ICC Population Values:
0.039 < ICC < 0.929
```

**ICC(3,1)** (two-way, consistency, single):

```
icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type = "consistency",
    unit = "single")
Single Score Intraclass Correlation
Model: twoway
Type : consistency
Subjects = 6
Raters = 4
ICC(C,1) = 0.715
F-Test, H0: r0 = 0 ; H1: r0 > 0
F(5,15) = 11 , p = 0.000135
95%-Confidence Interval for ICC Population Values:
0.342 < ICC < 0.946
```

**ICC(3,4)** (two-way, consistency, average):

```
icc(cbind(score_1, score_2, score_3, score_4),
    model = "twoway",
    type = "consistency",
    unit = "average")
Average Score Intraclass Correlation
Model: twoway
Type : consistency
Subjects = 6
Raters = 4
ICC(C,4) = 0.909
F-Test, H0: r0 = 0 ; H1: r0 > 0
F(5,15) = 11 , p = 0.000135
95%-Confidence Interval for ICC Population Values:
0.676 < ICC < 0.986
```
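The six calls above can be collected into one table for comparison with the hand calculations. This is a sketch, not a definitive recipe: it assumes the object returned by `icc()` stores the point estimate in an element named `value`, and the `forms` data frame is a name introduced here. The **irr** call is guarded so the block still runs where the package is not installed.

```r
# Ratings in wide form: one row per target, one column per judge.
ratings <- cbind(
  score_1 = c(9, 6, 8, 7, 10, 6),
  score_2 = c(2, 1, 4, 1, 5, 2),
  score_3 = c(5, 3, 6, 2, 6, 4),
  score_4 = c(8, 2, 8, 6, 9, 7)
)

# The six forms and the icc() arguments used for each in this post.
forms <- data.frame(
  version = c("ICC(1,1)", "ICC(1,4)", "ICC(2,1)", "ICC(2,4)", "ICC(3,1)", "ICC(3,4)"),
  model   = c("oneway", "oneway", "twoway", "twoway", "twoway", "twoway"),
  type    = c("consistency", "consistency", "agreement", "agreement",
              "consistency", "consistency"),
  unit    = c("single", "average", "single", "average", "single", "average"),
  stringsAsFactors = FALSE
)

if (requireNamespace("irr", quietly = TRUE)) {
  # Assumption: the point estimate is stored in the `value` element.
  est <- mapply(
    function(m, t, u) irr::icc(ratings, model = m, type = t, unit = u)$value,
    forms$model, forms$type, forms$unit
  )
  forms$estimate <- round(unname(est), 2)
  print(forms)
}
```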