r:intraclass-correlation

Working through the **intraclass correlation coefficients (ICC)** by reading:

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. *Psychological Bulletin, 86*(2), 420-428. http://dx.doi.org/10.1037/0033-2909.86.2.420 http://www.ncbi.nlm.nih.gov/pubmed/18839484

And using the *irr* package along with its documentation:

Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2012). *irr: Various coefficients of interrater reliability and agreement*. R package version 0.84. http://CRAN.R-project.org/package=irr

The toy data from the Shrout & Fleiss article, Table 2, p. 423:

Target | Judge 1 | Judge 2 | Judge 3 | Judge 4 |
---|---|---|---|---|
1 | 9 | 2 | 5 | 8 |
2 | 6 | 1 | 3 | 2 |
3 | 8 | 4 | 6 | 8 |
4 | 7 | 1 | 2 | 6 |
5 | 10 | 5 | 6 | 9 |
6 | 6 | 2 | 4 | 7 |

Enter the data above into R in long format (one row per rating), structured for running an ANOVA:

    scores  <- c(9,6,8,7,10,6,2,1,4,1,5,2,5,3,6,2,6,4,8,2,8,6,9,7)
    targets <- rep(c("target1", "target2", "target3", "target4", "target5", "target6"), 4)
    judges  <- c(rep("judge1", 6), rep("judge2", 6), rep("judge3", 6), rep("judge4", 6))
    stj_df  <- data.frame(scores, judges, targets)

Resulting data frame:

Row | scores | judges | targets |
---|---|---|---|
1 | 9 | judge1 | target1 |
2 | 6 | judge1 | target2 |
3 | 8 | judge1 | target3 |
4 | 7 | judge1 | target4 |
5 | 10 | judge1 | target5 |
6 | 6 | judge1 | target6 |
7 | 2 | judge2 | target1 |
8 | 1 | judge2 | target2 |
9 | 4 | judge2 | target3 |
10 | 1 | judge2 | target4 |
11 | 5 | judge2 | target5 |
12 | 2 | judge2 | target6 |
13 | 5 | judge3 | target1 |
14 | 3 | judge3 | target2 |
15 | 6 | judge3 | target3 |
16 | 2 | judge3 | target4 |
17 | 6 | judge3 | target5 |
18 | 4 | judge3 | target6 |
19 | 8 | judge4 | target1 |
20 | 2 | judge4 | target2 |
21 | 8 | judge4 | target3 |
22 | 6 | judge4 | target4 |
23 | 9 | judge4 | target5 |
24 | 7 | judge4 | target6 |

Relevant summary of statistics:

Group | N | Mean | Var |
---|---|---|---|
Target 1 | 4 | 6.00 | 10.00 |
Target 2 | 4 | 3.00 | 4.67 |
Target 3 | 4 | 6.50 | 3.67 |
Target 4 | 4 | 4.00 | 8.67 |
Target 5 | 4 | 7.50 | 5.67 |
Target 6 | 4 | 4.75 | 4.92 |
Total | 24 | 5.29 | 7.35 |

Group | N | Mean | Var |
---|---|---|---|
Judge 1 | 6 | 7.67 | 2.67 |
Judge 2 | 6 | 2.50 | 2.70 |
Judge 3 | 6 | 4.33 | 2.67 |
Judge 4 | 6 | 6.67 | 6.27 |
Total | 24 | 5.29 | 7.35 |
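The summary tables above can be reproduced with base R. This is a minimal sketch; the helper name `group_summary` is made up for illustration and is not part of any package.

```r
# Data from Shrout & Fleiss (1979), Table 2, in long format.
scores  <- c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), times = 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, judges, targets)

# Hypothetical helper: N, mean, and sample variance of `values` per group.
group_summary <- function(values, groups) {
  n <- tapply(values, groups, length)
  data.frame(
    N    = as.vector(n),
    Mean = as.vector(tapply(values, groups, mean)),
    Var  = as.vector(tapply(values, groups, var)),
    row.names = names(n)
  )
}

target_stats <- group_summary(stj_df$scores, stj_df$targets)
judge_stats  <- group_summary(stj_df$scores, stj_df$judges)
total_stats  <- data.frame(N = length(scores), Mean = mean(scores),
                           Var = var(scores), row.names = "Total")

# Print both tables rounded to two decimals.
print(round(rbind(target_stats, total_stats), 2))
print(round(rbind(judge_stats, total_stats), 2))
```

`var()` uses the n - 1 denominator, which is what the Var column reports.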

Shrout & Fleiss document six versions of the intraclass correlation coefficient (*ICC*). In deciding which version to use, they state:

The guidelines for choosing the appropriate form of the *ICC* call for three decisions: (a) Is a one-way or two-way analysis of variance (ANOVA) appropriate for the analysis of the reliability study? (b) Are differences between the judges' mean ratings relevant to the reliability study? (c) Is the unit of analysis an individual rating or the mean of several ratings? (p. 420)

This results in the following six forms:

- If (a) one-way, then (b) is always no (consistency) and (c) can be either an individual rating (single) or mean of several ratings (average).
- If (a) two-way, then (b) may either be yes (agreement) or no (consistency) and (c) can be either an individual rating (single) or the mean of several ratings (average).

More specifically:

1. Each target is rated by a different set of *k* judges, randomly selected from a larger population of judges (p. 421).

- If Case 1, then (a) is always one-way, (b) is always no (consistency), but (c) may be either single or average.
  - ICC(1,1): one-way, consistency, single
  - ICC(1,4): one-way, consistency, average

- The one-way ANOVA model:

    fit.1 <- aov(scores ~ targets, data = stj_df)
    summary(fit.1)
                Df Sum Sq Mean Sq F value Pr(>F)
    targets      5  56.21  11.242   1.795  0.165
    Residuals   18 112.75   6.264

The ICC(1,1) estimate (one-way, consistency, single):

$$ ICC(1,1) = \frac{BMS - WMS}{BMS + (k - 1)WMS} $$

Where:

- BMS is the between-targets mean square (11.24)
- WMS is the within-target mean square (6.26)
- *k* is the number of judges (4)

Therefore:

\begin{align*} ICC(1,1) & = \frac{11.24 - 6.26}{11.24 + (4 - 1)6.26} \\ & = 0.17 \end{align*}

The ICC(1,4) estimate (one-way, consistency, average):

$$ ICC(1,4) = \frac{BMS - WMS}{BMS} $$

Therefore:

\begin{align*} ICC(1,4) & = \frac{11.24 - 6.26}{11.24} \\ & = 0.44 \end{align*}
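The two one-way estimates can also be computed directly from the fitted model rather than by hand. The sketch below rebuilds the data so that it runs on its own; the variable names `BMS`, `WMS`, `icc_1_1`, and `icc_1_k` are chosen here to mirror the formulas.

```r
# Data from Shrout & Fleiss (1979), Table 2, in long format.
scores  <- c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), times = 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, judges, targets)

# One-way ANOVA: targets are the only factor.
fit.1 <- aov(scores ~ targets, data = stj_df)

# Mean squares in row order of the ANOVA table: targets, then Residuals.
ms  <- summary(fit.1)[[1]][["Mean Sq"]]
BMS <- ms[1]  # between-targets mean square
WMS <- ms[2]  # within-target mean square
k   <- length(unique(stj_df$judges))  # number of judges

icc_1_1 <- (BMS - WMS) / (BMS + (k - 1) * WMS)  # single rating
icc_1_k <- (BMS - WMS) / BMS                    # mean of k ratings

print(round(c("ICC(1,1)" = icc_1_1, "ICC(1,4)" = icc_1_k), 2))
```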

2. A random sample of *k* judges is selected from a larger population, and each judge rates each target, that is, each judge rates *n* targets altogether (p. 421).

- If Case 2, then (a) is always two-way, (b) is always yes (agreement), and (c) may be either single or average.
  - ICC(2,1): two-way, agreement, single
  - ICC(2,4): two-way, agreement, average

- The two-way ANOVA model:

    fit.2 <- aov(scores ~ targets + judges, data = stj_df)
    summary(fit.2)
                Df Sum Sq Mean Sq F value   Pr(>F)
    targets      5  56.21   11.24   11.03 0.000135 ***
    judges       3  97.46   32.49   31.87 9.45e-07 ***
    Residuals   15  15.29    1.02

The ICC(2,1) estimate (two-way, agreement, single):

$$ ICC(2,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS + \frac{k(JMS - EMS)}{n}} $$

Where:

- BMS is the between-targets mean square (11.24)
- JMS is the between-judges mean square (32.49)
- EMS is the residual (error) mean square (1.02)
- *k* is the number of judges (4)
- *n* is the number of targets (6)

Therefore:

\begin{align*} ICC(2,1) & = \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02 + \frac{4(32.49 - 1.02)}{6}} \\ & = 0.29 \end{align*}

The ICC(2,4) estimate (two-way, agreement, average):

$$ ICC(2,4) = \frac{BMS - EMS}{BMS + \frac{(JMS - EMS)}{n}} $$

Therefore:

\begin{align*} ICC(2,4) & = \frac{11.24 - 1.02}{11.24 + \frac{(32.49 - 1.02)}{6}} \\ & = 0.62 \end{align*}
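As with Case 1, the Case 2 estimates can be pulled from the two-way fit. This is a sketch that assumes the model terms are entered as `targets + judges`, so the mean squares come back in that order.

```r
# Data from Shrout & Fleiss (1979), Table 2, in long format.
scores  <- c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), times = 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, judges, targets)

# Two-way ANOVA without interaction (one rating per target-judge cell).
fit.2 <- aov(scores ~ targets + judges, data = stj_df)

# Mean squares in row order: targets, judges, Residuals.
ms  <- summary(fit.2)[[1]][["Mean Sq"]]
BMS <- ms[1]  # between-targets mean square
JMS <- ms[2]  # between-judges mean square
EMS <- ms[3]  # residual mean square
k   <- length(unique(stj_df$judges))   # number of judges
n   <- length(unique(stj_df$targets))  # number of targets

icc_2_1 <- (BMS - EMS) / (BMS + (k - 1) * EMS + k * (JMS - EMS) / n)
icc_2_k <- (BMS - EMS) / (BMS + (JMS - EMS) / n)

print(round(c("ICC(2,1)" = icc_2_1, "ICC(2,4)" = icc_2_k), 2))
```

The judge mean square appears in both denominators, so systematic differences between judges lower these estimates.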

3. Each target is rated by each of the same *k* judges, who are the only judges of interest (p. 421).

- If Case 3, then (a) is always two-way, (b) is always no (consistency) and (c) may be either single or average.
  - ICC(3,1): two-way, consistency, single
  - ICC(3,4): two-way, consistency, average

- The two-way ANOVA model (same as for Case 2):

    fit.2 <- aov(scores ~ targets + judges, data = stj_df)
    summary(fit.2)
                Df Sum Sq Mean Sq F value   Pr(>F)
    targets      5  56.21   11.24   11.03 0.000135 ***
    judges       3  97.46   32.49   31.87 9.45e-07 ***
    Residuals   15  15.29    1.02

The ICC(3,1) estimate (two-way, consistency, single):

$$ ICC(3,1) = \frac{BMS - EMS}{BMS + (k - 1)EMS} $$

Where:

- BMS is the between-targets mean square (11.24)
- EMS is the residual (error) mean square (1.02)
- *k* is the number of judges (4)

Therefore:

\begin{align*} ICC(3,1) & = \frac{11.24 - 1.02}{11.24 + (4 - 1)1.02} \\ & = 0.71 \end{align*}

The ICC(3,4) estimate (two-way, consistency, average):

$$ ICC(3,4) = \frac{BMS - EMS}{BMS} $$

Therefore:

\begin{align*} ICC(3,4) & = \frac{11.24 - 1.02}{11.24} \\ & = 0.91 \end{align*}
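The Case 3 estimates use the same two-way fit but leave the judge mean square out. The sketch below also checks that applying the Spearman-Brown formula to ICC(3,1) reproduces ICC(3,4); that identity follows algebraically from the two formulas above and is included only as a sanity check.

```r
# Data from Shrout & Fleiss (1979), Table 2, in long format.
scores  <- c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), times = 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, judges, targets)

fit.2 <- aov(scores ~ targets + judges, data = stj_df)

# Mean squares in row order: targets, judges, Residuals.
ms  <- summary(fit.2)[[1]][["Mean Sq"]]
BMS <- ms[1]  # between-targets mean square
EMS <- ms[3]  # residual mean square
k   <- length(unique(stj_df$judges))  # number of judges

icc_3_1 <- (BMS - EMS) / (BMS + (k - 1) * EMS)
icc_3_k <- (BMS - EMS) / BMS

# Sanity check: stepping the single-rating value up to k ratings
# with the Spearman-Brown formula should give the average-rating value.
stepped_up <- k * icc_3_1 / (1 + (k - 1) * icc_3_1)
print(all.equal(stepped_up, icc_3_k))

print(round(c("ICC(3,1)" = icc_3_1, "ICC(3,4)" = icc_3_k), 2))
```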

These agree with the ICC scores in Table 4 from Shrout & Fleiss (p. 424):

Version | Estimate | Model | Type | Unit of Analysis |
---|---|---|---|---|
ICC(1,1) | 0.17 | One-way | Consistency | Single |
ICC(1,4) | 0.44 | One-way | Consistency | Average |
ICC(2,1) | 0.29 | Two-way | Agreement | Single |
ICC(2,4) | 0.62 | Two-way | Agreement | Average |
ICC(3,1) | 0.71 | Two-way | Consistency | Single |
ICC(3,4) | 0.91 | Two-way | Consistency | Average |

To use the *irr* package, the data must be in wide format, with one row per target and one column per judge (here simply re-entered into R):

    library("irr")
    score_1 <- c(9,6,8,7,10,6)
    score_2 <- c(2,1,4,1,5,2)
    score_3 <- c(5,3,6,2,6,4)
    score_4 <- c(8,2,8,6,9,7)

Viewing the data (*irr* uses the data as it appears in the table at the top of this page):

    cbind(score_1, score_2, score_3, score_4)
         score_1 score_2 score_3 score_4
    [1,]       9       2       5       8
    [2,]       6       1       3       2
    [3,]       8       4       6       8
    [4,]       7       1       2       6
    [5,]      10       5       6       9
    [6,]       6       2       4       7
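Rather than re-entering the ratings, the same wide matrix can be derived from the long data frame. The sketch below uses `tapply()` and relies on each target-judge cell holding exactly one rating, so taking the cell mean simply returns that rating; the name `ratings_wide` is chosen here for illustration.

```r
# Long-format data, as entered earlier.
scores  <- c(9,6,8,7,10,6, 2,1,4,1,5,2, 5,3,6,2,6,4, 8,2,8,6,9,7)
targets <- rep(paste0("target", 1:6), times = 4)
judges  <- rep(paste0("judge", 1:4), each = 6)
stj_df  <- data.frame(scores, judges, targets)

# Rows = targets, columns = judges; one rating per cell,
# so mean() just passes that single value through.
ratings_wide <- with(stj_df, tapply(scores, list(targets, judges), mean))
print(ratings_wide)

# Compare with the re-entered columns (dimension names ignored).
score_1 <- c(9,6,8,7,10,6)
score_2 <- c(2,1,4,1,5,2)
score_3 <- c(5,3,6,2,6,4)
score_4 <- c(8,2,8,6,9,7)
same <- all.equal(unname(ratings_wide),
                  unname(cbind(score_1, score_2, score_3, score_4)))
print(same)
```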

Then:

**ICC(1,1)** (one-way, consistency, single):

    icc(cbind(score_1, score_2, score_3, score_4),
        model = "oneway", type = "consistency", unit = "single")
     Single Score Intraclass Correlation
       Model: oneway
       Type : consistency
       Subjects = 6
         Raters = 4
         ICC(1) = 0.166
     F-Test, H0: r0 = 0 ; H1: r0 > 0
        F(5,18) = 1.79 , p = 0.165
     95%-Confidence Interval for ICC Population Values:
      -0.133 < ICC < 0.723

**ICC(1,4)** (one-way, consistency, average):

    icc(cbind(score_1, score_2, score_3, score_4),
        model = "oneway", type = "consistency", unit = "average")
     Average Score Intraclass Correlation
       Model: oneway
       Type : consistency
       Subjects = 6
         Raters = 4
         ICC(4) = 0.443
     F-Test, H0: r0 = 0 ; H1: r0 > 0
        F(5,18) = 1.79 , p = 0.165
     95%-Confidence Interval for ICC Population Values:
      -0.884 < ICC < 0.912
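The confidence intervals in the two one-way outputs can be approximated by hand from the F ratio. The sketch below uses the F-based interval formulas that Shrout & Fleiss give for ICC(1,1) and ICC(1,k); that *irr* computes its one-way interval the same way is an assumption here, not something taken from the package documentation.

```r
# Wide-format ratings: rows = targets, columns = judges.
ratings <- cbind(score_1 = c(9,6,8,7,10,6),
                 score_2 = c(2,1,4,1,5,2),
                 score_3 = c(5,3,6,2,6,4),
                 score_4 = c(8,2,8,6,9,7))
n <- nrow(ratings)  # number of targets
k <- ncol(ratings)  # number of judges
alpha <- 0.05       # for a 95% interval

# One-way mean squares computed directly from the matrix.
BMS <- k * sum((rowMeans(ratings) - mean(ratings))^2) / (n - 1)
WMS <- sum((ratings - rowMeans(ratings))^2) / (n * (k - 1))
F0  <- BMS / WMS

# Lower and upper F bounds (Shrout & Fleiss, one-way case).
F_L <- F0 / qf(1 - alpha / 2, n - 1, n * (k - 1))
F_U <- F0 * qf(1 - alpha / 2, n * (k - 1), n - 1)

ci_single  <- c(lower = (F_L - 1) / (F_L + k - 1),
                upper = (F_U - 1) / (F_U + k - 1))
ci_average <- c(lower = 1 - 1 / F_L,
                upper = 1 - 1 / F_U)

print(round(ci_single, 3))
print(round(ci_average, 3))
```

If the assumption holds, both intervals should agree with the printed bounds to rounding.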

**ICC(2,1)** (two-way, agreement, single):

    icc(cbind(score_1, score_2, score_3, score_4),
        model = "twoway", type = "agreement", unit = "single")
     Single Score Intraclass Correlation
       Model: twoway
       Type : agreement
       Subjects = 6
         Raters = 4
       ICC(A,1) = 0.29
     F-Test, H0: r0 = 0 ; H1: r0 > 0
      F(5,4.79) = 11 , p = 0.0113
     95%-Confidence Interval for ICC Population Values:
      0.019 < ICC < 0.761

**ICC(2,4)** (two-way, agreement, average):

    icc(cbind(score_1, score_2, score_3, score_4),
        model = "twoway", type = "agreement", unit = "average")
     Average Score Intraclass Correlation
       Model: twoway
       Type : agreement
       Subjects = 6
         Raters = 4
       ICC(A,4) = 0.62
     F-Test, H0: r0 = 0 ; H1: r0 > 0
      F(5,4.19) = 11 , p = 0.0165
     95%-Confidence Interval for ICC Population Values:
      0.039 < ICC < 0.929

**ICC(3,1)** (two-way, consistency, single):

    icc(cbind(score_1, score_2, score_3, score_4),
        model = "twoway", type = "consistency", unit = "single")
     Single Score Intraclass Correlation
       Model: twoway
       Type : consistency
       Subjects = 6
         Raters = 4
       ICC(C,1) = 0.715
     F-Test, H0: r0 = 0 ; H1: r0 > 0
        F(5,15) = 11 , p = 0.000135
     95%-Confidence Interval for ICC Population Values:
      0.342 < ICC < 0.946

**ICC(3,4)** (two-way, consistency, average):

    icc(cbind(score_1, score_2, score_3, score_4),
        model = "twoway", type = "consistency", unit = "average")
     Average Score Intraclass Correlation
       Model: twoway
       Type : consistency
       Subjects = 6
         Raters = 4
       ICC(C,4) = 0.909
     F-Test, H0: r0 = 0 ; H1: r0 > 0
        F(5,15) = 11 , p = 0.000135
     95%-Confidence Interval for ICC Population Values:
      0.676 < ICC < 0.986
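The six calls differ only in their `model`, `type`, and `unit` arguments, so they can be run in one pass and collected into a table. This is a sketch that assumes the *irr* package is installed and that the object returned by `icc()` stores the estimate in its `value` component; the data frame `forms` is a name made up for this example.

```r
# Wide-format ratings: rows = targets, columns = judges.
ratings <- cbind(score_1 = c(9,6,8,7,10,6),
                 score_2 = c(2,1,4,1,5,2),
                 score_3 = c(5,3,6,2,6,4),
                 score_4 = c(8,2,8,6,9,7))

# One row per ICC form, using the labels from this page.
forms <- data.frame(
  version = c("ICC(1,1)", "ICC(1,4)", "ICC(2,1)",
              "ICC(2,4)", "ICC(3,1)", "ICC(3,4)"),
  model   = c("oneway", "oneway", "twoway", "twoway", "twoway", "twoway"),
  type    = c("consistency", "consistency", "agreement",
              "agreement", "consistency", "consistency"),
  unit    = c("single", "average", "single", "average", "single", "average"),
  stringsAsFactors = FALSE
)

if (requireNamespace("irr", quietly = TRUE)) {
  # Run icc() once per row and keep only the point estimate.
  forms$estimate <- mapply(
    function(m, t, u) irr::icc(ratings, model = m, type = t, unit = u)$value,
    forms$model, forms$type, forms$unit,
    USE.NAMES = FALSE
  )
  print(forms)
} else {
  message("Package 'irr' is not installed; skipping the icc() calls.")
}
```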

r/intraclass-correlation.txt · Last modified: 2015/12/09 11:56 by seanburns