r:chi-square

Examples adapted from the following work:

Pagano, R. R. (2002). *Understanding statistics in the behavioral sciences* (6th ed.). Belmont, CA: Wadsworth.

There are two catches to be aware of when running chi-square tests in R:

- Don't assume the category probabilities are equal just because `chisq.test` does so by default. See the examples below.
- Always check that the degrees of freedom are correct for the test. If `chisq.test` is run on a table that was built incorrectly, it may test the wrong relationships.
  - For a *Goodness-of-Fit Test*, $df = k - 1$, where $k$ is the number of categories.
  - For a *Test of Independence*, $df = (r - 1)(c - 1)$, where $r$ and $c$ are the numbers of rows and columns in the table.
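One way to make the degrees-of-freedom check routine is to read the value back from the test result, which stores it in the `parameter` element. The sketch below uses made-up counts (`obs` and `tab` are hypothetical, not from either textbook) purely to show where to look.

```r
# Hypothetical counts, for illustration only.

# Goodness-of-fit with k = 3 categories: df should be k - 1 = 2.
obs <- c(10, 20, 30)
gof <- chisq.test(obs)
gof$parameter

# Test of independence on a 2 x 3 table: df should be (2 - 1)(3 - 1) = 2.
tab <- as.table(rbind(c(10, 20, 30),
                      c(15, 25, 20)))
ind <- chisq.test(tab)
ind$parameter
```

If the reported value differs from the one worked out by hand, the data were probably passed in the wrong shape, for example a table where a plain vector of counts was intended.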

Based on pages 422-423 of the above book.

A researcher believes the ethnic composition of a city has changed since data was last collected. At that time, the breakdown was:

- 53% Norwegian
- 32% Swedish
- 8% Irish
- 5% Hispanic
- 2% Italian

New data is collected from a random sample of 750 inhabitants of the city. The results are:

Norwegian | Swedish | Irish | Hispanic | Italian |
---|---|---|---|---|
399 | 193 | 63 | 82 | 13 |

Use `chisq.test` to test the observed frequencies (the new counts) against the percentages from the previous data (the null probabilities).

$H_0:$ The ethnic population has not changed in composition.

    new_count <- c(399, 193, 63, 82, 13)
    null_probs <- c(0.53, 0.32, 0.08, 0.05, 0.02)
    chisq.test(new_count, p = null_probs)

The result: $X^2 = 62.433, df = 4, p < 0.001$. Since $p < 0.05$, we reject $H_0$.
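To see where that statistic comes from, the calculation can be reproduced by hand. This is a minimal sketch that recomputes the expected counts, the statistic, and the p-value from the same two vectors; the names `expected`, `x2`, `df`, and `p_value` are introduced here for illustration.

```r
new_count  <- c(399, 193, 63, 82, 13)
null_probs <- c(0.53, 0.32, 0.08, 0.05, 0.02)

# Expected counts under H0: sample size times each null probability.
expected <- sum(new_count) * null_probs

# Chi-square statistic: sum of (observed - expected)^2 / expected.
x2 <- sum((new_count - expected)^2 / expected)

# Goodness-of-fit degrees of freedom: k - 1.
df <- length(new_count) - 1

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# The hand calculation should agree with chisq.test.
res <- chisq.test(new_count, p = null_probs)
all.equal(unname(res$statistic), x2)
```

The largest contribution comes from the Hispanic category, where 82 were observed against an expected 37.5.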

The above is a good example of when, and how, to adjust the expected probabilities in R. By default, `chisq.test` assumes that probabilities are equal across all categories of observations. Consider the following scenario (Mendenhall, Beaver, & Beaver, 2006, p. 597):

Door | Green | Red | Blue |
---|---|---|---|
Observed Count | 20 | 39 | 31 |

Without prior knowledge, the default null hypothesis is:

$H_0: p_1 = p_2 = p_3 = \frac{1}{3}$

R assumes the default $H_0$ too, and the code would be:

    count <- c(20, 39, 31)
    chisq.test(count)

The result: $X^2 = 6.0667, df = 2, p = 0.04815$. Since $p < 0.05$, we reject $H_0$.
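The reported p-value is the upper-tail area of a chi-square distribution with 2 degrees of freedom, which can be checked with `pchisq`. A short sketch, with `res` and `p_by_hand` as names chosen here for illustration:

```r
count <- c(20, 39, 31)
res <- chisq.test(count)

# With equal probabilities, each door is expected 90 / 3 = 30 times.
res$expected

# Recompute the p-value from the statistic and the degrees of freedom.
p_by_hand <- pchisq(unname(res$statistic),
                    df = unname(res$parameter),
                    lower.tail = FALSE)
all.equal(res$p.value, p_by_hand)
```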

We can also be explicit about the vector of probabilities, as in the first example, even when those probabilities are equal. The following gives the same result:

    chisq.test(count, p = c(1/3, 1/3, 1/3))

An alternative way to write the above:

    chisq.test(count, p = rep(1/3, 3))
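The `p` vector has to sum to 1, otherwise `chisq.test` stops with an error. When the null hypothesis is easier to state as relative weights, the `rescale.p = TRUE` argument asks R to normalize them. A small sketch using the door counts; `by_prob` and `by_weight` are names made up for this comparison:

```r
count <- c(20, 39, 31)

# Probabilities given explicitly, summing to 1.
by_prob <- chisq.test(count, p = rep(1/3, 3))

# Equal weights of 1 each, rescaled by R to 1/3 each.
by_weight <- chisq.test(count, p = c(1, 1, 1), rescale.p = TRUE)

# Both calls describe the same null hypothesis.
all.equal(by_prob$statistic, by_weight$statistic)
all.equal(by_prob$p.value, by_weight$p.value)
```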

Based on page 424 of Pagano (2002). Here the table has to be built correctly in R for the analysis to work.

$H_0:$ Political affiliation and attitude toward some bill are independent.

Affiliation / Attitude | For | Undecided | Against | Row Marginal |
---|---|---|---|---|
Republican | 68 | 22 | 110 | 200 |
Democrat | 92 | 18 | 90 | 200 |
Column Marginal | 160 | 40 | 200 | 400 |

First, let's build the table:

    party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
    dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                            attitude = c("For", "Undecided", "Against"))

Let's examine the table:

    party
                attitude
    affiliation  For Undecided Against
      Republican  68        22     110
      Democrat    92        18      90

And `chisq.test(party)` shows that:

$X^2 = 6, df = 2, p = 0.04979$. Since $p < 0.05$, we reject $H_0$.
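For a test of independence, each expected count is the row total times the column total divided by the grand total. The sketch below rebuilds the `party` table and recomputes the statistic that way; `expected` and `x2` are names introduced here for illustration.

```r
party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                        attitude = c("For", "Undecided", "Against"))

# Expected counts under independence: (row total * column total) / n.
expected <- outer(rowSums(party), colSums(party)) / sum(party)
expected

# Chi-square statistic summed over all six cells.
x2 <- sum((party - expected)^2 / expected)

# Compare with the expected counts and statistic from chisq.test.
res <- chisq.test(party)
all.equal(unname(res$expected), unname(expected))
all.equal(unname(res$statistic), x2)
```

Because both parties have the same row total of 200, the expected counts are 80, 20, and 100 in each row.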

From Mendenhall, Beaver, & Beaver (p. 602), we have the following contingency table:

Type of Defects | Shift 1 | Shift 2 | Shift 3 | Total |
---|---|---|---|---|
1 | 15 | 26 | 33 | 74 |
2 | 21 | 31 | 17 | 69 |
3 | 45 | 34 | 49 | 128 |
4 | 13 | 5 | 20 | 38 |
Total | 94 | 96 | 119 | 309 |

$H_0:$ Type of Defects and Shift are independent.

We can build the table in R:

    defects <- as.table(rbind(c(15, 26, 33), c(21, 31, 17), c(45, 34, 49), c(13, 5, 20)))
    dimnames(defects) <- list(type = c("A", "B", "C", "D"),
                              shift = c("1", "2", "3"))

And running `chisq.test` on this table results in:

$X^2 = 19.178, df = 6, p = 0.003873$. Since $p < 0.05$, we reject $H_0$.
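Since a mistyped or transposed table silently changes the test, it may be worth comparing the margins R computes with the totals printed in the book before trusting the result. A minimal sketch of that check, rebuilding the same `defects` table:

```r
defects <- as.table(rbind(c(15, 26, 33), c(21, 31, 17),
                          c(45, 34, 49), c(13, 5, 20)))
dimnames(defects) <- list(type = c("A", "B", "C", "D"),
                          shift = c("1", "2", "3"))

# Print the table with row and column totals appended.
addmargins(defects)

# The margins should match the published totals.
rowSums(defects)   # book: 74, 69, 128, 38
colSums(defects)   # book: 94, 96, 119
sum(defects)       # book: 309

# A 4 x 3 table should give df = (4 - 1)(3 - 1) = 6.
res <- chisq.test(defects)
res$parameter
```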

Mendenhall, W., Beaver, R. J., & Beaver, B. M. (2006). *Introduction to probability and statistics* (12th ed.). Australia: Thomson Brooks/Cole.

r/chi-square.txt · Last modified: 2017/02/22 10:51 by seanburns