
# Chi-Square Examples

Examples adapted from the following work:

Pagano, R. R. (2002). Understanding statistics in the behavioral sciences (6th ed.). Belmont, CA: Wadsworth.

There are two catches to be aware of when running chi-square tests in R:

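1. Don't assume the category probabilities are equal. The chisq.test function uses equal probabilities unless you supply them through the p argument. See the examples below.
2. Always check that the degrees of freedom are correct for the test. When running chisq.test on a table of data, a table that is built incorrectly can make chisq.test examine the wrong relationships.
    1. For a Goodness-of-Fit Test, $df = k - 1$, where $k$ is the number of categories.
    2. For a Test of Independence, $df = (r - 1)(c - 1)$, where $r$ and $c$ are the numbers of rows and columns in the table.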
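One way to check the degrees of freedom is to compare the value R reports with the formula for the intended test. The sketch below uses made-up counts (`gof_counts` and `ind_table` are hypothetical and do not come from either textbook) and reads the degrees of freedom from the `parameter` element of the object that chisq.test returns.

```r
# Hypothetical counts, used only to illustrate the df check.
gof_counts <- c(10, 20, 30)                    # k = 3 categories
gof_fit <- chisq.test(gof_counts)              # goodness of fit, equal probabilities
gof_df_reported <- unname(gof_fit$parameter)   # df reported by R
gof_df_expected <- length(gof_counts) - 1      # k - 1

# Hypothetical 2 x 3 contingency table (filled column by column).
ind_table <- matrix(c(12, 18, 25, 15, 20, 10), nrow = 2)
ind_fit <- chisq.test(ind_table)               # test of independence
ind_df_reported <- unname(ind_fit$parameter)
ind_df_expected <- (nrow(ind_table) - 1) * (ncol(ind_table) - 1)

# Both comparisons should be TRUE when the data are laid out as intended.
gof_df_reported == gof_df_expected
ind_df_reported == ind_df_expected
```

If the reported value differs from the formula, the data were probably passed in the wrong shape, for example a table flattened into a vector.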

### Chi-Square Single Variable (Goodness of Fit)

Based on pages 422-423 of the above book.

A researcher believes the ethnic composition of a city has changed since data were last collected. At that time, the breakdown was:

• 53% Norwegian
• 32% Swedish
• 8% Irish
• 5% Hispanic
• 2% Italian

New data is collected from a random sample of 750 inhabitants of the city. The results are:

| Norwegian | Swedish | Irish | Hispanic | Italian |
|-----------|---------|-------|----------|---------|
| 399       | 193     | 63    | 82       | 13      |

Use chisq.test to test the observed frequencies (the new counts) against the proportions from the previous data (the null probabilities).

$H_0:$ The ethnic population has not changed in composition.

new_count  <- c(399, 193, 63, 82, 13)
null_probs <- c(0.53, 0.32, 0.08, 0.05, 0.02)
chisq.test(new_count, p = null_probs)

The result: $X^2 = 62.433, df = 4, p < 0.001$. Since $p < 0.05$, we reject $H_0$.
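To see where the statistic comes from, the same test can be reproduced by hand. This is a minimal sketch of the standard Pearson formula, $X^2 = \sum (O_i - E_i)^2 / E_i$ with $E_i = n p_i$. The names `expected`, `x2`, `df`, and `p_value` are introduced here for illustration only.

```r
new_count  <- c(399, 193, 63, 82, 13)
null_probs <- c(0.53, 0.32, 0.08, 0.05, 0.02)

# Expected counts under H0: sample size times each null probability.
expected <- sum(new_count) * null_probs

# Pearson chi-square statistic, summed over the five categories.
x2 <- sum((new_count - expected)^2 / expected)

# Goodness-of-fit degrees of freedom: k - 1.
df <- length(new_count) - 1

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(x2, 3)
df
p_value
```

The largest contribution comes from the Hispanic category, where 82 people were observed against an expected count of 37.5.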

#### Example 2

The example above shows when and how to supply the expected probabilities in R. By default, chisq.test assumes the probabilities are equal across all categories. Consider the following scenario (Mendenhall, Beaver, & Beaver, 2006, p. 597):

| Door           | Green | Red | Blue |
|----------------|-------|-----|------|
| Observed Count | 20    | 39  | 31   |

Without prior knowledge, the default null hypothesis is:

$H_0: p_1 = p_2 = p_3 = \frac{1}{3}$

R assumes the default $H_0$ too, and the code would be:

count <- c(20, 39, 31)
chisq.test(count)

The result: $X^2 = 6.0667, df = 2, p = 0.04815$. Since $p < 0.05$, we reject $H_0$.

We can also be explicit about the vector of probabilities, as in the first example, even when those probabilities are equal. The following call gives the same result:

chisq.test(count, p = c(1/3, 1/3, 1/3))

An alternative way to write the above:

chisq.test(count, p = rep(1/3, 3))
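The equivalence of the default and explicit calls can be checked directly. In this short sketch, `default_fit` and `explicit_fit` are names chosen for illustration. The comparison uses the `statistic`, `p.value`, and `expected` elements of the object returned by chisq.test.

```r
count <- c(20, 39, 31)

default_fit  <- chisq.test(count)                     # equal probabilities by default
explicit_fit <- chisq.test(count, p = rep(1/3, 3))    # same probabilities, stated explicitly

# Both comparisons should return TRUE.
all.equal(default_fit$statistic, explicit_fit$statistic)
all.equal(default_fit$p.value, explicit_fit$p.value)

# Expected counts under H0: 90 observations split evenly across three doors.
default_fit$expected
```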

### Test of independence between two variables

Based on page 424 of the above book. Here the table has to be built correctly in R for the analysis to work.

$H_0:$ Political affiliation and attitude toward some bill are independent.

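| Affiliation \ Attitude | For | Undecided | Against | Total |
|------------------------|-----|-----------|---------|-------|
| Republican             | 68  | 22        | 110     | 200   |
| Democrat               | 92  | 18        | 90      | 200   |
| Total                  | 160 | 40        | 200     | 400   |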

First, let's build the table:

party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                        attitude = c("For", "Undecided", "Against"))

Let's examine the table:

party
            attitude
affiliation  For Undecided Against
  Republican  68        22     110
  Democrat    92        18      90

Running chisq.test(party) gives:

$X^2 = 6, df = 2, p = 0.04979$. Since $p < 0.05$, we reject $H_0$.
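To confirm that R is testing the intended relationship, it helps to inspect the expected counts and the degrees of freedom of the fitted test. The sketch below rebuilds the table so it runs on its own. The names `party_fit` and `manual_expected` are introduced here for illustration. The manual calculation uses the usual independence formula: row total times column total, divided by the grand total.

```r
party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                        attitude = c("For", "Undecided", "Against"))

party_fit <- chisq.test(party)

# Expected counts under independence, as computed by chisq.test.
party_fit$expected

# The same expected counts computed by hand from the margins.
manual_expected <- outer(rowSums(party), colSums(party)) / sum(party)
manual_expected

# Degrees of freedom should equal (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2.
party_fit$parameter
```

Because both parties have 200 respondents, each row has the same expected counts: 80, 20, and 100.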

#### Example 2

From Mendenhall, Beaver, & Beaver (2006, p. 602), we have the following contingency table:

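| Type of Defects | Shift 1 | Shift 2 | Shift 3 | Total |
|-----------------|---------|---------|---------|-------|
| 1               | 15      | 26      | 33      | 74    |
| 2               | 21      | 31      | 17      | 69    |
| 3               | 45      | 34      | 49      | 128   |
| 4               | 13      | 5       | 20      | 38    |
| Total           | 94      | 96      | 119     | 309   |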

$H_0:$ Type of Defects and Shift are independent.

We can build the table in R:

defects <- as.table(rbind(c(15, 26, 33),
                          c(21, 31, 17),
                          c(45, 34, 49),
                          c(13, 5, 20)))
dimnames(defects) <- list(type = c("A", "B", "C", "D"),
                          shift = c("1", "2", "3"))

Running chisq.test(defects) gives:

$X^2 = 19.178, df = 6, p = 0.003873$. Since $p < 0.05$, we reject $H_0$.
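The call itself, together with a look at which cells drive the result, might be sketched as follows. The table is rebuilt here so the block runs on its own, and `defects_fit` is a name chosen for illustration. The `residuals` element holds the Pearson residuals, (observed - expected) / sqrt(expected), whose squares sum to the chi-square statistic.

```r
defects <- as.table(rbind(c(15, 26, 33),
                          c(21, 31, 17),
                          c(45, 34, 49),
                          c(13, 5, 20)))
dimnames(defects) <- list(type = c("A", "B", "C", "D"),
                          shift = c("1", "2", "3"))

defects_fit <- chisq.test(defects)

defects_fit$statistic    # chi-square statistic
defects_fit$parameter    # degrees of freedom: (4 - 1)(3 - 1) = 6
defects_fit$p.value

# Pearson residuals: large absolute values mark the cells that
# depart most from what independence would predict.
round(defects_fit$residuals, 2)
```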

### References

Mendenhall, W., Beaver, R. J., & Beaver, B. M. (2006). Introduction to probability and statistics (12th ed.). Australia: Thomson Brooks/Cole.