Chi-Square Examples in R

02-20-2024

Chi-Square Examples

Examples adapted from the following work:

Pagano, R. R. (2002). Understanding statistics in the behavioral sciences (6th ed.). Belmont, CA: Wadsworth.

There are two catches to be aware of when running chi-square tests in R:

• Don’t assume the category probabilities are equal. `chisq.test` uses equal probabilities unless a vector is supplied through `p`. See the examples below.
• Always check that the degrees of freedom are correct for the test. When running `chisq.test` on a table of data, a table that is built incorrectly can make `chisq.test` examine the wrong relationships.
  • For a goodness-of-fit test, $df = k - 1$, where $k$ is the number of categories.
  • For a test of independence, $df = (r - 1)(c - 1)$, where $r$ and $c$ are the numbers of rows and columns.
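
The degrees of freedom can be checked directly on the object that `chisq.test` returns, whose `parameter` element holds the df. The sketch below reuses counts from the examples on this page; the object names (`gof`, `tab`, `ind`, `flat`) are only illustrative. It also shows what can go wrong when a two-way table is passed as a flat vector: R runs a goodness-of-fit test on six categories instead of a test of independence.

```r
# Goodness of fit with k = 3 categories: expect df = k - 1 = 2
gof <- chisq.test(c(20, 39, 31))
gof$parameter

# Test of independence on a 2 x 3 table: expect df = (2 - 1) * (3 - 1) = 2
tab <- rbind(c(68, 22, 110), c(92, 18, 90))
ind <- chisq.test(tab)
ind$parameter

# The same six counts as a flat vector: R treats them as six categories
# and runs a goodness-of-fit test with df = 6 - 1 = 5
flat <- chisq.test(c(68, 22, 110, 92, 18, 90))
flat$parameter
```

If the reported df does not match the formula for the intended test, the data structure passed to `chisq.test` is the first thing to inspect.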

Chi-Square Single Variable (Goodness of Fit)

Based on pages 422-423 of the above book.

A researcher believes the ethnic composition of a city has changed since data were last collected. At that time, the breakdown was:

• 53% Norwegian
• 32% Swedish
• 8% Irish
• 5% Hispanic
• 2% Italian

New data is collected from a random sample of 750 inhabitants of the city. The results are:

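|----------------|--------------------------------------------------|
|                | Norwegian | Swedish | Irish | Hispanic | Italian |
|----------------|--------------------------------------------------|
| Observed Count | 399       | 193     | 63    | 82       | 13      |
|----------------|--------------------------------------------------|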

Use `chisq.test` to test the observed frequencies (the new counts) against the percentages from the previous data (the null probabilities).

$H_0:$ The ethnic population has not changed in composition.

new_count  <- c(399, 193, 63, 82, 13)
null_probs <- c(0.53, 0.32, 0.08, 0.05, 0.02)
chisq.test(new_count, p = null_probs)

The result: $X^2 = 62.433, df = 4, p < 0.001$.
Since $p < 0.05$, we reject $H_0$.
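
To see where the statistic comes from, it can be reproduced by hand. The following is a minimal sketch using the same counts and null probabilities; the variable names `expected`, `x2`, `df`, and `p_value` are introduced here for illustration only.

```r
new_count  <- c(399, 193, 63, 82, 13)
null_probs <- c(0.53, 0.32, 0.08, 0.05, 0.02)

# Expected counts under H0: sample size times each null probability
expected <- sum(new_count) * null_probs

# Chi-square statistic: sum over categories of (O - E)^2 / E
x2 <- sum((new_count - expected)^2 / expected)

# Degrees of freedom for a goodness-of-fit test: k - 1
df <- length(new_count) - 1

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(x2, 3)  # 62.433, matching chisq.test
df            # 4
```

Most of the statistic comes from the Hispanic category, where 82 were observed against an expected 37.5.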

Example 2

The example above shows when and how to adjust the expected probabilities in R. By default, `chisq.test` assumes that probabilities are equal across all categories. Consider the following scenario (Mendenhall, Beaver, & Beaver, 2006, p. 597):

|----------------|--------------------|
|                | Door               |
|----------------|--------------------|
|                | Green | Red | Blue |
|----------------|--------------------|
| Observed Count | 20    | 39  | 31   |
|----------------|--------------------|

Without prior knowledge, the default null hypothesis is:

$H_0: p_1 = p_2 = p_3 = \frac{1}{3}$

R assumes the default $H_0$ too, and the code would be:

count <- c(20, 39, 31)
chisq.test(count)

The result: $X^2 = 6.0667, df = 2, p = 0.04815$.
Since $p < 0.05$, we reject $H_0$.

We can also state the vector of probabilities explicitly, as in the first example, even when the probabilities are equal. The following call gives the same result:

chisq.test(count, p = c(1/3, 1/3, 1/3))

Alternate way to write the above:

chisq.test(count, p = rep(1/3, 3))
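
As a quick check that the three forms agree, the returned statistics and p-values can be compared directly. This is only a sketch; the names `default_fit`, `explicit_fit`, and `rep_fit` are made up for the comparison.

```r
count <- c(20, 39, 31)

default_fit  <- chisq.test(count)
explicit_fit <- chisq.test(count, p = c(1/3, 1/3, 1/3))
rep_fit      <- chisq.test(count, p = rep(1/3, 3))

# all.equal() compares with a small floating-point tolerance
all.equal(default_fit$statistic, explicit_fit$statistic)
all.equal(default_fit$p.value, rep_fit$p.value)
```

Both comparisons should return `TRUE`, since the default for `p` is equal probability in every category.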

Test of independence between two variables

Based on page 424 of the above book. Here the table has to be built correctly in R for the analysis to work.

$H_0:$ Political affiliation and attitude toward some bill are independent.

|-----------------|------------------------------------------|
|                 | Attitude                                 |
|-----------------|------------------------------------------|
|                 | For | Undecided | Against | Row Marginal |
| Republican      | 68  | 22        | 110     | 200          |
| Democrat        | 92  | 18        | 90      | 200          |
| Column Marginal | 160 | 40        | 200     | 400          |
|-----------------|------------------------------------------|

First, let’s build the table:

party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                        attitude = c("For", "Undecided", "Against"))

Let’s examine the table:

party
             attitude
affiliation  For Undecided Against
  Republican  68        22     110
  Democrat    92        18      90

Running `chisq.test(party)` gives:

$X^2 = 6, df = 2, p = 0.04979$.
Since $p < 0.05$, we reject $H_0$.
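
For a test of independence, the expected count in each cell is the row total times the column total, divided by the grand total. The sketch below rebuilds the table so it runs on its own and reproduces the statistic from the marginals; `expected` and `x2` are illustrative names.

```r
party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                        attitude = c("For", "Undecided", "Against"))

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(party), colSums(party)) / sum(party)
expected

# Chi-square statistic summed over all cells
x2 <- sum((party - expected)^2 / expected)
x2

# chisq.test stores the same expected counts
chisq.test(party)$expected
```

Because both parties have the same row total (200), the expected counts are identical in the two rows: 80, 20, and 100.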

Example 2

From Mendenhall, Beaver, & Beaver (p. 602), we have the following contingency table:

|-----------------|------------------------|
|                 | Shift                  |
|-----------------|------------------------|
| Type of Defects | 1  | 2  | 3   | Total  |
| A               | 15 | 26 | 33  | 74     |
| B               | 21 | 31 | 17  | 69     |
| C               | 45 | 34 | 49  | 128    |
| D               | 13 | 5  | 20  | 38     |
|-----------------|------------------------|
| Total           | 94 | 96 | 119 | 309    |
|-----------------|------------------------|

$H_0:$ Type of Defects and Shift are independent.

We can build the table in R:

defects <- as.table(rbind(c(15, 26, 33),
                          c(21, 31, 17),
                          c(45, 34, 49),
                          c(13, 5, 20)))
dimnames(defects) <- list(type = c("A", "B", "C", "D"),
                          shift = c("1", "2", "3"))

Running `chisq.test(defects)` gives:

$X^2 = 19.178, df = 6, p = 0.003873$.
Since $p < 0.05$, we reject $H_0$.
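
Rather than reading the values off the printed output, the individual pieces can be pulled from the returned object. This is a sketch that rebuilds the table so it is self-contained; `result` is an illustrative name.

```r
defects <- as.table(rbind(c(15, 26, 33),
                          c(21, 31, 17),
                          c(45, 34, 49),
                          c(13,  5, 20)))
dimnames(defects) <- list(type = c("A", "B", "C", "D"),
                          shift = c("1", "2", "3"))

result <- chisq.test(defects)

result$statistic  # X-squared, about 19.178
result$parameter  # df = (4 - 1) * (3 - 1) = 6
result$p.value    # about 0.003873

# The p-value is the upper tail of the chi-square distribution
pchisq(result$statistic, df = result$parameter, lower.tail = FALSE)
```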

References

Mendenhall, W., Beaver, R. J., & Beaver, B. M. (2006). Introduction to probability and statistics (12th ed.). Australia: Thomson Brooks/Cole.