Chi-Square Examples in R

C. Sean Burns

02-20-2024


Chi-Square Examples

Examples adapted from the following work:

Pagano, R. R. (2002). Understanding statistics in the behavioral sciences (6th ed.). Belmont, CA: Wadsworth.

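There are two catches to be aware of when running chi-square tests in R. First, for a goodness-of-fit test, chisq.test() assumes equal probabilities across all categories unless the null probabilities are supplied with the p argument. Second, for a test of independence, the counts must first be entered as a table (or matrix) whose rows and columns correspond to the two variables.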

Chi-Square Single Variable (Goodness of Fit)

Based on pages 422-423 of the above book.

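A researcher believes the ethnic composition of a city's population has changed since data was last collected. When data was last collected, the breakdown was:

|-----------|---------|-------|----------|---------|
| Norwegian | Swedish | Irish | Hispanic | Italian |
|-----------|---------|-------|----------|---------|
| 53%       | 32%     | 8%    | 5%       | 2%      |
|-----------|---------|-------|----------|---------|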

New data is collected from a random sample of 750 inhabitants of the city. The results are:

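|-----------|---------|-------|----------|---------|
| Norwegian | Swedish | Irish | Hispanic | Italian |
|-----------|---------|-------|----------|---------|
| 399       | 193     | 63    | 82       | 13      |
|-----------|---------|-------|----------|---------|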

Use chisq.test() to test the observed frequencies (the new counts) against the proportions from the previous data (the null probabilities).

$H_0$: The ethnic population has not changed in composition.

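# Observed counts from the new random sample (n = 750)
new_count  <- c(399, 193, 63, 82, 13)
# Proportions from the previous data, used as the null probabilities
null_probs <- c(0.53, 0.32, 0.08, 0.05, 0.02)
chisq.test(new_count, p = null_probs)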

The result: $X^2 = 62.433$, $df = 4$, $p < 0.001$.
Since $p < 0.05$, we reject $H_0$.
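To show where the statistic comes from, here is a minimal sketch that computes it by hand. The expected counts are the sample size times the null probabilities. $X^2$ sums $(O - E)^2 / E$ over the categories, and the p-value is the upper tail of a chi-square distribution with $k - 1 = 4$ degrees of freedom. The names `expected`, `x2`, `dof`, and `p_value` are illustrative and are not part of the original example.

```r
new_count  <- c(399, 193, 63, 82, 13)
null_probs <- c(0.53, 0.32, 0.08, 0.05, 0.02)

# Expected counts under H0: sample size times each null probability
expected <- sum(new_count) * null_probs   # 397.5 240.0 60.0 37.5 15.0

# Pearson chi-square statistic: sum of (O - E)^2 / E
x2 <- sum((new_count - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
dof <- length(new_count) - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

round(x2, 3)   # 62.433
p_value
```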

Example 2

The example above shows when and how to adjust the expected probabilities in R. By default, chisq.test() assumes that the probabilities are equal across all categories. For example, consider the following scenario (Mendenhall, Beaver, & Beaver, 2006, p. 597):

|----------------|--------------------|
|                | Door               |
|----------------|--------------------|
|                | Green | Red | Blue |
|----------------|--------------------|
| Observed Count | 20    | 39  | 31   |
|----------------|--------------------|

Without prior knowledge, the default null hypothesis is:

$$H_0: p_1 = p_2 = p_3 = \frac{1}{3}$$

R assumes this default $H_0$ too, so the code is:

count <- c(20, 39, 31)
chisq.test(count)

The result: $X^2 = 6.0667$, $df = 2$, $p = 0.04815$.
Since $p < 0.05$, we reject $H_0$.
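You can reach the same decision by comparing the statistic to a critical value instead of reading off the p-value. Here is a minimal sketch, assuming $\alpha = 0.05$ (the level used throughout this page). `qchisq()` gives the critical value for the test's degrees of freedom, and the observed statistic only just exceeds it. That matches the p-value falling only just below 0.05. The names `alpha`, `result`, and `critical` are illustrative.

```r
count <- c(20, 39, 31)
alpha <- 0.05

# Default (equal-probability) goodness-of-fit test
result <- chisq.test(count)

# Critical value: the point with 5% of the chi-square(df = 2) distribution above it
critical <- qchisq(1 - alpha, df = unname(result$parameter))

result$statistic   # about 6.0667
critical           # about 5.991

# Reject H0 when the statistic exceeds the critical value
result$statistic > critical   # TRUE, so reject H0
```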

We can also pass the vector of probabilities explicitly, as in the first example, even when those probabilities are equal. The following gives the same result:

chisq.test(count, p = c(1/3, 1/3, 1/3))

An equivalent way to write the above:

chisq.test(count, p = rep(1/3, 3))
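A quick way to confirm that these calls agree is to compare their statistics and p-values in code. The sketch below also uses the `rescale.p = TRUE` argument of `chisq.test()`, which rescales `p` so that it sums to 1. That is convenient when the null distribution is written as weights or percentages (here the equal weights `c(1, 1, 1)`) rather than as probabilities. The variable names are illustrative.

```r
count <- c(20, 39, 31)

# Default: equal probabilities across categories
default_test  <- chisq.test(count)

# Explicit equal probabilities, written two ways
explicit_test <- chisq.test(count, p = c(1/3, 1/3, 1/3))
rep_test      <- chisq.test(count, p = rep(1/3, 3))

# Equal weights that chisq.test() rescales to probabilities summing to 1
weights_test  <- chisq.test(count, p = c(1, 1, 1), rescale.p = TRUE)

# Collect the statistics and p-values from all four tests
x2_values <- c(default_test$statistic, explicit_test$statistic,
               rep_test$statistic, weights_test$statistic)
p_values  <- c(default_test$p.value, explicit_test$p.value,
               rep_test$p.value, weights_test$p.value)

# All four agree (names are ignored in the comparison)
all.equal(x2_values, rep(x2_values[1], 4), check.attributes = FALSE)
```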

Test of independence between two variables

Based on page 424 of the above book. Here the counts must be entered into R as a two-way table, with one variable in the rows and the other in the columns, for the analysis to work.

$H_0$: Political affiliation and attitude toward a particular bill are independent.

|-----------------|------------------------------------------|
|                 | Attitude                                 |
|-----------------|------------------------------------------|
|                 | For | Undecided | Against | Row Marginal |
| Republican      | 68  | 22        | 110     | 200          |
| Democrat        | 92  | 18        | 90      | 200          |
| Column Marginal | 160 | 40        | 200     | 400          |
|-----------------|------------------------------------------|

First, let’s build the table:

party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                        attitude = c("For", "Undecided", "Against"))
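You can also build the same table with `matrix()`. Here is a minimal sketch, assuming the counts are typed row by row. `byrow = TRUE` keeps each party's counts in its own row. The default fills column by column, which would silently scramble the table. `chisq.test()` accepts a plain matrix as well as a table. The name `party_m` is illustrative.

```r
# Counts typed row by row: Republican first, then Democrat
party_m <- matrix(c(68, 22, 110,
                    92, 18,  90),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(affiliation = c("Republican", "Democrat"),
                                  attitude = c("For", "Undecided", "Against")))

# Rebuild the rbind() version for comparison
party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                        attitude = c("For", "Undecided", "Against"))

# Same counts in the same cells
all(party_m == party)

# Without byrow = TRUE, matrix() fills by column, so the first row
# would wrongly be 68, 110, 18
matrix(c(68, 22, 110, 92, 18, 90), nrow = 2)
```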

Let’s examine the table:

party
            attitude
affiliation   For Undecided Against
  Republican  68        22     110
  Democrat    92        18      90
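The printed table omits the marginal totals shown in the book's layout. `addmargins()` from the stats package appends a "Sum" row and column to recover them. A minimal sketch:

```r
party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                        attitude = c("For", "Undecided", "Against"))

# Append a "Sum" row and a "Sum" column with the marginal totals
party_margins <- addmargins(party)
party_margins
```

The "Sum" row should read 160, 40, 200, 400, and the "Sum" column should read 200, 200, 400, matching the row and column marginals in the table above.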

Running chisq.test(party) shows that:

$X^2 = 6$, $df = 2$, $p = 0.04979$.
Since $p < 0.05$, we reject $H_0$.
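The expected counts behind this statistic come from the marginals. Under independence, each cell's expected count is $E_{ij} = R_i C_j / N$ (row total times column total over grand total). Here is a minimal sketch that computes them with `outer()`, rebuilds the statistic with $df = (r - 1)(c - 1)$, and checks the result against the `expected` component returned by `chisq.test()`. The names `expected`, `x2`, and `dof` are illustrative.

```r
party <- as.table(rbind(c(68, 22, 110), c(92, 18, 90)))
dimnames(party) <- list(affiliation = c("Republican", "Democrat"),
                        attitude = c("For", "Undecided", "Against"))

# Expected count for each cell: row total * column total / grand total
expected <- outer(rowSums(party), colSums(party)) / sum(party)
expected   # 80 20 100 in each row

# Pearson statistic summed over all six cells
x2 <- sum((party - expected)^2 / expected)

# df = (rows - 1) * (columns - 1) = 1 * 2 = 2
dof <- (nrow(party) - 1) * (ncol(party) - 1)
pchisq(x2, df = dof, lower.tail = FALSE)   # about 0.04979

# chisq.test() stores the same expected counts
all.equal(expected, chisq.test(party)$expected, check.attributes = FALSE)
```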

Example 2

From Mendenhall, Beaver, & Beaver (2006, p. 602), we have the following contingency table:

|----------------|---------|---------|---------|-------|
| Type of Defect | Shift 1 | Shift 2 | Shift 3 | Total |
|----------------|---------|---------|---------|-------|
| A              | 15      | 26      | 33      | 74    |
| B              | 21      | 31      | 17      | 69    |
| C              | 45      | 34      | 49      | 128   |
| D              | 13      | 5       | 20      | 38    |
|----------------|---------|---------|---------|-------|
| Total          | 94      | 96      | 119     | 309   |
|----------------|---------|---------|---------|-------|

$H_0$: Type of defect and shift are independent.

We can build the table in R:

defects <- as.table(rbind(c(15, 26, 33),
                          c(21, 31, 17),
                          c(45, 34, 49),
                          c(13, 5, 20)))
dimnames(defects) <- list(type = c("A", "B", "C", "D"),
                          shift = c("1", "2", "3"))
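If your data are stored in long format, with one row per combination and a count column, `xtabs()` builds the same contingency table from a formula. This minimal sketch assumes that long layout. It round-trips `defects` through `as.data.frame()`, which yields the columns `type`, `shift`, and `Freq`, and back again. The names `defects_long` and `defects_xt` are illustrative.

```r
defects <- as.table(rbind(c(15, 26, 33),
                          c(21, 31, 17),
                          c(45, 34, 49),
                          c(13, 5, 20)))
dimnames(defects) <- list(type = c("A", "B", "C", "D"),
                          shift = c("1", "2", "3"))

# Long format: one row per (type, shift) cell, with its count in Freq
defects_long <- as.data.frame(defects)
head(defects_long)

# Rebuild the 4 x 3 table: type in the rows, shift in the columns
defects_xt <- xtabs(Freq ~ type + shift, data = defects_long)

# The rebuilt table matches the one typed in by hand
all(defects_xt == defects)
```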

Running chisq.test(defects) results in:

$X^2 = 19.178$, $df = 6$, $p = 0.003873$.
Since $p < 0.05$, we reject $H_0$.
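The chi-square approximation is usually considered reasonable when the expected counts are not too small. A common rule of thumb is at least 5 per cell, and `chisq.test()` warns when any expected count falls below 5. Here is a minimal sketch that inspects the stored degrees of freedom and expected counts for this table. It also prints the standardized residuals (`stdres`), which suggest which cells depart most from independence. The ±2 cutoff in the comment is an informal guide, not a formal test.

```r
defects <- as.table(rbind(c(15, 26, 33),
                          c(21, 31, 17),
                          c(45, 34, 49),
                          c(13, 5, 20)))
dimnames(defects) <- list(type = c("A", "B", "C", "D"),
                          shift = c("1", "2", "3"))

result <- chisq.test(defects)

# Degrees of freedom: (4 - 1) * (3 - 1) = 6
result$parameter

# Smallest expected count (type D, shift 1: 38 * 94 / 309, about 11.56)
min(result$expected)

# Standardized residuals: values beyond roughly +/-2 flag cells that
# differ most from what independence predicts
round(result$stdres, 2)
```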

References

Mendenhall, W., Beaver, R. J., & Beaver, B. M. (2006). Introduction to probability and statistics (12th ed.). Australia: Thomson Brooks/Cole.