02-21-2024

Examples adapted from the following work:

Pagano, R. R. (2002). **Understanding statistics in the
behavioral sciences** (6th ed.). Belmont, CA: Wadsworth. See
pages 135-136.

$Y' = {b_Y}X + a_Y$

where:

- $Y'$ = predicted $Y$
- $b_Y$ = slope
- $a_Y$ = intercept

And:

$b_Y = \dfrac{\sum{XY} - \dfrac{(\sum{X})(\sum{Y})}{N}}{\sum{X^2} - \dfrac{(\sum{X})^2}{N}}$

And:

$a_Y = \bar{Y} - {b_Y}\bar{X}$

The goal is to predict height in inches at age 20 based on height at age 3.

Individual No. | Height at Age 3, X (in.) | Height at Age 20, Y (in.) |
---|---|---|
1 | 30 | 59 |
2 | 30 | 63 |
3 | 32 | 62 |
4 | 33 | 67 |
5 | 34 | 65 |
6 | 35 | 61 |
7 | 36 | 69 |
8 | 38 | 66 |
9 | 40 | 68 |
10 | 41 | 65 |
11 | 41 | 73 |
12 | 43 | 68 |
13 | 45 | 71 |
14 | 45 | 74 |
15 | 47 | 71 |
16 | 48 | 75 |

Enter the data into R:

```
x <- c(30, 30, 32, 33, 34, 35, 36, 38, 40, 41, 41, 43, 45, 45, 47, 48)
y <- c(59, 63, 62, 67, 65, 61, 69, 66, 68, 65, 73, 68, 71, 74, 71, 75)
```
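The raw-score formulas above can be applied directly to these data to see where the slope and intercept come from. A minimal sketch, computing each sum term by term (variable names are mine, not from the text):

```
x <- c(30, 30, 32, 33, 34, 35, 36, 38, 40, 41, 41, 43, 45, 45, 47, 48)
y <- c(59, 63, 62, 67, 65, 61, 69, 66, 68, 65, 73, 68, 71, 74, 71, 75)

sum.x  <- sum(x)        # sum of X
sum.y  <- sum(y)        # sum of Y
sum.xy <- sum(x * y)    # sum of the XY cross-products
sum.x2 <- sum(x^2)      # sum of the squared X values
n      <- length(x)     # N

# slope: b_Y = (sum XY - (sum X)(sum Y)/N) / (sum X^2 - (sum X)^2/N)
b.y <- (sum.xy - (sum.x * sum.y) / n) / (sum.x2 - sum.x^2 / n)
# intercept: a_Y = mean(Y) - b_Y * mean(X)
a.y <- mean(y) - b.y * mean(x)

round(c(slope = b.y, intercept = a.y), 4)
```

These hand-formula values should match the coefficients that `lm()` reports below.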

Run the model using [`lm()`](https://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html):

```
fit.1 <- lm(y ~ x)
summary(fit.1)
```

```
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-3.9068 -1.9569 -0.3841  1.7136  4.1113

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  41.6792     4.4698   9.325 2.21e-07 ***
x             0.6636     0.1144   5.799 4.61e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.654 on 14 degrees of freedom
Multiple R-squared:  0.7061,	Adjusted R-squared:  0.6851
F-statistic: 33.63 on 1 and 14 DF,  p-value: 4.611e-05
```

Use the coefficients to create the regression equation: $Y' = {b_Y}X + a_Y = 0.664X + 41.679$
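The fitted model can also generate predictions without plugging into the equation by hand, via `predict()`. A quick sketch (the 36-inch input is just an illustrative value, not from the text):

```
x <- c(30, 30, 32, 33, 34, 35, 36, 38, 40, 41, 41, 43, 45, 45, 47, 48)
y <- c(59, 63, 62, 67, 65, 61, 69, 66, 68, 65, 73, 68, 71, 74, 71, 75)
fit.1 <- lm(y ~ x)

# predicted height at age 20 for a height of 36 in. at age 3 (illustrative input)
y.hat <- predict(fit.1, newdata = data.frame(x = 36))
round(y.hat, 2)   # roughly 0.664 * 36 + 41.679, i.e. about 65.57
```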

Visualize the regression with a shaded 95% confidence region:

```
library(ggplot2)
dat <- data.frame(x, y)
p <- ggplot(dat, aes(x, y))
p + geom_point() + geom_smooth(method = "lm")
```

`summary(fit.1)` reports the p-value for the F-statistic.
Another way to test the regression of Y on X is to compare the
observed F-statistic to its critical value in an
[F-distribution table](http://www.socr.ucla.edu/applets.dir/f_table.html),
if for no other reason than to help develop the intuition
involved. Although the F-statistic is reported by
`summary(fit.1)`, per Pedhazur (1997) it can also be derived
by dividing the **regression sum of squares** by its
associated **degrees of freedom**, and then dividing that
result by the **residual sum of squares** over its associated
**degrees of freedom**. The **sums of squares** are not
reported by `summary(fit.1)`, but they are reported by fitting
**anova** to the model:

```
aov(fit.1)
```

```
Call:
   aov(formula = fit.1)

Terms:
                        x Residuals
Sum of Squares  236.83824  98.59926
Deg. of Freedom         1        14

Residual standard error: 2.653828
Estimated effects may be unbalanced
```
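As an aside on the table-lookup approach mentioned above, R can supply the critical value directly with `qf()`. A sketch assuming the conventional α = .05 with 1 and 14 degrees of freedom:

```
# critical F at alpha = .05 for df1 = 1, df2 = 14
f.crit <- qf(0.95, df1 = 1, df2 = 14)
round(f.crit, 2)   # about 4.60
```

Since the observed F of 33.63 exceeds this critical value, the table-lookup test agrees with the reported p-value.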

Use these values to confirm the F-statistic:

$F = \dfrac{\dfrac{SS_{reg}}{df_1}}{\dfrac{SS_{res}}{df_2}} = \dfrac{\dfrac{236.83824}{1}}{\dfrac{98.59926}{14}} = 33.63$
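The same arithmetic can be scripted so R confirms the hand calculation; a sketch that pulls the sums of squares and degrees of freedom out of the `anova()` table (the variable names are mine):

```
x <- c(30, 30, 32, 33, 34, 35, 36, 38, 40, 41, 41, 43, 45, 45, 47, 48)
y <- c(59, 63, 62, 67, 65, 61, 69, 66, 68, 65, 73, 68, 71, 74, 71, 75)
fit.1 <- lm(y ~ x)

ss <- anova(fit.1)   # table with rows "x" and "Residuals"
f  <- (ss["x", "Sum Sq"] / ss["x", "Df"]) /
      (ss["Residuals", "Sum Sq"] / ss["Residuals", "Df"])
round(f, 2)          # 33.63, matching the F-statistic from summary(fit.1)
```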