02-21-2024
Examples adapted from the following work:
Pagano, R. R. (2002). Understanding statistics in the behavioral sciences (6th ed.). Belmont, CA: Wadsworth. See pages 135 - 136.
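The least-squares regression line for predicting Y from X is:
$$Y' = b_Y X + a_Y$$
where:
$$b_Y = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{SS_X}$$
And:
$$a_Y = \bar{Y} - b_Y \bar{X}$$
And:
$$SS_X = \sum X^2 - \frac{(\sum X)^2}{N}$$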
The goal is to predict height in inches at age 20 based on height at age 3.
Individual No. | Height at Age 3 X (in.) | Height at Age 20 Y (in.) |
---|---|---|
1 | 30 | 59 |
2 | 30 | 63 |
3 | 32 | 62 |
4 | 33 | 67 |
5 | 34 | 65 |
6 | 35 | 61 |
7 | 36 | 69 |
8 | 38 | 66 |
9 | 40 | 68 |
10 | 41 | 65 |
11 | 41 | 73 |
12 | 43 | 68 |
13 | 45 | 71 |
14 | 45 | 74 |
15 | 47 | 71 |
16 | 48 | 75 |
Enter the data into R:
x <- c(30, 30, 32, 33, 34, 35, 36, 38, 40, 41, 41, 43, 45, 45, 47, 48)
y <- c(59, 63, 62, 67, 65, 61, 69, 66, 68, 65, 73, 68, 71, 74, 71, 75)
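Before fitting a model, the slope and intercept can be computed by hand from the standard least-squares formulas. The following is a minimal sketch; the names n, ss_x, sp_xy, b, and a are illustrative choices rather than anything taken from the source text.

```r
# Data from the table above
x <- c(30, 30, 32, 33, 34, 35, 36, 38, 40, 41, 41, 43, 45, 45, 47, 48)
y <- c(59, 63, 62, 67, 65, 61, 69, 66, 68, 65, 73, 68, 71, 74, 71, 75)

n <- length(x)

# Sum of squares of X and sum of cross-products of X and Y
ss_x  <- sum(x^2) - sum(x)^2 / n
sp_xy <- sum(x * y) - sum(x) * sum(y) / n

# Least-squares slope and intercept
b <- sp_xy / ss_x
a <- mean(y) - b * mean(x)

round(c(intercept = a, slope = b), 4)
```

The rounded values should agree with the coefficient estimates that lm() reports for the same data.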
Run the model using lm() (https://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html):
fit.1 <- lm(y ~ x)
summary(fit.1)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-3.9068 -1.9569 -0.3841 1.7136 4.1113
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 41.6792 4.4698 9.325 2.21e-07 ***
x 0.6636 0.1144 5.799 4.61e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.654 on 14 degrees of freedom
Multiple R-squared: 0.7061, Adjusted R-squared: 0.6851
F-statistic: 33.63 on 1 and 14 DF, p-value: 4.611e-05
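Use the coefficients to create the regression equation, where Y' is the predicted height at age 20 and X is the height at age 3:
$$Y' = 0.6636X + 41.6792$$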
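As a rough illustration of how the equation is used, the sketch below pulls the coefficients out of the fitted model and predicts height at age 20 for a single child. The value new_x = 42 is a hypothetical input chosen for the example, and b0, b1, and y_hat are illustrative names.

```r
x <- c(30, 30, 32, 33, 34, 35, 36, 38, 40, 41, 41, 43, 45, 45, 47, 48)
y <- c(59, 63, 62, 67, 65, 61, 69, 66, 68, 65, 73, 68, 71, 74, 71, 75)
fit.1 <- lm(y ~ x)

# Extract the intercept and slope by name
b0 <- coef(fit.1)[["(Intercept)"]]
b1 <- coef(fit.1)[["x"]]

# Hypothetical child who is 42 in. tall at age 3
new_x <- 42
y_hat <- b0 + b1 * new_x

# predict() should return the same value as the hand calculation
y_pred <- unname(predict(fit.1, newdata = data.frame(x = new_x)))
c(by_hand = y_hat, by_predict = y_pred)
```

Both approaches apply the same fitted line, so the two numbers should be identical up to floating-point error.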
Visualize the regression with a shaded 95% confidence region:
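library(ggplot2)
# Combine the two vectors into a data frame for ggplot
dat <- data.frame(x, y)
p <- ggplot(dat, aes(x, y))
# Scatterplot plus the fitted line; geom_smooth() shades a 95% confidence band by default
p + geom_point() + geom_smooth(method = "lm")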
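The limits of the shaded region can also be inspected numerically. The sketch below assumes that the band is the pointwise 95% confidence interval for the mean of Y at each X, and computes it with predict(); the names grid and band, and the choice of 50 grid points, are illustrative.

```r
x <- c(30, 30, 32, 33, 34, 35, 36, 38, 40, 41, 41, 43, 45, 45, 47, 48)
y <- c(59, 63, 62, 67, 65, 61, 69, 66, 68, 65, 73, 68, 71, 74, 71, 75)
fit.1 <- lm(y ~ x)

# Evenly spaced X values across the observed range
grid <- data.frame(x = seq(min(x), max(x), length.out = 50))

# Matrix with columns fit (predicted Y), lwr and upr (95% confidence limits)
band <- predict(fit.1, newdata = grid, interval = "confidence", level = 0.95)

head(cbind(grid, band))
```

The interval is narrowest near the mean of X and widens toward the ends of the range, which is why the shaded region in the plot flares out at both sides.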
summary(fit.1) reports the p-value for the F-statistic.
Another way to test the regression of Y on X is to compare the observed
F-statistic to its critical value in an F-distribution table
(http://www.socr.ucla.edu/applets.dir/f_table.html), if for no other
reason than to help develop the intuition involved. Although the
F-statistic is reported by summary(fit.1), per Pedhazur (1997) it can
also be derived by dividing the regression sum of squares by its
degrees of freedom, and then dividing that quotient by the residual sum
of squares divided by its degrees of freedom. The sums of squares are
not reported by summary(fit.1), but they are reported when the fitted
model is passed to aov():
aov(fit.1)
Call:
aov(formula = fit.1)
Terms:
x Residuals
Sum of Squares 236.83824 98.59926
Deg. of Freedom 1 14
Residual standard error: 2.653828
Estimated effects may be unbalanced
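Use these values to confirm the F-statistic:
$$F = \frac{SS_{reg} / df_{reg}}{SS_{res} / df_{res}} = \frac{236.83824 / 1}{98.59926 / 14} \approx 33.63$$
This matches the F-statistic reported by summary(fit.1).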
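The same check can be scripted. The sketch below reads the sums of squares and degrees of freedom from anova() (used here instead of aov() only because its result is a data frame that is easy to index), rebuilds the F-statistic, and looks up the critical value with qf() in place of a printed table. The 0.05 significance level is an assumption made for the example, and the variable names are illustrative.

```r
x <- c(30, 30, 32, 33, 34, 35, 36, 38, 40, 41, 41, 43, 45, 45, 47, 48)
y <- c(59, 63, 62, 67, 65, 61, 69, 66, 68, 65, 73, 68, 71, 74, 71, 75)
fit.1 <- lm(y ~ x)

# ANOVA table with rows "x" and "Residuals"
tab <- anova(fit.1)
ss_reg <- tab["x", "Sum Sq"]
ss_res <- tab["Residuals", "Sum Sq"]
df_reg <- tab["x", "Df"]
df_res <- tab["Residuals", "Df"]

# Mean square regression divided by mean square residual
f_obs <- (ss_reg / df_reg) / (ss_res / df_res)

# Critical value at alpha = 0.05 (assumed level) and the exact p-value
f_crit <- qf(0.95, df_reg, df_res)
p_val  <- pf(f_obs, df_reg, df_res, lower.tail = FALSE)

c(F_observed = f_obs, F_critical = f_crit, p_value = p_val)
```

Because the observed F is well above the critical value, the regression of Y on X is statistically significant at the assumed 0.05 level, which agrees with the p-value printed by summary(fit.1).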