02-20-2024
In this tutorial, I primarily cover base-R plot functions.
I used different data, but the examples here are built on those provided by STHDA.
The plot() function is the most basic plotting function and is often used for point (scatter) or line plots. The syntax is plot(x, y), with optional arguments for modifying the plot. Examples:
Basic Plot
ds <- USArrests # load the built-in dataset
head(ds) # look at the top few rows
plot(ds$Murder, ds$UrbanPop, type = "p")
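The optional arguments mentioned above can be combined in a single call. As a small sketch on the same USArrests data, type selects points, lines, or both, while col and pch control the color and point symbol. The specific colors and symbols here are arbitrary choices, not part of the original example.

```r
ds <- USArrests                      # built-in dataset, as above
ord <- order(ds$Murder)              # sort rows so connected lines run left to right

# type = "p" draws points only
plot(ds$Murder, ds$UrbanPop, type = "p", col = "blue", pch = 19)

# type = "b" draws points joined by line segments
plot(ds$Murder[ord], ds$UrbanPop[ord], type = "b", col = "red")
```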
Data example 2:
library(robustbase)
edex <- education
head(education)
Adding Labels
See ?education for the variable descriptions.
Since the variables do not have descriptive names, we can add labels to the X and Y axes:
plot(education$X1, education$Y,
xlab = "Residents per 1000 in Urban Areas",
ylab = "Per capita Expenditure on Public Education",
main = "Relationship Between Expenditure on Public Ed and Urban Pop")
Adding Regression Line
plot(education$X1, education$Y,
xlab = "Residents per 1000 in Urban Areas",
ylab = "Per capita Expenditure on Public Education",
main = "Relationship Between Expenditure on Public Ed and Urban Pop")
abline(lm(education$Y ~ education$X1), col = "blue")
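abline() accepts a fitted model and draws the line from its coefficients. A minimal sketch of the same regression, this time stored in a variable (fit is a name I chose for illustration) so the intercept and slope can be inspected:

```r
library(robustbase)                     # provides the education data

fit <- lm(Y ~ X1, data = education)     # same model as above, using data =
coef(fit)                               # intercept and slope of the fitted line

plot(Y ~ X1, data = education)
abline(fit, col = "blue")               # abline() reads the coefficients from fit
```

Using the data argument also keeps the formula short and gives the coefficients readable names.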
Adding Lowess Line
plot(education$X1, education$Y,
xlab = "Residents per 1000 in Urban Areas",
ylab = "Per capita Expenditure on Public Education",
main = "Relationship Between Expenditure on Public Ed and Urban Pop")
lines(lowess(education$Y ~ education$X1), col = "red")
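The amount of smoothing in lowess() is controlled by its f argument, the smoother span (default 2/3). As a rough sketch, a smaller span typically follows the data more closely. The value 0.3 is an arbitrary choice for illustration.

```r
library(robustbase)                     # provides the education data

plot(education$X1, education$Y)

# f is the smoother span: the proportion of points that influences each fitted value
lines(lowess(education$X1, education$Y, f = 2/3), col = "red")   # default span

sm <- lowess(education$X1, education$Y, f = 0.3)                 # smaller span, less smoothing
lines(sm, col = "blue")
```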
Scatterplots with Groups
library("car")
scatterplot(education$Y ~ education$X1 | education$Region)
Making the scatterplot more readable by describing the regions variable:
education$RegionName <- factor(education$Region,
labels = c("Northeastern", "North Central", "Southern", "Western"))
scatterplot(education$Y ~ education$X1 | education$RegionName)
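factor() assigns the labels in the sorted order of the numeric codes, so the first label goes to Region 1, the second to Region 2, and so on. A quick way to check the mapping, assuming the 1 to 4 region coding described in ?education, is to cross-tabulate the two columns:

```r
library(robustbase)                     # provides the education data

education$RegionName <- factor(education$Region,
                               labels = c("Northeastern", "North Central", "Southern", "Western"))

# Each numeric code should fall in exactly one named column
tab <- table(education$Region, education$RegionName)
tab
```

All counts should sit on the diagonal of the table; anything off the diagonal would mean a code was paired with more than one label.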
Scatterplot Matrices
Here we’ll compare the four numerical variables by specifying their column numbers. We can define the kind of points the plot creates with the pch argument. See ?points for options.
pairs(education[,3:6], pch = 20)
pairs(education[,3:6], pch = 18, col = "blue")
pairs(education[,3:6], pch = 18, col = "red", cex = 1.8)
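Column numbers break silently if the data frame is reordered. As an alternative sketch, the same four columns can be selected by name (num_vars is a helper name I introduced for illustration):

```r
library(robustbase)                     # provides the education data

num_vars <- c("X1", "X2", "X3", "Y")    # the columns at positions 3 to 6
pairs(education[, num_vars], pch = 20)
```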
Box Plots
boxplot(education$Y ~ education$Region)
boxplot(education$Y ~ education$RegionName)
boxplot(education$Y ~ education$RegionName,
col = c("red", "blue", "green", "orange"))
Strip Charts
stripchart(education$Y ~ education$RegionName)
stripchart(education$Y ~ education$RegionName, vertical = TRUE)
stripchart(education$Y ~ education$RegionName, vertical = TRUE,
col = c("red", "blue", "green", "orange"))
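Bar Plots
barplot() does not aggregate by default, so I use the table function to count the rows in each region:
barplot(education$Region) # one bar per row, with the numeric region code as its height
# barplot(education$RegionName) # fails: barplot() needs a numeric vector or matrix, not a factor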
table(education$RegionName) # to see what's being plotted
barplot(table(education$RegionName))
barplot(table(education$RegionName),
col = c("red", "blue", "green", "orange"))
Creating a legend with this data requires some hacking. The first call passes every row's region name to the legend. For comparison, the second call uses the unique function to keep only the distinct values:
barplot(table(education$RegionName),
col = c("red", "blue", "green", "orange"),
legend = education$RegionName)
barplot(table(education$RegionName),
col = c("red", "blue", "green", "orange"),
legend = unique(education$RegionName))
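In the calls above, legend is partially matched to barplot()'s legend.text argument. One caveat is that unique() returns values in order of first appearance, while table() orders the bars by factor level, so the two could disagree in other data sets. A sketch of an alternative that uses levels() to guarantee the legend follows the bar order (counts and region_cols are names I introduced):

```r
library(robustbase)                     # provides the education data

education$RegionName <- factor(education$Region,
                               labels = c("Northeastern", "North Central", "Southern", "Western"))

counts <- table(education$RegionName)   # bars are drawn in factor-level order
region_cols <- c("red", "blue", "green", "orange")

barplot(counts, col = region_cols,
        legend.text = levels(education$RegionName))  # same order as the bars
```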
Line Plots
This example uses the bigcity data set from the boot package. u is the population of 49 U.S. cities in 1920, and x is the population of the same cities in 1930. I’m simply sorting these to provide the example.
library(boot)
head(bigcity)
plot(bigcity$u, type = "b", col = "blue", pch = 18)
plot(sort(bigcity$u), type = "b", col = "blue", pch = 18)
lines(sort(bigcity$x), type = "b", col = "red", pch = 19)
legend("topleft", legend = c("1920", "1930"),
col = c("blue", "red"), lty = 1, pch = c(18, 19))
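The first plot() call fixes the axis limits from its own data, so any points that lines() adds outside that range are not drawn. A sketch that sets ylim from both series up front, so the 1930 values are not cut off (y_range and the axis label are my additions):

```r
library(boot)                                  # provides the bigcity data

y_range <- range(bigcity$u, bigcity$x)         # limits that cover both years

plot(sort(bigcity$u), type = "b", col = "blue", pch = 18,
     ylim = y_range, ylab = "Population")
lines(sort(bigcity$x), type = "b", col = "red", pch = 19)
legend("topleft", legend = c("1920", "1930"),
       col = c("blue", "red"), lty = 1, pch = c(18, 19))
```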
Pie Charts
We need to aggregate the data for a pie chart. For this, I’ll use the tapply function to take the mean of the Y variable for each RegionName.
pie(tapply(education$Y, education$RegionName, FUN = mean))
pie(tapply(education$Y, education$RegionName, FUN = mean),
col = c("blue", "red", "green", "orange"))
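It can help to look at what tapply() returns before plotting it. A minimal sketch that stores the result (region_means is a name I chose) and reuses its names to build slice labels showing the rounded means:

```r
library(robustbase)                     # provides the education data

education$RegionName <- factor(education$Region,
                               labels = c("Northeastern", "North Central", "Southern", "Western"))

# One mean of Y per region, named by region
region_means <- tapply(education$Y, education$RegionName, FUN = mean)
region_means

# Label each slice with the region name and its rounded mean
slice_labels <- paste0(names(region_means), " (", round(region_means), ")")
pie(region_means, labels = slice_labels)
```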
Histograms and Density Plots
hist(education$X1)
hist(education$X1, col = "red")
hist(education$X1, col = "#1565c0")
hist(education$X1, col = "#1565c0", breaks = 3)
plot(density(education$X1), col = "#1565c0")
Dot Charts
dotchart(education$X1, labels = education$State,
groups = education$RegionName,
main = "Education Expenditure 1970s")
Plotting Group Means
library(gplots) # provides plotmeans()
library(psych)
plotmeans(education$X1 ~ education$RegionName)
plotmeans(education$X1 ~ education$RegionName, mean.labels = TRUE)
plotmeans(education$X1 ~ education$RegionName, connect = FALSE)
Saving Plots
By default, plots are displayed and not saved as files. R can save plots in multiple file formats, and they all generally follow the same pattern: open a graphics device, draw the plot, then close the device with dev.off(). Here I save a plot as a PNG file:
png("plot1.png", width = 700, height = 700)
plotmeans(education$X1 ~ education$RegionName, connect = FALSE)
dev.off()
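The same open, draw, close pattern applies to the other devices. As a sketch, here is the PDF version. The output path is a hypothetical location in the session's temporary directory. Unlike png(), which defaults to pixels, pdf() takes its width and height in inches.

```r
out_file <- file.path(tempdir(), "plot1.pdf")  # hypothetical output path

pdf(out_file, width = 7, height = 7)   # open the device; pdf() sizes are in inches
plot(USArrests$Murder, USArrests$UrbanPop)
dev.off()                              # close the device so the file is written

file.exists(out_file)                  # check that the file was created
```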
ggplot2 is a powerful and popular graphing library for R.
It’s also possible to create spatial/map plots. I created the following two:
See STHDA for many more fine examples plus some other plotting libraries.