Plotting in R

In this tutorial, I primarily cover base-R plot functions.

I used different data, but the examples here are built on those provided by STHDA.

Base R Plots


The plot() function is the most basic plotting function and is often used for point (scatter) and line plots. The basic syntax is plot(x, y), and additional arguments modify the plot. Examples:

Basic Plot

ds <- USArrests # load the built-in dataset
head(ds) # look at the top few rows
plot(ds$Murder, ds$UrbanPop, type = "p")
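The type argument controls how the values are drawn. As a minimal sketch, assuming the same built-in USArrests data, the following draws one sorted variable four ways: points ("p"), lines ("l"), both ("b"), and vertical bars ("h"). The 2 x 2 panel layout is only an illustration choice.

```r
ds <- USArrests
murder <- sort(ds$Murder)            # sorted so the line types are readable

old <- par(mfrow = c(2, 2))          # 2 x 2 grid of panels
for (tp in c("p", "l", "b", "h")) {
  plot(murder, type = tp, main = paste0('type = "', tp, '"'))
}
par(old)                             # restore the previous layout
```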

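Data example 2, the education data set:

library(robustbase) # assumption: education is loaded from the robustbase package
edex <- education # working copy; the examples below use education directly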

Adding Labels

Variables include (from ?education):

  • 'State' State
  • 'Region' Region (1=Northeastern, 2=North central, 3=Southern, 4=Western)
  • 'X1' Number of residents per thousand residing in urban areas in 1970
  • 'X2' Per capita personal income in 1973
  • 'X3' Number of residents per thousand under 18 years of age in 1974
  • 'Y' Per capita expenditure on public education in a state, projected for 1975

Since the variables do not have descriptive names, we can add labels to the X and Y axes:

plot(education$X1, education$Y,
     xlab = "Residents per 1000 in Urban Areas",
     ylab = "Per capita Expenditure on Public Education",
     main = "Relationship Between Expenditure on Public Ed and Urban Pop")

Adding Regression Line

plot(education$X1, education$Y,
     xlab = "Residents per 1000 in Urban Areas",
     ylab = "Per capita Expenditure on Public Education",
     main = "Relationship Between Expenditure on Public Ed and Urban Pop")
abline(lm(education$Y ~ education$X1), col = "blue")

Adding Lowess Line

plot(education$X1, education$Y,
     xlab = "Residents per 1000 in Urban Areas",
     ylab = "Per capita Expenditure on Public Education",
     main = "Relationship Between Expenditure on Public Ed and Urban Pop")
lines(lowess(education$Y ~ education$X1), col = "red")
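The call above uses lowess(), the older smoother in base R. A rough sketch of the same idea with loess(), which takes a formula and a data argument, is below. It assumes the built-in USArrests data from the first example rather than education, and it leaves the smoothing span at its default.

```r
ds <- USArrests
fit <- loess(UrbanPop ~ Murder, data = ds)   # local regression fit

ord <- order(ds$Murder)                      # lines() needs x in increasing order
plot(ds$Murder, ds$UrbanPop,
     xlab = "Murder arrests per 100,000",
     ylab = "Percent urban population")
lines(ds$Murder[ord], fitted(fit)[ord], col = "red")
```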


Scatterplots with Groups

library(car) # scatterplot() is provided by the car package
scatterplot(education$Y ~ education$X1 | education$Region)

We can make the scatterplot more readable by giving the Region codes descriptive labels:

education$RegionName <- factor(education$Region,
    labels = c("Northeastern", "North Central", "Southern", "Western"))
scatterplot(education$Y ~ education$X1 | education$RegionName)

Scatterplot Matrices

Here we'll compare the four numerical variables by specifying their column numbers. We can set the point symbol the plot uses with the pch argument. See ?points for options.

pairs(education[,3:6], pch = 20)
pairs(education[,3:6], pch = 18, col = "blue")
pairs(education[,3:6], pch = 18, col = "red", cex = 1.8)
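To see what the pch values look like without reading through ?points, a small sketch like the following draws symbols 0 through 25 in a row with their numbers above them. The layout values (cex, ylim, label offset) are arbitrary choices for illustration.

```r
codes <- 0:25                                  # the numbered plotting symbols
plot(codes, rep(1, length(codes)), pch = codes, cex = 1.5,
     ylim = c(0.5, 1.5), yaxt = "n",
     xlab = "pch value", ylab = "",
     main = "Point symbols for pch = 0 to 25")
text(codes, 1.2, labels = codes, cex = 0.7)    # label each symbol with its code
```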

Box Plots

boxplot(education$Y ~ education$Region)
boxplot(education$Y ~ education$RegionName)
boxplot(education$Y ~ education$RegionName,
        col = c("red", "blue", "green", "orange"))

Strip Charts

stripchart(education$Y ~ education$RegionName)
stripchart(education$Y ~ education$RegionName, vertical = TRUE)
stripchart(education$Y ~ education$RegionName, vertical = TRUE,
           col = c("red", "blue", "green", "orange"))

Bar Plots

barplot() does not aggregate by default. Therefore I use the table command to aggregate the counts:

table(education$RegionName) # to see what's being plotted
barplot(table(education$RegionName),
        col = c("red", "blue", "green", "orange"))

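Creating a legend with this data requires some hacking. For comparison, the first call passes the whole RegionName column, which gives one legend entry per row. In the second call, I use the unique function so that each region appears only once:

barplot(table(education$RegionName),
        col = c("red", "blue", "green", "orange"),
        legend.text = education$RegionName)
barplot(table(education$RegionName),
        col = c("red", "blue", "green", "orange"),
        legend.text = unique(education$RegionName))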
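One way to keep the legend labels tied to the bars is to build the table once and take the labels from its names. This is only a sketch: it assumes the built-in mtcars data for illustration (not the education data), and the legend position and title passed through args.legend are arbitrary.

```r
counts <- table(mtcars$cyl)              # number of cars per cylinder count
barplot(counts,
        col = c("red", "blue", "green"),
        legend.text = names(counts),     # one legend entry per bar
        args.legend = list(x = "topleft", title = "Cylinders"))
```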

Multiple Lines

From the bigcity data set: u is the population of 49 U.S. cities in 1920, and x is the population of these cities in 1930. I'm simply sorting these to provide the example.

library(boot) # assumption: bigcity is loaded from the boot package
plot(bigcity$u, type = "b", col = "blue", pch = 18)
# set ylim from both series so the 1930 line is not clipped
plot(sort(bigcity$u), type = "b", col = "blue", pch = 18,
     ylim = range(bigcity$u, bigcity$x))
lines(sort(bigcity$x), type = "b", col = "red", pch = 19)
legend("topleft", legend = c("1920", "1930"),
       col = c("blue", "red"), lty = 1, pch = c(18, 19))

Pie Chart

We need to aggregate the data for a pie chart. For this, I'll use the tapply function to take the mean of the Y variable for each RegionName.

pie(tapply(education$Y, education$RegionName, FUN = mean))
pie(tapply(education$Y, education$RegionName, FUN = mean),
    col = c("blue", "red", "green", "orange"))


Histograms

hist(education$X1, col = "red")
hist(education$X1, col = "#1565c0")
hist(education$X1, col = "#1565c0", breaks = 3)
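When breaks is a single number, hist() treats it as a suggestion and picks "pretty" break points, so the number of bars may differ from what was asked for. To fix the bins exactly, pass a vector of break points instead. A minimal sketch, assuming the built-in USArrests data and bin edges chosen by hand to cover its UrbanPop values:

```r
edges <- seq(30, 100, by = 10)                 # hand-picked bin edges
h <- hist(USArrests$UrbanPop, breaks = edges,
          col = "#1565c0",
          xlab = "Percent urban population",
          main = "Histogram with explicit breaks")
h$counts                                       # number of states in each bin
```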

Density Plots

plot(density(education$X1), col = "#1565c0")

Dot Chart

dotchart(education$X1, labels = education$State,
         groups = education$RegionName,
         main = "Urban Residents per 1000 by State (1970)")

Plot Group Means

library(gplots) # plotmeans() is provided by the gplots package
plotmeans(education$X1 ~ education$RegionName)
plotmeans(education$X1 ~ education$RegionName, mean.labels = TRUE)
plotmeans(education$X1 ~ education$RegionName, connect = FALSE)

Saving Plots

By default, plots are displayed and not saved as files. R can save plots in multiple file formats, and they all generally follow the syntax below that I use to save a basic plot as a PNG file:

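png("plot1.png", width = 700, height = 700) # open the PNG device (size in pixels)
plotmeans(education$X1 ~ education$RegionName, connect = FALSE)
dev.off() # close the device; the file is not complete until this runs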
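Other formats follow the same pattern of opening a device, plotting, and closing the device with dev.off(). Here is a sketch for PDF output, assuming the built-in USArrests data and a hypothetical file name in the session's temporary directory. Note that pdf() takes its width and height in inches rather than pixels.

```r
out <- file.path(tempdir(), "plot1.pdf")   # hypothetical output path
pdf(out, width = 7, height = 7)            # open the PDF device (size in inches)
plot(USArrests$Murder, USArrests$UrbanPop,
     xlab = "Murder arrests per 100,000",
     ylab = "Percent urban population")
dev.off()                                  # close the device so the file is written
```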

See Also

ggplot2 is a powerful and popular graphing library for R.

It's also possible to create spatial/map plots. I created the following two:


See STHDA for many more fine examples plus some other plotting libraries.

r/plotting-in-r.txt · Last modified: 2021/07/13 10:46 by seanburns