r:plotting-in-r

In this tutorial, I primarily cover base-R plot functions.

I used different data, but the examples here are built on those provided by STHDA.

The `plot()`

function is the most basic plot and is often used to plot line or
point plots. The syntax is `plot(x,y)`

but also includes arguments for
modifying the plot. Examples:

**Basic Plot**

ds <- USArrests # load the built-in dataset head(ds) # look at the top few rows plot(ds$Murder, ds$UrbanPop, type = "p")

Data example 2:

library(robustbase) edex <- education head(education)

**Adding Labels**

Variables include (from `?education`

):

- 'State' State
- 'Region' Region (1=Northeastern, 2=North central, 3=Southern, 4=Western)
- 'X1' Number of residents per thousand residing in urban areas in 1970
- 'X2' Per capita personal income in 1973
- 'X3' Number of residents per thousand under 18 years of age in 1974
- 'Y' Per capita expenditure on public education in a state, projected for 1975

Since the variables do not have descriptive names, we can add labels to the X and Y axes:

plot(education$X1, education$Y, xlab = "Residents per 1000 in Urban Areas", ylab = "Per capita Expenditure on Public Education", main = "Relationship Between Expenditure on Public Ed and Urban Pop")

**Adding Regression Line**

plot(education$X1, education$Y, xlab = "Residents per 1000 in Urban Areas", ylab = "Per capita Expenditure on Public Education", main = "Relationship Between Expenditure on Public Ed and Urban Pop") abline(lm(education$Y ~ education$X1), col = "blue")

**Adding Loess Line**

plot(education$X1, education$Y, xlab = "Residents per 1000 in Urban Areas", ylab = "Per capita Expenditure on Public Education", main = "Relationship Between Expenditure on Public Ed and Urban Pop") lines(lowess(education$Y ~ education$X1), col = "red")

**Scatterplots with Groups**

library("car") scatterplot(education$Y ~ education$X1 | education$Region)

Making the scatterplot more readable by describing the regions variable:

education$RegionName <- factor(education$Region, labels = c("Northeastern", "North Central", "Southern", "Western")) scatterplot(education$Y ~ education$X1 | education$RegionName)

**Scatterplot Matrices**

Here's we'll compare the four numerical variables by specifying their column numbers. We can define the kind of points the plot creates with the `pch`

argument. See `?points`

for options.

pairs(education[,3:6], pch = 20) pairs(education[,3:6], pch = 18, col = "blue") pairs(education[,3:6], pch = 18, col = "red", cex = 1.8)

boxplot(education$Y ~ education$Region) boxplot(education$Y ~ education$RegionName) boxplot(education$Y ~ education$RegionName, col = c("red", "blue", "green", "orange"))

stripchart(education$Y ~ education$RegionName) stripchart(education$Y ~ education$RegionName, vertical = TRUE) stripchart(education$Y ~ education$RegionName, vertical = TRUE, col = c("red", "blue", "green", "orange"))

Bar plot does not aggregrate by default. Therefore I use the `table`

command to
aggregate the counts:

barplot(education$Region) barplot(education$RegionName) table(education$RegionName) # to see what's being plotted barplot(table(education$RegionName)) barplot(table(education$RegionName), col = c("red", "blue", "green", "orange"))

To create a legend with this data requires some hacking. To compare, in the
second call, I use the `unique`

function to get unique values:

barplot(table(education$RegionName), col = c("red", "blue", "green", "orange"), legend = education$RegionName) barplot(table(education$RegionName), col = c("red", "blue", "green", "orange"), legend = unique(education$RegionName))

From the `bigcity`

data set. `u`

is the population of 49 U.S. cities in 1920.
`x`

is the the population of these cities in 1930. I'm simply sorting these to
provide the example.

library(boot) head(bigcity) plot(bigcity$u, type = "b", col = "blue", pch = 18) plot(sort(bigcity$u), type = "b", col = "blue", pch = 18) lines(sort(bigcity$x), type = "b", col = "red", pch = 19) legend("topleft", legend = c("1920", "1930"), col = c("blue", "red"), lty = 1:1)

We need to aggregate the data for a pie chart. For this, I'll use the `tapply`

function to take the mean of the `Y`

variable for each `RegionName`

.

pie(tapply(education$Y, education$RegionName, FUN = mean)) pie(tapply(education$Y, education$RegionName, FUN = mean), col = c("blue", "red", "green", "orange"))

hist(education$X1) hist(education$X1, col = "red") hist(education$X1, col = "#1565c0") hist(education$X1, col = "#1565c0", breaks = 3)

plot(density(education$X1), col = "#1565c0")

dotchart(education$X1, labels = education$State, groups = education$RegionName, main = "Education Expenditure 1970s")

library(gplots) library(psych) plotmeans(education$X1 ~ education$RegionName) plotmeans(education$X1 ~ education$RegionName, mean.labels = TRUE) plotmeans(education$X1 ~ education$RegionName, connect = FALSE)

By default, plots are displayed and not saved as files. R can save plots in multiple file formats, and they all generally follow the syntax below that I use to save a basic plot as a PNG file:

png("plot1.png", width = 700, height = 700) plotmeans(education$X1 ~ education$RegionName, connect = FALSE) dev.off()

ggplot2 is a powerful and popular graphing library for R.

It's also possible to create spatial/map plots. I created the following two:

- A map of Texas Death Row statistics

See STHDA for many more fine examples plus some other plotting libraries.

r/plotting-in-r.txt · Last modified: 2021/07/13 10:46 by seanburns