Generally speaking, ggplot2 geoms (geometric objects) specify plot types. Each geom produces a plot layer, and multiple layers can be combined. Here we demonstrate several frequently used geoms.
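Here is a minimal sketch of the layering idea. It assumes only ggplot2 and R's built-in `pressure` data set, which is not part of this tutorial. Each `geom_*()` call added with `+` becomes one entry in the plot's list of layers, and the layers are drawn in the order they were added.

```r
library(ggplot2)

# Two geoms on the same data and aesthetics: a line, with points drawn on top of it
p <- ggplot( pressure ) +
  aes( x = temperature, y = pressure ) +
  geom_line() +   # layer 1
  geom_point()    # layer 2

length( p$layers )  # 2
```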
Try to reproduce the plots in your own R Markdown document.
➡️ Go to RStudio Cheatsheets/Data Visualization Cheatsheet/Panel Scales to see the numerous geoms provided by ggplot2.
➡️ Go to The R Graph Gallery to see how R (often with the ggplot2 library) can be used for data visualisation.
Let’s start with a histogram of the `pulse2` variable from the `pulse` data:
```r
ggplot( pulse ) +
  aes( x = pulse2 ) +
  geom_histogram( color = "black", fill = "gray" )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (`stat_bin()`).
```
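The message suggests choosing the bin width yourself, and the warning reports one missing `pulse2` value. The sketch below shows both fixes. It uses R's built-in `airquality` data in place of `pulse`, because its `Ozone` column contains missing values. `binwidth` sets the width of each bar in data units. `na.rm = TRUE` drops missing values silently instead of warning. `layer_data()` returns the bar counts that ggplot2 computed.

```r
library(ggplot2)

p <- ggplot( airquality ) +
  aes( x = Ozone ) +
  geom_histogram( binwidth = 10,   # every bar spans 10 ppb of Ozone
                  na.rm = TRUE,    # remove missing values without a warning
                  color = "black", fill = "gray" )

bars <- layer_data( p )
# All non-missing Ozone values are counted exactly once
sum( bars$count ) == sum( !is.na( airquality$Ozone ) )  # TRUE
```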
Note that there were two groups of subjects: those who ran and those who did not. Try adding a fill colour (`fill = ran`) to split the histogram bars by group:
```r
ggplot( pulse ) +
  aes( x = pulse2, fill = ran ) +
  geom_histogram( color = "black" )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (`stat_bin()`).
```
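Splitting bars by colour works only when `fill` is mapped to a discrete variable, such as a character or factor. ggplot2 treats a numeric grouping variable as continuous and does not split the bars. We do not know how `ran` is stored in `pulse`. The sketch below therefore uses the built-in `mtcars` data, where `cyl` is numeric and is wrapped in `factor()`.

```r
library(ggplot2)

p <- ggplot( mtcars ) +
  aes( x = mpg, fill = factor( cyl ) ) +   # factor(): 4, 6 and 8 become three groups
  geom_histogram( binwidth = 2, color = "black" )

bars <- layer_data( p )
length( unique( bars$fill ) )  # 3: one fill colour per cylinder group
```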
The histogram above stacks the bars of the two groups on top of each other. To show each group separately, add the `position` argument as below. Remember to set `alpha` as well; otherwise some bars may be hidden behind others:
```r
ggplot( pulse ) +
  aes( x = pulse2, fill = ran ) +
  geom_histogram( color = "black", position = "identity", alpha = 0.6 )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (`stat_bin()`).
```
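To see what `position = "identity"` changes, compare the computed bar data of a stacked histogram and an overlaid one. This sketch uses the built-in `iris` data instead of `pulse`. The bin width is chosen only for illustration.

```r
library(ggplot2)

base <- ggplot( iris ) +
  aes( x = Sepal.Length, fill = Species )

stacked  <- base + geom_histogram( binwidth = 0.25, color = "black" )   # default: stack
overlaid <- base + geom_histogram( binwidth = 0.25, color = "black",
                                   position = "identity", alpha = 0.6 )

# Stacked: some bars start on top of another group, above zero.
# Identity: every bar starts at zero, so bars of different groups overlap.
any( layer_data( stacked )$ymin > 0 )    # TRUE
all( layer_data( overlaid )$ymin == 0 )  # TRUE
```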
Another possible value of the `position` argument is `"dodge"`, which places the bars of the groups side by side within each bin:
```r
ggplot( pulse ) +
  aes( x = pulse2, fill = ran ) +
  geom_histogram( color = "black", position = "dodge" )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (`stat_bin()`).
```
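With `"dodge"`, the bars of all groups share each bin, so every bar is narrower than the bin itself. Here is a sketch on the built-in `iris` data, which has three species. The bin width of 0.3 is chosen for illustration.

```r
library(ggplot2)

p <- ggplot( iris ) +
  aes( x = Sepal.Length, fill = Species ) +
  geom_histogram( binwidth = 0.3, color = "black", position = "dodge" )

bars <- layer_data( p )
# Each 0.3-wide bin is divided among the three species
all( bars$xmax - bars$xmin < 0.3 )  # TRUE
```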
A boxplot might also be used to show the separation of the groups. Try:
```r
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot()
## Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
```
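The box shows the first quartile, median and third quartile of each group. The whiskers extend to the most extreme values lying within 1.5 times the interquartile range of the box edges, and points beyond them are drawn as outliers. The sketch below uses the built-in `ToothGrowth` data rather than `pulse` and checks the computed box against `median()` and `quantile()`. ggplot2 uses the default method of `quantile()` here, so the box edges can differ slightly from the hinges drawn by base R's `boxplot()`.

```r
library(ggplot2)

p <- ggplot( ToothGrowth ) +
  aes( x = supp, y = len ) +
  geom_boxplot()

# One row per box (OJ first, then VC): whisker ends, box edges and median
box <- layer_data( p )[ , c( "ymin", "lower", "middle", "upper", "ymax" ) ]
box

# The middle line is the median of each group
medians <- tapply( ToothGrowth$len, ToothGrowth$supp, median )
all.equal( box$middle, as.numeric( medians ) )  # TRUE
```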
The values used to calculate the boxplot can be added as an extra layer of points on top of the boxes. This can be done with `geom_point` (note that here the x axis refers to categorical data). To avoid plotting the outliers twice, we hide them in `geom_boxplot` by setting `outlier.color` to `NA`. Try the following:
```r
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot( outlier.color = NA ) +
  geom_point( color = "red", alpha = 0.5 )
## Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
```
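Because the boxplot's outliers are hidden, the point layer draws each observation exactly once. The sketch below uses the built-in `mtcars` data, with the numeric `cyl` converted to a factor so that it acts as a categorical x axis. It confirms that the point layer holds one row per observation.

```r
library(ggplot2)

p <- ggplot( mtcars ) +
  aes( x = factor( cyl ), y = mpg ) +
  geom_boxplot( outlier.color = NA ) +      # the box layer draws no outlier points
  geom_point( color = "red", alpha = 0.5 )  # the point layer draws every car once

nrow( layer_data( p, 2 ) ) == nrow( mtcars )  # TRUE
```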
In the plot above, many points overlap. `geom_jitter` (used instead of `geom_point`) adds random noise to the point locations. The arguments `width` and `height` set the amount of noise added in the x and y directions, respectively. Setting `height = 0` keeps the `pulse2` values exact and spreads the points only horizontally. Try:
```r
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot( outlier.color = NA ) +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )
## Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
```
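`geom_jitter()` is a shortcut for `geom_point()` with jittered positions. The noise is random, so the points move every time the plot is drawn. `position_jitter()` also accepts a `seed` argument that makes the jitter reproducible. Here is a sketch with the built-in `mtcars` data; the seed value is arbitrary.

```r
library(ggplot2)

# Same noise as geom_jitter( width = 0.1, height = 0 ), plus a fixed seed
jit <- position_jitter( width = 0.1, height = 0, seed = 42 )

p <- ggplot( mtcars ) +
  aes( x = factor( cyl ), y = mpg ) +
  geom_boxplot( outlier.color = NA ) +
  geom_point( color = "red", alpha = 0.5, position = jit )

pts <- layer_data( p, 2 )
x <- as.numeric( pts$x )
# The categories sit at x = 1, 2, 3; points move sideways by less than 0.1
max( abs( x - round( x ) ) ) < 0.1
# height = 0: the vertical positions are the original mpg values
all.equal( sort( pts$y ), sort( mtcars$mpg ) )
```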
In some contexts violin plots, which show the estimated distribution of the values in each group, may be more useful than boxplots. Replace `geom_boxplot` with `geom_violin`:
```r
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_violin() +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )
## Warning: Removed 1 rows containing non-finite values (`stat_ydensity()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
```
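A violin is a mirrored kernel density estimate of the values in each group. By default (`trim = TRUE`), each violin is cut at the smallest and largest observed values. `trim = FALSE` draws the full tails of the estimate. Here is a sketch on the built-in `ToothGrowth` data:

```r
library(ggplot2)

p <- ggplot( ToothGrowth ) +
  aes( x = supp, y = len ) +
  geom_violin( trim = FALSE )   # keep the tails of the density estimate

v <- layer_data( p )
# The untrimmed violins reach below the smallest observed length
min( v$y ) < min( ToothGrowth$len )  # TRUE
```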
Finally, a violin plot, a boxplot and points with extra horizontal noise can be combined in one plot. Setting `fill = NA` in `geom_boxplot` keeps the violin visible behind the boxes:
```r
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_violin() +
  geom_boxplot( outlier.color = NA, fill = NA, color = "darkblue" ) +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )
## Warning: Removed 1 rows containing non-finite values (`stat_ydensity()`).
## Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
```
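When a boxplot is drawn inside a violin, a narrower box often reads better. The `width` argument of `geom_boxplot()` sets the box width in units of the distance between neighbouring categories. The value 0.2 below is an arbitrary choice, and the data are the built-in `ToothGrowth` set rather than `pulse`.

```r
library(ggplot2)

p <- ggplot( ToothGrowth ) +
  aes( x = supp, y = len ) +
  geom_violin() +
  geom_boxplot( width = 0.2, outlier.color = NA, fill = NA, color = "darkblue" ) +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )

# The box layer (layer 2) is 0.2 units wide, so it fits inside each violin
b <- layer_data( p, 2 )
b$xmax - b$xmin
```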
In some contexts it is important to show text in a plot. Try `geom_text`, which writes the value of the `label` aesthetic at each point:
```r
ggplot( pulse ) +
  aes( x = pulse1, y = pulse2, label = name, color = gender ) +
  geom_text( angle = -45, size = 3 )
## Warning: Removed 1 rows containing missing values (`geom_text()`).
```
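When labels crowd each other, `geom_text()` can skip any label that would overlap text already drawn in the same layer. Setting `check_overlap = TRUE` enables this, and the check is applied in data order. Here is a sketch with the built-in `mtcars` data, whose car names are stored as row names:

```r
library(ggplot2)

# Copy the row names into a column so they can be mapped to the label aesthetic
cars <- data.frame( model = rownames( mtcars ), mtcars )

p <- ggplot( cars ) +
  aes( x = wt, y = mpg, label = model ) +
  geom_text( size = 3, check_overlap = TRUE )  # skip labels overlapping earlier ones

# check_overlap acts only when drawing: the layer data still holds every label
nrow( layer_data( p ) )
```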
Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC