Generally speaking, ggplot2 geoms specify plot
types.
Each geom produces a plot layer and
multiple layers can be combined.
Here we demonstrate several frequently used geoms.
Try to regenerate the plots in your R Markdown document.
➡️Go to RStudio Cheatsheets/Data Visualization Cheatsheet/Panel Scales to see numerous geoms provided by the library.
➡️Go to The R Graph Gallery to see how R (often with ggplot2 library) can be used for data visualisation.
Let’s start with the histogram of the pulse2 variable
from the pulse data:
ggplot( pulse ) +
  aes( x = pulse2 ) +
  geom_histogram( color = "black", fill = "gray" )`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.Warning: Removed 1 rows containing non-finite values (`stat_bin()`).Note, that there were two groups of subjects: one did
run, the other did not.
Try to add color to split histogram bars to make groups visible:
ggplot( pulse ) +
  aes( x = pulse2, fill = ran ) +
  geom_histogram( color = "black" )`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.Warning: Removed 1 rows containing non-finite values (`stat_bin()`).The above histogram has the groups stacked.
To visualize each group separately, try to add position
argument as below (remember to add alpha; otherwise some
bars might get hidden):
ggplot( pulse ) +
  aes( x = pulse2, fill = ran ) +
  geom_histogram( color = "black", position = "identity", alpha = 0.6 )`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.Warning: Removed 1 rows containing non-finite values (`stat_bin()`).An another possible value of the position argument:
ggplot( pulse ) +
  aes( x = pulse2, fill = ran ) +
  geom_histogram( color = "black", position = "dodge" )`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.Warning: Removed 1 rows containing non-finite values (`stat_bin()`).A boxplot might also be used to show the separation of the groups. Try:
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot()Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).The values used to calculate the boxplot can be put as an extra
layer of points on the top of the boxes.
This can be done with geom_point (note, that here the
x axis refers to categorical data).
To avoid double plotting of outliers, we disable them
in geom_boxplot by setting their outlier.color
to NA.
Try the following:
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot( outlier.color = NA ) +
  geom_point( color = "red", alpha = 0.5 )Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).Warning: Removed 1 rows containing missing values (`geom_point()`).The above plot suffers from overlap of the
points.
geom_jitter (used instead of geom_point)
allows to add noise to point locations.
The arguments width and height specify the
range of noise combined with x and y
directions.
Try:
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot( outlier.color = NA ) +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).Warning: Removed 1 rows containing missing values (`geom_point()`).In some contexts violin plots may be more useful
than the boxplots.
Replace geom_boxplot with geom_violin:
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_violin() +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )Warning: Removed 1 rows containing non-finite values (`stat_ydensity()`).Warning: Removed 1 rows containing missing values (`geom_point()`).This can be the final combination of a boxplot, violin plot, and points plot with extra horizontal noise:
ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_violin() +
  geom_boxplot( outlier.color = NA, fill = NA, color = "darkblue" ) +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )Warning: Removed 1 rows containing non-finite values (`stat_ydensity()`).Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).Warning: Removed 1 rows containing missing values (`geom_point()`).In some contexts it might be important to visualise words in a
plot.
Try geom_text as follows:
ggplot( pulse ) +
  aes( x = pulse1, y = pulse2, label = name, color = gender ) +
  geom_text( angle = -45, size = 3 )Warning: Removed 1 rows containing missing values (`geom_text()`).Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC