Generally speaking, ggplot2 geoms specify plot types.
Each geom produces a plot layer and multiple layers can be combined.
Here we demonstrate several frequently used geoms.
Try to regenerate the plots in your R Markdown document.

➡️Go to RStudio Cheatsheets/Data Visualization Cheatsheet/Panel Scales to see numerous geoms provided by the library.

➡️Go to The R Graph Gallery to see how R (often with the ggplot2 library) can be used for data visualisation.

Histograms

Let’s start with the histogram of the pulse2 variable from the pulse data:

ggplot( pulse ) +
  aes( x = pulse2 ) +
  geom_histogram( color = "black", fill = "gray" )
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 1 rows containing non-finite values (`stat_bin()`).
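
The message and the warning above can be addressed directly in the call. The following is a minimal sketch, assuming a small made-up data frame (`pulse_demo`, a hypothetical stand-in for the pulse data) and an arbitrarily chosen `binwidth` of 10; a suitable width for the real data may differ.

```r
library(ggplot2)

# Hypothetical stand-in for the course's pulse data (values are made up).
pulse_demo <- data.frame(
  pulse2 = c(68, 72, 75, 80, 84, 90, 96, 104, 118, 132, NA)
)

# binwidth = 10 makes each bin 10 units wide (an assumed, illustrative value);
# na.rm = TRUE drops the missing value silently instead of with a warning.
p_hist <- ggplot( pulse_demo ) +
  aes( x = pulse2 ) +
  geom_histogram( binwidth = 10, na.rm = TRUE, color = "black", fill = "gray" )

p_hist
```

Setting `binwidth` (or `bins`) explicitly is usually preferable to relying on the default of 30 bins, because the choice then becomes visible in the code.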

Note that there were two groups of subjects: one ran, the other did not.
Map the fill color to the ran variable to split the histogram bars and make the groups visible:

ggplot( pulse ) +
  aes( x = pulse2, fill = ran ) +
  geom_histogram( color = "black" )
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 1 rows containing non-finite values (`stat_bin()`).

The above histogram has the groups stacked on top of each other.
To draw each group from the baseline instead, set the position argument to "identity" as below (remember to add alpha to make the bars semi-transparent; otherwise some bars might get hidden behind others):

ggplot( pulse ) +
  aes( x = pulse2, fill = ran ) +
  geom_histogram( color = "black", position = "identity", alpha = 0.6 )
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 1 rows containing non-finite values (`stat_bin()`).

Another possible value of the position argument is "dodge", which places the bars of the groups side by side:

ggplot( pulse ) +
  aes( x = pulse2, fill = ran ) +
  geom_histogram( color = "black", position = "dodge" )
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 1 rows containing non-finite values (`stat_bin()`).

Boxplots

A boxplot might also be used to show the separation of the groups. Try:

ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot()
Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
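
To see which summary values the boxplot layer draws, the computed statistics can be inspected with `layer_data()`. This is only a sketch on a made-up data frame (`pulse_demo`, a hypothetical stand-in for the pulse data with invented values).

```r
library(ggplot2)

# Hypothetical stand-in for the pulse data (values are made up).
pulse_demo <- data.frame(
  ran    = rep(c("ran", "sat"), each = 5),
  pulse2 = c(95, 110, 120, 128, 140, 62, 68, 70, 74, 80)
)

p_box <- ggplot( pulse_demo ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot()

# One row per box: whisker ends (ymin, ymax), hinges (lower, upper)
# and the median (middle) as computed by stat_boxplot().
box_stats <- layer_data( p_box, 1 )[ , c("ymin", "lower", "middle", "upper", "ymax") ]
box_stats
```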

The values used to calculate the boxplot can be drawn as an extra layer of points on top of the boxes.
This can be done with geom_point (note that here the x axis shows categorical data).
To avoid plotting the outliers twice, we hide them in geom_boxplot by setting outlier.color to NA.
Try the following:

ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot( outlier.color = NA ) +
  geom_point( color = "red", alpha = 0.5 )
Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
Warning: Removed 1 rows containing missing values (`geom_point()`).

The above plot suffers from overlapping points.
geom_jitter (used instead of geom_point) adds random noise to the point locations.
The arguments width and height specify the amount of noise added in the x and y directions, respectively; height = 0 keeps the measured values unchanged.
Try:

ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot( outlier.color = NA ) +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )
Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
Warning: Removed 1 rows containing missing values (`geom_point()`).
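
Because the jitter is random, the points land in slightly different places each time the plot is drawn. If a reproducible figure is needed, one option is to pass a seed through `position_jitter()`. The sketch below assumes a made-up data frame (`pulse_demo`, a hypothetical stand-in for the pulse data) and an arbitrary seed value.

```r
library(ggplot2)

# Hypothetical stand-in for the pulse data (values are made up).
pulse_demo <- data.frame(
  ran    = rep(c("ran", "sat"), each = 5),
  pulse2 = c(95, 110, 120, 128, 140, 62, 68, 70, 74, 80)
)

# geom_point() with position_jitter() behaves like geom_jitter();
# the fixed seed (arbitrary value) makes the noise identical on every render.
p_jit <- ggplot( pulse_demo ) +
  aes( x = ran, y = pulse2 ) +
  geom_boxplot( outlier.color = NA ) +
  geom_point(
    color = "red", alpha = 0.5,
    position = position_jitter( width = 0.1, height = 0, seed = 1 )
  )

p_jit
```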

In some contexts violin plots, which show the estimated density of the values, may be more useful than boxplots.
Replace geom_boxplot with geom_violin:

ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_violin() +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )
Warning: Removed 1 rows containing non-finite values (`stat_ydensity()`).
Warning: Removed 1 rows containing missing values (`geom_point()`).

Finally, a violin plot, a boxplot and points with extra horizontal noise can be combined in a single plot:

ggplot( pulse ) +
  aes( x = ran, y = pulse2 ) +
  geom_violin() +
  geom_boxplot( outlier.color = NA, fill = NA, color = "darkblue" ) +
  geom_jitter( color = "red", height = 0, width = 0.1, alpha = 0.5 )
Warning: Removed 1 rows containing non-finite values (`stat_ydensity()`).
Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
Warning: Removed 1 rows containing missing values (`geom_point()`).

A plot of texts

In some contexts it might be important to visualise words in a plot.
Try geom_text as follows:

ggplot( pulse ) +
  aes( x = pulse1, y = pulse2, label = name, color = gender ) +
  geom_text( angle = -45, size = 3 )
Warning: Removed 1 rows containing missing values (`geom_text()`).
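
When many labels fall close together they can become unreadable. One possible remedy is the `check_overlap` argument of `geom_text`, which skips labels that would overlap an already drawn label in the same layer. The sketch below uses a made-up data frame (`pulse_demo`, with hypothetical names and values) rather than the real pulse data.

```r
library(ggplot2)

# Hypothetical stand-in for the pulse data (names and values are made up).
pulse_demo <- data.frame(
  name   = c("Ann", "Bob", "Cem", "Dee", "Eli"),
  gender = c("female", "male", "male", "female", "male"),
  pulse1 = c(62, 64, 70, 78, 86),
  pulse2 = c(64, 65, 110, 80, 125)
)

# check_overlap = TRUE omits labels that would overlap earlier ones
# when the plot is drawn; the underlying data are not changed.
p_text <- ggplot( pulse_demo ) +
  aes( x = pulse1, y = pulse2, label = name, color = gender ) +
  geom_text( angle = -45, size = 3, check_overlap = TRUE )

p_text
```

Note that dropped labels are simply not shown, so this option suits exploratory plots better than figures in which every label matters.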



Copyright © 2024 Biomedical Data Sciences (BDS) | LUMC