In ggplot2 the scales describe how to map your data to point colors, shapes, …
Majority of commands to modify scales start with scale_ followed by the name of the aesthetics which you want to describe: scale_x_, scale_color_, …

➡️Go to RStudio Cheatsheets/Data Visualization Cheatsheet/Panel Scales to find short summary of commands for scales.

Location scales

These examples assume that a continuous variable is used for the location scales: x (horizontal) or y (vertical).

Try the following code to change the label of the vertical axis:

ggplot( pulse ) +
  aes( x = weight, y = height, color = exercise, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( name = "Height [cm]" )

Now, add an extra argument limits to only show points with height in a certain range:

ggplot( pulse ) +
  aes( x = weight, y = height, color = exercise, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( 
    name = "Height [cm]", 
    limits = c( 130, 200 ) 
  )
Warning: Removed 2 rows containing missing values (geom_point).

You might see a warning: the current version of the ggplot2 library seems to have a bug.
Indeed, some points are filtered out by the limit argument but this is intentional and should not be reported as a warning.
You may disable chunk warnings by adding an option to the chunk header:

```{r warning=FALSE}
ggplot( ... ) + ...
```

Try the extra argument breaks to put ticks at provided positions:

ggplot( pulse ) +
  aes( x = weight, y = height, color = exercise, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( 
    name = "Height [cm]", 
    limits = c( 130, 200 ), 
    breaks = c( 130, 135, 150, 175, 180 ) 
  )

Color scale (continuous)

These examples assume that a continuous variable is used for color.

In general we advise to use the Viridis color scales (citing: “The viridis scales provide colour maps that are perceptually uniform in both colour and black-and-white. They are also designed to be perceived by viewers with common forms of colour blindness.”).

Try first the following code.
Then add option = "A" argument to scale_color_viridis_c().
Values from "A" to "E" provide different color mappings.

ggplot( pulse ) +
  aes( x = weight, y = height, color = age, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( name = "Height [cm]", limits = c( 130, 200 ) ) +
  scale_color_viridis_c()

It is also possible to have a continuous transition between two manually given colors.
Try low and high arguments of scale_color_continuous:

ggplot( pulse ) +
  aes( x = weight, y = height, color = age, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( name = "Height [cm]", limits = c( 130, 200 ) ) +
  scale_color_continuous( low = "blue", high = "red" )

Or try scale_color_gradientn to have continuous mapping through multiple colors:

ggplot( pulse ) +
  aes( x = weight, y = height, color = age, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( name = "Height [cm]", limits = c( 130, 200 ) ) +
  scale_color_gradientn( colors = c( "red", "blue", "darkgreen", "yellow" ) )

Color scale (categorical)

These examples assume that a categorical variable is used for color.

Also for categorical (discrete) variables the Viridis color scheme is advisable.
Try the code below.
Change the argument option of scale_color_viridis_d for different color mappings.

ggplot( pulse ) +
  aes( x = weight, y = height, color = exercise, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( name = "Height [cm]", limits = c( 130, 200 ) ) +
  scale_color_viridis_d( option = "B" )

Another possibility is to use scale_color_manual, which declares that the color variable is categorical and that you want to manually declare a color for each category with the values argument:

ggplot( pulse ) +
  aes( x = weight, y = height, color = exercise, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( name = "Height [cm]", limits = c( 130, 200 ) ) +
  scale_color_manual( values = c( "high" = "red", "moderate" = "gray60", "low" = "blue" ) )

Below observe how to use \n symbol to enforce multiple lines of text in the title of the color legend:

ggplot( pulse ) +
  aes( x = weight, y = height, color = exercise, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( name = "Height [cm]", limits = c( 130, 200 ) ) +
  scale_color_manual( 
    values = c( "high" = "red", "moderate" = "gray60", "low" = "blue" ), 
    name = "Amount of\nexercise" 
  )

Finally, try the breaks argument to enforce the order of values in the legend:

ggplot( pulse ) +
  aes( x = weight, y = height, color = exercise, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( name = "Height [cm]", limits = c( 130, 200 ) ) +
  scale_color_manual( 
    values = c( "high" = "red", "moderate" = "gray60", "low" = "blue" ), 
    name = "Amount of\nexercise",
    breaks = c( "low", "moderate", "high" )
  )

Shape scale (categorical)

shape must represent a categorical variable.

Add scale_shape_manual to the previous example to manually declare which shapes to use for female and male:

ggplot( pulse ) +
  aes( x = weight, y = height, color = exercise, shape = gender ) +
  geom_point( size = 3, alpha = 0.8 ) +
  scale_y_continuous( name = "Height [cm]", limits = c( 130, 200 ) ) +
  scale_color_manual( 
    values = c( "high" = "red", "moderate" = "gray60", "low" = "blue" ), 
    name = "Amount of\nexercise",
    breaks = c( "low", "moderate", "high" )
  ) +
  scale_shape_manual( values = c( "male" = 15, "female" = 19 ) )



Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC