Use \({\small \;\;{\%}{>}{\%}\;\;} \) to chain calculations.

The \({\small \;\;{\%}{>}{\%}\;\;} \) operator lets you write function calls in sequence from left to right, instead of nesting them or storing intermediate results in variables.

For example, recall exercise 4 in the filter practice section on the survey data:

Solution:

nrow( filter(survey, smokes!="never" & exercise=="none" & gender=="male") )
[1] 4

Here, the function filter is first run on the tibble survey with the given conditions, producing a tibble that contains only the rows fulfilling those conditions. The function nrow is then called on the resulting tibble, which gives our answer, 4.

In RStudio, press Ctrl+Shift+M (Cmd+Shift+M on macOS) to insert the \({\small \;\;{\%}{>}{\%}\;\;} \) symbol.

Solution with %>% :

survey %>% filter(smokes!="never" & exercise=="none" & gender=="male") %>% nrow()
[1] 4

The solution now reads from left to right \(survey {\small \;\;{\%}{>}{\%}\;\;} filter(...) {\small \;\;{\%}{>}{\%}\;\;} nrow\) instead of \(nrow(filter(survey,...))\).
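
To make the comparison concrete, here is a minimal sketch that computes the same count in three styles: nested calls, an intermediate variable, and the pipe. The tibble `toy` and its values are made up for illustration and are not the course's survey data.

```r
library(dplyr)

# Hypothetical mini survey; the values are invented for illustration only.
toy <- tibble(
  gender   = c("male", "male", "female", "male"),
  smokes   = c("never", "often", "often", "occasionally"),
  exercise = c("none", "none", "none", "frequent")
)

# 1. Nested calls: read from the inside out
n_nested <- nrow(filter(toy, smokes != "never" & exercise == "none" & gender == "male"))

# 2. Intermediate variable: one step per line, but an extra name to manage
selected <- filter(toy, smokes != "never" & exercise == "none" & gender == "male")
n_intermediate <- nrow(selected)

# 3. Pipe: read from left to right
n_piped <- toy %>%
  filter(smokes != "never" & exercise == "none" & gender == "male") %>%
  nrow()

# All three styles give the same count
c(n_nested, n_intermediate, n_piped)
```

Only the second row of `toy` satisfies all three conditions, so each style should return 1.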

Observation: In the \({\small \;\;{\%}{>}{\%}\;\;} \) example above, note that filter does not have survey as its first argument, and nrow is called without any argument. This is what the \({\small \;\;{\%}{>}{\%}\;\;} \) operator does: it takes the result of its left-hand side and places it as the first argument of the function on its right-hand side.
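
The first-argument rule can be checked on a plain numeric vector. The sketch below loads magrittr, the package that provides the pipe (dplyr also makes it available), and uses only base R functions; the vector `x` is an arbitrary example.

```r
library(magrittr)

x <- c(1, 4, 9)  # arbitrary example values

# x %>% sum() is the same as sum(x)
piped_sum  <- x %>% sum()
nested_sum <- sum(x)

# Extra arguments stay in place; the piped value is inserted before them,
# so sqrt(14) %>% round(1) is the same as round(sqrt(14), 1)
piped_chain  <- x %>% sum() %>% sqrt() %>% round(1)
nested_chain <- round(sqrt(sum(x)), 1)

identical(piped_sum, nested_sum)
identical(piped_chain, nested_chain)
```

Both comparisons should return TRUE, since each piped expression is just another way of writing the nested call.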

Examples:

From the pulse dataset produce a tibble of personal information (name, age and gender):

  1. of individuals taller than 190 cm.
pulse %>% filter(height>190) %>% 
  select(name,age,gender)
# A tibble: 5 × 3
  name          age gender
  <chr>       <dbl> <chr> 
1 Travis         18 male  
2 John           19 male  
3 Albert         25 male  
4 Lance          21 male  
5 Christopher    18 male  
  2. of individuals with weights between 40 and 50 kg.
pulse %>% filter(weight>40 & weight<50) %>% 
  select(name,age,gender)
# A tibble: 8 × 3
  name       age gender
  <chr>    <dbl> <chr> 
1 Tisha       18 female
2 Marissa     18 female
3 Adeline     20 female
4 Bridgett    19 female
5 Katrina     22 female
6 Julianne    19 female
7 Sherri      23 female
8 Bettie      19 female
  3. of individuals that neither smoke nor drink.
pulse %>% filter(smokes=="no" & alcohol=="no") %>% 
  select(name,age,gender)
# A tibble: 40 × 3
   name        age gender
   <chr>     <dbl> <chr> 
 1 Frederick    19 male  
 2 Leslie       19 male  
 3 Maura        19 female
 4 Jerome       19 male  
 5 Arlene       34 female
 6 Glenna       20 female
 7 John         19 male  
 8 Erma         18 female
 9 Olga         21 female
10 Laurie       19 female
# … with 30 more rows

Complex example: We are asked to list only the names of the females in the pulse dataset with average pulse \(>110\). We can break down the problem into:

  1. take only female observations (filter)
  2. add a new variable averagePulse (mutate)
  3. keep only the observations with averagePulse above 110 (filter)
  4. extract the names (pull)

i.e.:

pulse %>%  filter(gender=="female")  %>%               # tibble females 
           mutate(averagePulse=(pulse1+pulse2)/2) %>%  # tibble females with averagePulse
           filter(averagePulse>110) %>%                # tibble females with averagePulse>110
           pull(name)                                  # vector of names
[1] "Melanie"  "Consuelo" "Kelli"    "Eliza"    "Maude"    "Lizzie"  
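
The same four steps can be tried on a tiny dataset where the result is easy to check by hand. This is only a sketch: `toy_pulse` is a hypothetical stand-in for the pulse dataset, with invented names and values, and it assumes the same column names (name, gender, pulse1, pulse2) as in the example above.

```r
library(dplyr)

# Hypothetical stand-in for the pulse dataset; all values are invented.
toy_pulse <- tibble(
  name   = c("Ann", "Bob", "Cleo", "Dina"),
  gender = c("female", "male", "female", "female"),
  pulse1 = c(120, 130, 90, 100),
  pulse2 = c(110, 125, 95, 124)
)

fast_females <- toy_pulse %>%
  filter(gender == "female") %>%                    # keeps Ann, Cleo, Dina
  mutate(averagePulse = (pulse1 + pulse2) / 2) %>%  # 115, 92.5, 112
  filter(averagePulse > 110) %>%                    # keeps Ann, Dina
  pull(name)                                        # character vector of names

fast_females
```

Note that pull returns a plain vector, whereas select(name) would return a one-column tibble.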

⚠️ When a sequence of piped commands spans several lines, for example in an R Markdown file, every line except the last must end with the \({\small \;\;{\%}{>}{\%}\;\;} \) symbol. Otherwise R treats the command as complete at the end of that line, and the next line, which then starts with \({\small \;\;{\%}{>}{\%}\;\;} \), is an error. See the example above. Note also that a line may contain multiple \({\small \;\;{\%}{>}{\%}\;\;} \) symbols, e.g. the first line in the example above, as long as it ends with a \({\small \;\;{\%}{>}{\%}\;\;} \) before the command on the next line. This way R knows that more commands follow on the next line.
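
The line-ending rule can be illustrated without breaking a script by handing the faulty layout to R's parser as a string. This is a sketch for illustration only; the variable names are hypothetical, and parse with tryCatch is used here just to capture the syntax error instead of stopping.

```r
library(magrittr)

# Correct layout: every line except the last ends with %>%
sorted_desc <- c(3, 1, 2) %>%
  sort() %>%
  rev()

# Faulty layout: the first line is already a complete command,
# so the second line starts with %>% and cannot be parsed.
bad_code <- "c(3, 1, 2)\n  %>% sort()"
parse_ok <- tryCatch(
  { parse(text = bad_code); TRUE },
  error = function(e) FALSE
)

sorted_desc
parse_ok
```

Here `sorted_desc` holds the values in descending order, and `parse_ok` is FALSE because the parser rejects a line that begins with the pipe.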



Copyright © 2022 Biomedical Data Sciences (BDS) | LUMC