Use \({\small \;\;{\%}{>}{\%}\;\;} \) to chain calculations.
The \({\small \;\;{\%}{>}{\%}\;\;} \) operator lets you write your calculations as a sequence of steps instead of as nested calls:
\[{\Large f(g(x)) \;\;\;\; \Leftrightarrow \;\;\;\; x {\small \;\;{\%}{>}{\%}\;\;} g {\small \;\;{\%}{>}{\%}\;\;} f}\] Here \(f\) and \(g\) are functions and \(x\) is a data object, possibly a vector but in most cases a tibble. The left-hand side of the equivalence, \(f(g(x))\), is the standard notation in R: it first calls \(g(x)\), and the result of \(g(x)\) is then taken as input to the function \(f\). The right-hand side, \(x {\small \;\;{\%}{>}{\%}\;\;} g {\small \;\;{\%}{>}{\%}\;\;} f\), gives the same result as \(f(g(x))\).
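The equivalence can be checked on a small vector. The following is a minimal sketch, assuming the magrittr package is installed (dplyr re-exports the same operator); the vector and the choice of sqrt and sum as \(g\) and \(f\) are illustrative only:

```r
library(magrittr)  # provides the %>% operator

x <- c(1, 4, 9)

# Nested call: sqrt() runs first, then sum() on its result
nested <- sum(sqrt(x))

# Piped call: the same two steps, written in the order they happen
piped <- x %>% sqrt() %>% sum()

identical(nested, piped)
```

Both forms compute the sum of the square roots, so the comparison should return TRUE.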
For example, recall exercise 4 in the filter practice section on the survey data:
How many males smoke and never exercise?
The solution:
nrow( filter(survey, smokes!="never" & exercise=="none" & gender=="male") )
[1] 4
Here, the function filter is first run on the tibble survey with some conditions, resulting in a tibble with the rows that fulfil those conditions. The function nrow is then called on the resulting tibble, hence our answer 4.
In RStudio, press Ctrl+Shift+M (Cmd+Shift+M on macOS) to insert the \({\small \;\;{\%}{>}{\%}\;\;} \) symbol.
Here is the same result using the \({\small \;\;{\%}{>}{\%}\;\;} \) operator:
survey %>% filter(smokes!="never" & exercise=="none" & gender=="male") %>% nrow()
[1] 4
The solution now reads from left to right \(survey {\small \;\;{\%}{>}{\%}\;\;} filter(...) {\small \;\;{\%}{>}{\%}\;\;} nrow\) instead of \(nrow(filter(survey,...))\).
Observation: In the \({\small \;\;{\%}{>}{\%}\;\;} \) example above, note that filter does not have survey as its first argument, and the same is true for nrow. This is exactly what the \({\small \;\;{\%}{>}{\%}\;\;} \) operator does: it takes the result of its left-hand side and places it as the first argument of the function on its right-hand side.
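The first-argument rule can be seen with a function that takes a second argument. This is a small sketch, assuming magrittr is installed; the numbers are made up for illustration:

```r
library(magrittr)  # provides the %>% operator

values <- c(1.234, 5.678)

# The pipe places `values` in front of the arguments already given to round()
piped <- values %>% round(1)

# The equivalent direct call, with `values` written as the first argument
direct <- round(values, 1)

identical(piped, direct)
```

The argument written inside the parentheses, here the number of digits, moves to second position once the piped value is inserted in front of it.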
The main advantage of the \({\small \;\;{\%}{>}{\%}\;\;} \) syntax is that it lets you concentrate on the steps in the order they happen rather than on nested mathematical notation such as \(f(g(h(i(...))))\), which must be read from the inside out.
Examples:
From the pulse dataset, produce tibbles of personal information (name, age and gender) for different groups of people:
pulse %>% filter(height>190) %>%                 # people with height above 190
  select(name,age,gender)                        # keep only the personal information
# A tibble: 5 × 3
name age gender
<chr> <dbl> <chr>
1 Travis 18 male
2 John 19 male
3 Albert 25 male
4 Lance 21 male
5 Christopher 18 male
pulse %>% filter(weight>40 & weight<50) %>%      # people with weight between 40 and 50
  select(name,age,gender)                        # keep only the personal information
# A tibble: 8 × 3
name age gender
<chr> <dbl> <chr>
1 Tisha 18 female
2 Marissa 18 female
3 Adeline 20 female
4 Bridgett 19 female
5 Katrina 22 female
6 Julianne 19 female
7 Sherri 23 female
8 Bettie 19 female
pulse %>% filter(smokes=="no" & alcohol=="no") %>%   # people who neither smoke nor drink alcohol
  select(name,age,gender)                            # keep only the personal information
# A tibble: 40 × 3
name age gender
<chr> <dbl> <chr>
1 Frederick 19 male
2 Leslie 19 male
3 Maura 19 female
4 Jerome 19 male
5 Arlene 34 female
6 Glenna 20 female
7 John 19 male
8 Erma 18 female
9 Olga 21 female
10 Laurie 19 female
# … with 30 more rows
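Complex example: We are asked to list only the names of the females in the pulse dataset with average pulse \(>110\). We can break the problem down into four steps: keep only the females (filter), compute averagePulse as the mean of pulse1 and pulse2 (mutate), keep the rows with averagePulse above 110 (filter), and extract the names (pull), i.e.: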
pulse %>% filter(gender=="female") %>%         # tibble females
  mutate(averagePulse=(pulse1+pulse2)/2) %>%   # tibble females with averagePulse
  filter(averagePulse>110) %>%                 # tibble females with averagePulse>110
  pull(name)                                   # vector of names
[1] "Melanie" "Consuelo" "Kelli" "Eliza" "Maude" "Lizzie"
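To try the same four steps without the pulse dataset at hand, the pipeline can be run on a tiny stand-in tibble. This is only a sketch, assuming dplyr is installed; the tibble toy and all its values are invented for illustration and are not taken from the pulse data:

```r
library(dplyr)  # provides filter, mutate, pull, tibble and the %>% operator

# Hypothetical stand-in data with the columns used in the example above
toy <- tibble(
  name   = c("Ann", "Bob", "Cara", "Dina"),
  gender = c("female", "male", "female", "female"),
  pulse1 = c(120, 130, 90, 112),
  pulse2 = c(110, 125, 95, 110)
)

result <- toy %>%
  filter(gender == "female") %>%                   # keep only the females
  mutate(averagePulse = (pulse1 + pulse2) / 2) %>% # add the average of both measurements
  filter(averagePulse > 110) %>%                   # keep average pulse above 110
  pull(name)                                       # extract the names as a vector

result
```

With these made-up values, Ann (average 115) and Dina (average 111) pass both filters, while Bob is removed by the gender filter and Cara (average 92.5) by the pulse filter.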
⚠️ In an R Markdown file, every line in a piped sequence of commands except the last must end with the \({\small \;\;{\%}{>}{\%}\;\;} \) symbol; otherwise it is an error. See the example above. Note also that a line may contain multiple \({\small \;\;{\%}{>}{\%}\;\;} \) symbols, e.g. the first line in the example above, as long as it ends with a \({\small \;\;{\%}{>}{\%}\;\;} \) before the command on the next line. This way R knows that more commands follow on the next line.
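The line-ending rule can be checked by asking R to parse two small pieces of code held as text. This is a minimal sketch, assuming magrittr is installed; the strings good and bad are illustrative only:

```r
library(magrittr)  # provides the %>% operator

# Pipe at the END of the first line: R expects the command to continue
good <- "c(1, 2, 3) %>%\n  sum()"

# Pipe at the START of the second line: the first line is already complete,
# so the second line begins with an operator and cannot be parsed
bad <- "c(1, 2, 3)\n  %>% sum()"

good_result <- eval(parse(text = good))

# parse() raises a syntax error for `bad`; record whether parsing succeeded
bad_parses <- tryCatch({ parse(text = bad); TRUE }, error = function(e) FALSE)

good_result
bad_parses
```

The first string evaluates to the sum of the vector, while the second fails at the parsing stage before anything is computed.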
Copyright © 2022 Biomedical Data Sciences (BDS) | LUMC