Pipe operator


Use \({\small \;\;{\%}{>}{\%}\;\;} \) to chain calculations.

The \({\small \;\;{\%}{>}{\%}\;\;} \) operator lets you write a calculation as a sequence of steps instead of as nested function calls:

\[{\Large f(g(x)) \;\;\;\; \Leftrightarrow \;\;\;\; x {\small \;\;{\%}{>}{\%}\;\;} g {\small \;\;{\%}{>}{\%}\;\;} f}\]

Here \(f\) and \(g\) are functions and \(x\) is a data object: possibly a vector, but in most cases a tibble. The left-hand side of the equivalence, \(f(g(x))\), is the standard notation in R: it first evaluates \(g(x)\) and then passes that result as input to the function \(f\). The right-hand side, \(x {\small \;\;{\%}{>}{\%}\;\;} g {\small \;\;{\%}{>}{\%}\;\;} f\), gives the same result as \(f(g(x))\), but lists the functions in the order in which they are applied.
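As a minimal sketch of this equivalence, the block below uses an illustrative numeric vector x (not from the original) with the base functions abs and sqrt standing in for \(g\) and \(f\). It assumes dplyr is installed; dplyr re-exports %>% from the magrittr package. magrittr also accepts a bare function name on the right-hand side, so the literal form x %>% g %>% f works too.

```r
library(dplyr)  # provides %>% (re-exported from magrittr)

x <- c(-4, 9, -16)                 # illustrative data object

nested <- sqrt(abs(x))             # f(g(x)): abs is applied first, then sqrt
piped  <- x %>% abs() %>% sqrt()   # x %>% g %>% f, read from left to right
bare   <- x %>% abs %>% sqrt       # bare function names are also accepted

nested                             # 2 3 4
identical(nested, piped)           # TRUE
identical(nested, bare)            # TRUE
```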

For example, recall exercise 4 in the filter practice section on the survey data:

How many males smoke and never exercise?

The solution:

nrow( filter(survey, smokes!="never" & exercise=="none" & gender=="male") )

[1] 4

Here the function filter is run first, on the tibble survey, and returns a tibble containing only the rows that fulfil the conditions; the function nrow is then called on that tibble, which gives our answer, 4.
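To make the evaluation order explicit, the nested call can be split into two steps with intermediate variables. The sketch below uses a small hypothetical tibble, survey_demo, with the same column names as survey (gender, smokes, exercise); its rows are invented for illustration, so the count differs from the real answer.

```r
library(dplyr)

# Hypothetical stand-in for the survey tibble (invented rows)
survey_demo <- tibble(
  gender   = c("male", "male", "female", "male"),
  smokes   = c("never", "heavy", "occasional", "regular"),
  exercise = c("none", "none", "none", "some")
)

# Step 1: filter keeps only the rows that fulfil all three conditions
inactive_smoking_males <- filter(survey_demo,
  smokes != "never" & exercise == "none" & gender == "male")

# Step 2: nrow counts the rows of the filtered tibble
answer <- nrow(inactive_smoking_males)
answer   # 1

# The nested call performs the same two steps in a single expression
identical(answer, nrow(filter(survey_demo,
  smokes != "never" & exercise == "none" & gender == "male")))   # TRUE
```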

In RStudio, use Ctrl+Shift+M (Cmd+Shift+M on macOS) to insert the \({\small \;\;{\%}{>}{\%}\;\;} \) symbol.

Here is the same result using the \({\small \;\;{\%}{>}{\%}\;\;} \) operator:

survey %>% filter(smokes!="never" & exercise=="none" & gender=="male") %>% nrow()

[1] 4

The solution now reads from left to right, \(survey {\small \;\;{\%}{>}{\%}\;\;} filter(...) {\small \;\;{\%}{>}{\%}\;\;} nrow\), instead of from the inside out, \(nrow(filter(survey,...))\).

Observation: In the \({\small \;\;{\%}{>}{\%}\;\;} \) example above, note that survey is not written as the first argument of filter, and nrow is called with no argument at all. This is essentially what the \({\small \;\;{\%}{>}{\%}\;\;} \) operator does: it takes the result of its left-hand side and places it as the first argument of the function call on its right-hand side.
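To see the rule "the left-hand result becomes the first argument" in isolation, here is a toy operator, %then%, written only for illustration: it is hypothetical and not part of magrittr or dplyr. It captures the right-hand call unevaluated, inserts the left-hand value as the first argument, and evaluates the new call. The real %>% handles many more cases (for example the . placeholder), so treat this as a sketch of the idea, not as how magrittr is implemented.

```r
# Toy pipe (hypothetical, for illustration only): lhs %then% f(y) becomes f(lhs, y)
`%then%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)            # right-hand side, unevaluated
  if (is.symbol(rhs_expr)) {             # allow a bare name: x %then% f
    rhs_expr <- as.call(list(rhs_expr))  # ...by turning it into f()
  }
  parts <- as.list(rhs_expr)             # list(function, arg1, arg2, ...)
  new_call <- as.call(c(parts[1], list(lhs), parts[-1]))  # f(lhs, arg1, ...)
  eval(new_call, envir = parent.frame()) # evaluate where the pipe was written
}

c(4, 9, 16) %then% sqrt()                  # same as sqrt(c(4, 9, 16))
c(1.234, 5.678) %then% round(digits = 1)   # same as round(c(1.234, 5.678), digits = 1)
c(4, 9, 16) %then% sqrt %then% sum         # same as sum(sqrt(c(4, 9, 16)))
```

Because user-defined %op% operators are evaluated from left to right, the last line is read as (c(4, 9, 16) %then% sqrt) %then% sum, which is why chaining works.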

The main advantage of the \({\small \;\;{\%}{>}{\%}\;\;} \) syntax is that it lets you write the steps in the order in which they happen, instead of in nested mathematical notation such as \(f(g(h(i(...))))\), which has to be read from the inside out.
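As an illustration with four nested calls, the two expressions below compute the same value; the numbers in x are made up for the example, and dplyr is assumed to be loaded for %>%. Note how an extra argument, such as the 2 passed to round, stays next to its function in the piped version, whereas in the nested version it ends up far from the function name.

```r
library(dplyr)

x <- c(-4, 9, -16, 25)   # illustrative values

# Nested: read from the inside out (abs, then sqrt, then mean, then round)
nested <- round(mean(sqrt(abs(x))), 2)

# Piped: the same four steps, read from left to right
piped <- x %>% abs() %>% sqrt() %>% mean() %>% round(2)

nested                     # 3.5
identical(nested, piped)   # TRUE
```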

Examples:

From the pulse dataset produce a tibble of personal information (name, age and gender):

of individuals taller than 190 cm.

pulse %>% filter(height>190) %>% 
  select(name,age,gender)

# A tibble: 5 × 3
  name          age gender
  <chr>       <dbl> <chr> 
1 Travis         18 male  
2 John           19 male  
3 Albert         25 male  
4 Lance          21 male  
5 Christopher    18 male

of individuals with weights strictly between 40 and 50 kg.

pulse %>% filter(weight>40 & weight<50) %>% 
  select(name,age,gender)

# A tibble: 8 × 3
  name       age gender
  <chr>    <dbl> <chr> 
1 Tisha       18 female
2 Marissa     18 female
3 Adeline     20 female
4 Bridgett    19 female
5 Katrina     22 female
6 Julianne    19 female
7 Sherri      23 female
8 Bettie      19 female

of individuals who neither smoke nor drink.

pulse %>% filter(smokes=="no" & alcohol=="no") %>% 
  select(name,age,gender)

# A tibble: 40 × 3
   name        age gender
   <chr>     <dbl> <chr> 
 1 Frederick    19 male  
 2 Leslie       19 male  
 3 Maura        19 female
 4 Jerome       19 male  
 5 Arlene       34 female
 6 Glenna       20 female
 7 John         19 male  
 8 Erma         18 female
 9 Olga         21 female
10 Laurie       19 female
# … with 30 more rows
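In each of these examples filter comes before select, and the order matters: select drops every column that is not listed, so a later filter can no longer see height. The sketch below assumes a small hypothetical tibble, pulse_demo, with invented rows and only some of the pulse columns; since the exact error message depends on the dplyr version, the error is only caught and replaced by a fixed string here.

```r
library(dplyr)

# Hypothetical stand-in for pulse (invented rows, subset of columns)
pulse_demo <- tibble(
  name   = c("Ada", "Ben", "Cy"),
  age    = c(19, 22, 20),
  gender = c("female", "male", "male"),
  height = c(165, 192, 185)
)

# filter first, then select: works
tall <- pulse_demo %>% filter(height > 190) %>% select(name, age, gender)
tall$name   # "Ben"

# select first, then filter: height has already been dropped, so this fails
wrong_order <- tryCatch(
  pulse_demo %>% select(name, age, gender) %>% filter(height > 190),
  error = function(e) "error: height is no longer a column"
)
wrong_order
```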

Complex example: We are asked to list only the names of the females in the pulse dataset whose average pulse (the mean of pulse1 and pulse2) is \(>110\). We can break the problem down into these steps:

take only the female observations (filter)
add a new variable averagePulse (mutate)
keep only the rows with averagePulse above 110 (filter)
extract the names (pull)

i.e.:

pulse %>%  filter(gender=="female")  %>%               # tibble of females
           mutate(averagePulse=(pulse1+pulse2)/2) %>%  # tibble of females with averagePulse
           filter(averagePulse>110) %>%                # tibble of females with averagePulse>110
           pull(name)                                  # vector of names

[1] "Melanie"  "Consuelo" "Kelli"    "Eliza"    "Maude"    "Lizzie"
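The four steps in the list map one-to-one onto the four calls in the pipeline. As a sketch, the version below stores each intermediate result in its own variable; it runs on a small hypothetical tibble, pulse_demo, with invented values for the columns the example uses (name, gender, pulse1, pulse2), so the names it returns are not the ones from the real dataset.

```r
library(dplyr)

# Hypothetical stand-in for pulse (invented rows, only the needed columns)
pulse_demo <- tibble(
  name   = c("Ann", "Bob", "Cleo", "Dora"),
  gender = c("female", "male", "female", "female"),
  pulse1 = c(100, 120, 118, 90),
  pulse2 = c(130, 125, 110, 95)
)

females      <- filter(pulse_demo, gender == "female")                # step 1
with_average <- mutate(females, averagePulse = (pulse1 + pulse2) / 2) # step 2
high         <- filter(with_average, averagePulse > 110)              # step 3
step_by_step <- pull(high, name)                                      # step 4

# The pipeline performs the same four steps without the intermediate names
piped <- pulse_demo %>%
  filter(gender == "female") %>%
  mutate(averagePulse = (pulse1 + pulse2) / 2) %>%
  filter(averagePulse > 110) %>%
  pull(name)

step_by_step                     # "Ann" "Cleo"
identical(step_by_step, piped)   # TRUE
```

The intermediate variables are handy for checking each step while developing; the piped form avoids cluttering the workspace with them.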

⚠️ In an R script or an R Markdown chunk, when a sequence of piped commands spans several lines, every line except the last must end with the \({\small \;\;{\%}{>}{\%}\;\;} \) symbol; otherwise R treats that line as a complete command, and the next line, which would then start with \({\small \;\;{\%}{>}{\%}\;\;} \), is a syntax error. See the example above. Note also that a line may contain several \({\small \;\;{\%}{>}{\%}\;\;} \) symbols, as the first line of the example above does, as long as it ends with \({\small \;\;{\%}{>}{\%}\;\;} \) before the command on the next line. This way R knows that the command continues on the next line.
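The line-break rule can be checked without any data: base R's parse() reports whether a piece of code is syntactically valid. In the sketch below, the first string ends its first line with %>% and parses as one expression; the second moves %>% to the start of the next line, so the first line is already a complete expression and the second line cannot be parsed. Evaluating the valid version assumes dplyr (or magrittr) is loaded.

```r
library(dplyr)  # needed only to evaluate %>%, not to parse it

good_code <- "c(4, 9, 16) %>%
  sqrt()"
bad_code <- "c(4, 9, 16)
  %>% sqrt()"

good <- parse(text = good_code)
length(good)        # 1: a single expression spanning two lines
eval(good[[1]])     # 2 3 4

# A leading %>% cannot start an expression, so parse() fails
bad <- tryCatch(parse(text = bad_code), error = function(e) conditionMessage(e))
bad                 # the parser's "unexpected ..." error message
```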
