Use mutate to add new variables or modify existing ones.
For example, the pulse dataset has two pulse measurements. Suppose we are interested in the average pulse and want this information to be available as a separate variable, e.g. averagePulse, in the pulse tibble. We can do this with:
mutate(pulse, averagePulse = (pulse1+pulse2)/2)
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year averagePulse
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 87
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 116
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 136
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 72
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 89
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 110.
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 70
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 74
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 68
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 119
# … with 100 more rows
By default the new column is added at the last position in the tibble.
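If the last position is not convenient, the placement can be changed. The sketch below assumes dplyr 1.0.0 or later, where mutate accepts the .before and .after arguments. The tibble toy is a small stand-in built from the first three rows shown above, not the real pulse data.

```r
library(dplyr)

# Stand-in for the pulse tibble: first three rows shown above, fewer columns
toy <- tibble(
  name   = c("Bonnie", "Melanie", "Consuelo"),
  pulse1 = c(86, 82, 96),
  pulse2 = c(88, 150, 176)
)

# Place the new column directly before pulse1 instead of at the end
moved <- mutate(toy, averagePulse = (pulse1 + pulse2) / 2, .before = pulse1)
names(moved)
# "name" "averagePulse" "pulse1" "pulse2"
```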
Does the pulse tibble now contain the variable averagePulse?
Answer: No. mutate returns a modified copy and leaves the original tibble unchanged. If you want to keep the new variable averagePulse, use the assignment operator <- to replace the original pulse tibble with the newly modified version:
pulse <- mutate(pulse, averagePulse = (pulse1+pulse2)/2)
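The difference between calling mutate with and without assignment can be checked directly. This is a minimal sketch on a stand-in tibble toy (values taken from the first three rows shown above), not on the real pulse data; the names before and after are only illustrative.

```r
library(dplyr)

# Stand-in for the pulse tibble
toy <- tibble(pulse1 = c(86, 82, 96), pulse2 = c(88, 150, 176))

# Without assignment the result is printed and then discarded
mutate(toy, averagePulse = (pulse1 + pulse2) / 2)
before <- "averagePulse" %in% names(toy)  # FALSE: toy is unchanged

# With assignment the modified copy replaces the original
toy <- mutate(toy, averagePulse = (pulse1 + pulse2) / 2)
after <- "averagePulse" %in% names(toy)   # TRUE
```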
As another example, consider the body mass index (BMI): \[BMI=\frac{weight_{kg}}{{height_m}^2}\]
Note that the BMI definition requires weight in kilograms and height in metres. In the pulse dataset weight is given in kilograms but height is in centimetres. We can therefore first create a new variable height_metre containing the height in metres and then calculate BMI:
pulse_bmi <- mutate(pulse, height_metre=height/100) # convert centimetres to metres
pulse_bmi
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year height_metre
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 1.73
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 1.79
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 1.67
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 1.95
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 1.73
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 1.84
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 1.62
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 1.69
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 1.64
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 1.68
# … with 100 more rows
The pulse_bmi tibble now has the height in metres, so we can calculate BMI:
pulse_bmi <- mutate(pulse_bmi, BMI=weight/(height_metre^2))
pulse_bmi
# A tibble: 110 × 15
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year height_metre BMI
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 1.73 19.0
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 1.79 18.1
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 1.67 22.2
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 1.95 22.1
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 1.73 21.4
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 1.84 21.9
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 1.62 21.7
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 1.69 19.3
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 1.64 20.8
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 1.68 21.3
# … with 100 more rows
Alternatively, you may skip the creation of height_metre and calculate BMI directly from the pulse tibble:
pulse_bmi <- mutate(pulse, BMI=weight/((height/100)^2))
pulse_bmi
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year BMI
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 19.0
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 18.1
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 22.2
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 22.1
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 21.4
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 21.9
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 21.7
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 19.3
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 20.8
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 21.3
# … with 100 more rows
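A third option keeps height_metre but does both steps in one call: within a single mutate, a later expression can refer to a column created earlier in the same call. A minimal sketch on a stand-in tibble toy (first three rows shown above, fewer columns):

```r
library(dplyr)

# Stand-in for the pulse tibble
toy <- tibble(
  name   = c("Bonnie", "Melanie", "Consuelo"),
  height = c(173, 179, 167),  # centimetres
  weight = c(57, 58, 62)      # kilograms
)

# height_metre is created first and used immediately to compute BMI
toy_bmi <- mutate(
  toy,
  height_metre = height / 100,
  BMI = weight / height_metre^2
)
toy_bmi
```

This yields the same BMI values as the two-step version while keeping the intermediate height_metre column.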
In the examples above we added a new variable to our dataset, but you can also update an existing variable. For example, let’s say we want to have the age expressed (roughly) in days instead of years:
mutate(pulse, age=age*365)
# A tibble: 110 × 13
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 6570 female no yes moderate sat 86 88 1993
2 1993_B Melanie 179 58 6935 female no yes moderate ran 82 150 1993
3 1993_C Consuelo 167 62 6570 female no yes high ran 96 176 1993
4 1993_D Travis 195 84 6570 male no yes high sat 71 73 1993
5 1993_E Lauri 173 64 6570 female no yes low sat 90 88 1993
6 1993_F George 184 74 8030 male no yes low ran 78 141 1993
7 1993_G Cherry 162 57 7300 female no yes moderate sat 68 72 1993
8 1993_H Francesca 169 55 6570 female no yes moderate sat 71 77 1993
9 1993_I Sonja 164 56 6935 female no yes high sat 68 68 1993
10 1993_J Troy 168 60 8395 male no yes moderate ran 88 150 1993
# … with 100 more rows
Here we keep the variable age but change its unit from years to days.
Another example is to convert height and weight from metric to imperial units, using 1 kg ≈ 2.2 lb and 1 inch = 2.54 cm:
mutate(pulse, height=height/2.54, weight=weight*2.2)
# A tibble: 110 × 13
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 68.1 125. 18 female no yes moderate sat 86 88 1993
2 1993_B Melanie 70.5 128. 19 female no yes moderate ran 82 150 1993
3 1993_C Consuelo 65.7 136. 18 female no yes high ran 96 176 1993
4 1993_D Travis 76.8 185. 18 male no yes high sat 71 73 1993
5 1993_E Lauri 68.1 141. 18 female no yes low sat 90 88 1993
6 1993_F George 72.4 163. 22 male no yes low ran 78 141 1993
7 1993_G Cherry 63.8 125. 20 female no yes moderate sat 68 72 1993
8 1993_H Francesca 66.5 121 18 female no yes moderate sat 71 77 1993
9 1993_I Sonja 64.6 123. 19 female no yes high sat 68 68 1993
10 1993_J Troy 66.1 132 23 male no yes moderate ran 88 150 1993
# … with 100 more rows
In the previous examples we updated or added variables with simple arithmetic in mutate, and every value went through the same calculation. However, there are situations where we would like to treat values conditionally. This is possible with the helper function if_else.
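Before using it inside mutate, it may help to see if_else on a plain vector. It takes a logical condition, the value to use where the condition is TRUE, and the value to use where it is FALSE; the optional missing argument supplies a value where the condition is NA. In this sketch the threshold of 100, the labels, and the NA entry are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Two pulse values from the rows shown above plus a hypothetical missing one
p <- c(88, 150, NA)

# "raised" where p > 100, "normal" otherwise, "unknown" where p is NA
label <- if_else(p > 100, "raised", "normal", missing = "unknown")
label
# "normal" "raised" "unknown"
```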
Examples:
Add a new variable max_pulse reporting the higher of the two measurements pulse1 and pulse2 for each observation:
mutate(pulse, max_pulse=if_else(pulse1 < pulse2, pulse2, pulse1))
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year max_pulse
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 88
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 150
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 176
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 73
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 90
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 141
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 72
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 77
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 68
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 150
# … with 100 more rows
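For this particular task, base R's pmax, which returns the element-wise maximum of its arguments, should give the same result without an explicit condition. A minimal sketch on a stand-in tibble toy (rows 1, 2 and 5 shown above), comparing both approaches:

```r
library(dplyr)

# Stand-in for the pulse tibble
toy <- tibble(pulse1 = c(86, 82, 90), pulse2 = c(88, 150, 88))

with_if_else <- mutate(toy, max_pulse = if_else(pulse1 < pulse2, pulse2, pulse1))
with_pmax    <- mutate(toy, max_pulse = pmax(pulse1, pulse2))

with_pmax$max_pulse
# 88 150 90
```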
Add a logical variable adult which is TRUE if age >= 18 and FALSE otherwise:
mutate(pulse, adult=if_else(age >= 18, TRUE, FALSE))
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year adult
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <lgl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 TRUE
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 TRUE
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 TRUE
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 TRUE
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 TRUE
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 TRUE
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 TRUE
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 TRUE
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 TRUE
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 TRUE
# … with 100 more rows
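Because the comparison age >= 18 already produces a logical vector, if_else is not strictly needed when the two outcomes are TRUE and FALSE. The sketch below uses a stand-in tibble toy; the age of 17 is hypothetical and added only so that both outcomes appear.

```r
library(dplyr)

# Stand-in for the pulse tibble; the last age is hypothetical
toy <- tibble(age = c(18, 19, 17))

with_if_else <- mutate(toy, adult = if_else(age >= 18, TRUE, FALSE))
direct       <- mutate(toy, adult = age >= 18)

direct$adult
# TRUE TRUE FALSE
```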
Convert gender values, female to f and male to m:
mutate(pulse, gender = if_else(gender == 'female', 'f', 'm'))
# A tibble: 110 × 13
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 f no yes moderate sat 86 88 1993
2 1993_B Melanie 179 58 19 f no yes moderate ran 82 150 1993
3 1993_C Consuelo 167 62 18 f no yes high ran 96 176 1993
4 1993_D Travis 195 84 18 m no yes high sat 71 73 1993
5 1993_E Lauri 173 64 18 f no yes low sat 90 88 1993
6 1993_F George 184 74 22 m no yes low ran 78 141 1993
7 1993_G Cherry 162 57 20 f no yes moderate sat 68 72 1993
8 1993_H Francesca 169 55 18 f no yes moderate sat 71 77 1993
9 1993_I Sonja 164 56 19 f no yes high sat 68 68 1993
10 1993_J Troy 168 60 23 m no yes moderate ran 88 150 1993
# … with 100 more rows
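Note that the if_else call above maps every value other than female to m. If the column could hold other values, dplyr's case_when is one way to spell out each mapping; values that match no condition become NA. In this sketch the tibble toy is a stand-in and the value "unknown" is hypothetical, included only to show the difference.

```r
library(dplyr)

# Stand-in for the pulse tibble; "unknown" is a hypothetical value
toy <- tibble(gender = c("female", "male", "unknown"))

with_if_else <- mutate(toy, gender = if_else(gender == "female", "f", "m"))
with_case    <- mutate(toy, gender = case_when(
  gender == "female" ~ "f",
  gender == "male"   ~ "m"
))

with_if_else$gender  # "f" "m" "m"
with_case$gender     # "f" "m" NA
```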
Copyright © 2022 Biomedical Data Sciences (BDS) | LUMC