Use mutate to add new variables or modify existing ones.
For example, the pulse dataset has two pulse measurements. Let's say
we are interested in the average pulse and we want this information
to be available as a separate variable, e.g. averagePulse, in the
pulse tibble. We can do this with:
mutate(pulse, averagePulse = (pulse1+pulse2)/2)
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993
6 1993_F George 184 74 22 male no yes low ran 78 141 1993
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993
# ℹ 100 more rows
# ℹ 1 more variable: averagePulse <dbl>
By default the new column is added at the last position in the tibble.
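If the last position is not convenient, mutate accepts the arguments .before and .after to place the new column elsewhere. The sketch below is only an illustration: pulse_demo is a hypothetical three-row stand-in for the pulse tibble, built from values shown in the output above, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for the pulse tibble (first three rows shown above)
pulse_demo <- tibble(
  name   = c("Bonnie", "Melanie", "Consuelo"),
  pulse1 = c(86, 82, 96),
  pulse2 = c(88, 150, 176)
)

# Place averagePulse directly after the name column instead of at the end
moved <- mutate(pulse_demo, averagePulse = (pulse1 + pulse2) / 2, .after = name)
names(moved)
```

Here names(moved) lists averagePulse in second position, between name and pulse1.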
Does the pulse tibble now contain the variable averagePulse?
Answer: No. mutate returns a modified copy and leaves the original
tibble unchanged. If you want to keep the new variable averagePulse,
you'll need to use assignment '<-' to replace the original pulse
tibble with the newly modified version:
pulse <- mutate(pulse, averagePulse = (pulse1+pulse2)/2)
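A quick way to convince yourself of this is to check the column names before and after the assignment. This is a minimal sketch on a hypothetical two-row tibble, pulse_demo, rather than the real pulse data:

```r
library(dplyr)

# Hypothetical stand-in for the pulse tibble
pulse_demo <- tibble(pulse1 = c(86, 82), pulse2 = c(88, 150))

# Without assignment the result is only returned, not stored
mutate(pulse_demo, averagePulse = (pulse1 + pulse2) / 2)
before <- "averagePulse" %in% names(pulse_demo)

# With assignment the tibble is replaced by the modified version
pulse_demo <- mutate(pulse_demo, averagePulse = (pulse1 + pulse2) / 2)
after <- "averagePulse" %in% names(pulse_demo)

c(before = before, after = after)
```

The first check is FALSE and the second is TRUE: only the assigned result keeps the new column.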
Take as another example the variable BMI: \[BMI=\frac{weight_{kg}}{{height_m}^2}\]
Note that the BMI definition states that weight and height must be in
kilograms and metres respectively. In the pulse dataset weight is given
in kilograms but height is in centimetres. We can first create a new
variable height_metre containing the height in metres and then
calculate BMI:
pulse_bmi <- mutate(pulse, height_metre=height/100) # convert centimetres to metres
pulse_bmi
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993
6 1993_F George 184 74 22 male no yes low ran 78 141 1993
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993
# ℹ 100 more rows
# ℹ 1 more variable: height_metre <dbl>
The pulse_bmi tibble now has the height in metres, so
we can calculate BMI:
pulse_bmi <- mutate(pulse_bmi, BMI=weight/(height_metre^2))
pulse_bmi
# A tibble: 110 × 15
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993
6 1993_F George 184 74 22 male no yes low ran 78 141 1993
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993
# ℹ 100 more rows
# ℹ 2 more variables: height_metre <dbl>, BMI <dbl>
Alternatively, you may skip the creation of height_metre and calculate
BMI directly from the pulse tibble:
pulse_bmi <- mutate(pulse, BMI=weight/((height/100)^2))
pulse_bmi
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year BMI
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 19.0
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 18.1
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 22.2
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 22.1
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 21.4
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 21.9
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 21.7
8 1993_H Frances… 169 55 18 female no yes moderate sat 71 77 1993 19.3
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 20.8
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 21.3
# ℹ 100 more rows
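A third option, if you do want to keep height_metre, is to create both variables in a single mutate call: expressions are evaluated in order, so a later one can use a column created earlier in the same call. The sketch below uses a hypothetical stand-in tibble, pulse_demo, built from the first three rows shown above.

```r
library(dplyr)

# Hypothetical stand-in for the pulse tibble (first three rows shown above)
pulse_demo <- tibble(
  name   = c("Bonnie", "Melanie", "Consuelo"),
  height = c(173, 179, 167),  # centimetres
  weight = c(57, 58, 62)      # kilograms
)

# height_metre is created first and then used to compute BMI in the same call
pulse_bmi_demo <- mutate(
  pulse_demo,
  height_metre = height / 100,
  BMI          = weight / height_metre^2
)
round(pulse_bmi_demo$BMI, 1)
```

The rounded values (19.0, 18.1, 22.2) agree with the BMI column in the output above.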
In the examples above we added a new variable to our dataset, but you can also update an existing variable. For example, let’s say we want to have the age expressed (roughly) in days instead of years:
mutate(pulse, age=age*365)
# A tibble: 110 × 13
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 6570 female no yes moderate sat 86 88 1993
2 1993_B Melanie 179 58 6935 female no yes moderate ran 82 150 1993
3 1993_C Consuelo 167 62 6570 female no yes high ran 96 176 1993
4 1993_D Travis 195 84 6570 male no yes high sat 71 73 1993
5 1993_E Lauri 173 64 6570 female no yes low sat 90 88 1993
6 1993_F George 184 74 8030 male no yes low ran 78 141 1993
7 1993_G Cherry 162 57 7300 female no yes moderate sat 68 72 1993
8 1993_H Francesca 169 55 6570 female no yes moderate sat 71 77 1993
9 1993_I Sonja 164 56 6935 female no yes high sat 68 68 1993
10 1993_J Troy 168 60 8395 male no yes moderate ran 88 150 1993
# ℹ 100 more rows
Here we keep the variable age but change its unit from years to days.
Another example would be to convert height and weight from metric to
imperial units, using 1 kg ≈ 2.2 lb and 1 inch = 2.54 cm:
mutate(pulse, height=height/2.54, weight=weight*2.2)
# A tibble: 110 × 13
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 68.1 125. 18 female no yes moderate sat 86 88 1993
2 1993_B Melanie 70.5 128. 19 female no yes moderate ran 82 150 1993
3 1993_C Consuelo 65.7 136. 18 female no yes high ran 96 176 1993
4 1993_D Travis 76.8 185. 18 male no yes high sat 71 73 1993
5 1993_E Lauri 68.1 141. 18 female no yes low sat 90 88 1993
6 1993_F George 72.4 163. 22 male no yes low ran 78 141 1993
7 1993_G Cherry 63.8 125. 20 female no yes moderate sat 68 72 1993
8 1993_H Francesca 66.5 121 18 female no yes moderate sat 71 77 1993
9 1993_I Sonja 64.6 123. 19 female no yes high sat 68 68 1993
10 1993_J Troy 66.1 132 23 male no yes moderate ran 88 150 1993
# ℹ 100 more rows
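Overwriting a column changes its unit without changing its name, which can be easy to forget later. One possible alternative, sketched here on a hypothetical two-row tibble, is to store the converted values under new names. The names height_in and weight_lb and the two constants are assumptions made for this illustration.

```r
library(dplyr)

# Conversion factors named explicitly (2.2 lb per kg is an approximation)
cm_per_inch <- 2.54
lb_per_kg   <- 2.2

# Hypothetical stand-in for the pulse tibble (first two rows shown above)
pulse_demo <- tibble(height = c(173, 179), weight = c(57, 58))

# Keep the metric columns and add imperial ones next to them
pulse_imperial <- mutate(
  pulse_demo,
  height_in = height / cm_per_inch,
  weight_lb = weight * lb_per_kg
)
pulse_imperial
```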
In the previous examples we updated or added variables with simple
arithmetic using mutate, and every value went through the same
calculation. However, there are situations where we would like to
treat values conditionally. This is possible with the helper function
if_else, which takes a condition, the value to use where the condition
is true and the value to use where it is false.
Examples:
Add a new variable max_pulse reporting the higher pulse rate of the
two measurements pulse1 and pulse2 for each observation:
mutate(pulse, max_pulse=if_else(pulse1 < pulse2, pulse2, pulse1))
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year max_pulse
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1993… Bonn… 173 57 18 female no yes moderate sat 86 88 1993 88
2 1993… Mela… 179 58 19 female no yes moderate ran 82 150 1993 150
3 1993… Cons… 167 62 18 female no yes high ran 96 176 1993 176
4 1993… Trav… 195 84 18 male no yes high sat 71 73 1993 73
5 1993… Lauri 173 64 18 female no yes low sat 90 88 1993 90
6 1993… Geor… 184 74 22 male no yes low ran 78 141 1993 141
7 1993… Cher… 162 57 20 female no yes moderate sat 68 72 1993 72
8 1993… Fran… 169 55 18 female no yes moderate sat 71 77 1993 77
9 1993… Sonja 164 56 19 female no yes high sat 68 68 1993 68
10 1993… Troy 168 60 23 male no yes moderate ran 88 150 1993 150
# ℹ 100 more rows
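For this particular task base R also offers pmax, which returns the element-wise maximum of its arguments. The sketch below compares both approaches on a hypothetical three-row tibble, pulse_demo, using pulse values from rows shown above. It is meant as a cross-check, not as a replacement for if_else.

```r
library(dplyr)

# Hypothetical stand-in for the pulse tibble (rows 1, 5 and 9 shown above)
pulse_demo <- tibble(pulse1 = c(86, 90, 68), pulse2 = c(88, 88, 68))

with_max <- mutate(
  pulse_demo,
  max_if   = if_else(pulse1 < pulse2, pulse2, pulse1),  # conditional version
  max_pmax = pmax(pulse1, pulse2)                       # element-wise maximum
)

# Both columns hold the same values
identical(with_max$max_if, with_max$max_pmax)
```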
Add a logical variable adult which is true if age>=18 and false
otherwise:
mutate(pulse, adult=if_else(age >= 18 , TRUE, FALSE))
# A tibble: 110 × 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year adult
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <lgl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 TRUE
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 TRUE
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 TRUE
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 TRUE
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 TRUE
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 TRUE
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 TRUE
8 1993_H Frances… 169 55 18 female no yes moderate sat 71 77 1993 TRUE
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 TRUE
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 TRUE
# ℹ 100 more rows
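Because the comparison age >= 18 already yields TRUE or FALSE, the if_else wrapper is optional in this case. A small sketch on a hypothetical tibble, pulse_demo, where the last age (17) is invented so that the FALSE case is visible:

```r
library(dplyr)

# Hypothetical ages; 17 is made up to show a FALSE result
pulse_demo <- tibble(age = c(18, 19, 23, 17))

flags <- mutate(
  pulse_demo,
  adult_if = if_else(age >= 18, TRUE, FALSE),  # as in the example above
  adult    = age >= 18                         # the comparison is already logical
)

# Both columns are identical
identical(flags$adult_if, flags$adult)
```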
Convert gender values, female to f and male to m:
mutate(pulse, gender = if_else(gender == 'female' , 'f', 'm'))
# A tibble: 110 × 13
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 f no yes moderate sat 86 88 1993
2 1993_B Melanie 179 58 19 f no yes moderate ran 82 150 1993
3 1993_C Consuelo 167 62 18 f no yes high ran 96 176 1993
4 1993_D Travis 195 84 18 m no yes high sat 71 73 1993
5 1993_E Lauri 173 64 18 f no yes low sat 90 88 1993
6 1993_F George 184 74 22 m no yes low ran 78 141 1993
7 1993_G Cherry 162 57 20 f no yes moderate sat 68 72 1993
8 1993_H Francesca 169 55 18 f no yes moderate sat 71 77 1993
9 1993_I Sonja 164 56 19 f no yes high sat 68 68 1993
10 1993_J Troy 168 60 23 m no yes moderate ran 88 150 1993
# ℹ 100 more rows
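Note that if_else labels every value other than female as m. If a column might contain other values, dplyr's case_when lets you name each case explicitly. The sketch below uses a hypothetical tibble, pulse_demo, in which the values "unknown" and NA are invented to show the difference.

```r
library(dplyr)

# Hypothetical values; "unknown" and NA are made up for illustration
pulse_demo <- tibble(gender = c("female", "male", "unknown", NA))

recoded <- mutate(
  pulse_demo,
  # Everything that is not "female" (and not NA) becomes "m"
  gender_if = if_else(gender == "female", "f", "m"),
  # Each case is spelled out; anything else becomes NA
  gender_cw = case_when(
    gender == "female" ~ "f",
    gender == "male"   ~ "m",
    TRUE               ~ NA_character_
  )
)
recoded
```

With if_else the value "unknown" is recoded to m, whereas case_when leaves it as NA so that unexpected values stay visible.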
Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC