Use mutate to add new variables or modify the existing ones.

Add new variables

For example, the pulse dataset has two pulse measurements. Suppose we are interested in the average pulse and want it to be available as a separate variable, e.g. averagePulse, in the pulse tibble. We can do this with:

mutate(pulse, averagePulse = (pulse1+pulse2)/2)
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year averagePulse
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>        <dbl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993          87 
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993         116 
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993         136 
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993          72 
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993          89 
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993         110.
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993          70 
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993          74 
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993          68 
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993         119 
# ℹ 100 more rows

By default the new column is added at the last position in the tibble.
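If the last position is not convenient, mutate accepts the .before and .after arguments (available in dplyr 1.0.0 and later) to place the new column elsewhere. The sketch below uses a small hand-made tibble, pulse_toy, built from the first three rows shown above; it is an illustrative stand-in and not the full pulse dataset.

```r
library(dplyr)

# Toy stand-in for the pulse tibble (first three rows, a few columns only)
pulse_toy <- tibble(
  name   = c("Bonnie", "Melanie", "Consuelo"),
  pulse1 = c(86, 82, 96),
  pulse2 = c(88, 150, 176),
  year   = c(1993, 1993, 1993)
)

# Place averagePulse directly after pulse2 instead of at the end
placed <- mutate(pulse_toy, averagePulse = (pulse1 + pulse2) / 2, .after = pulse2)
names(placed)
```

Here averagePulse ends up between pulse2 and year.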

Does the pulse tibble now contain the variable averagePulse?

No. mutate returns a modified copy and leaves the original tibble untouched. If you want to keep the new variable averagePulse you’ll need to use assignment ‘<-’ to replace the original pulse tibble with the newly modified version:

pulse <- mutate(pulse, averagePulse = (pulse1+pulse2)/2)
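A quick way to convince yourself of this is to check the column names before and after the assignment. The sketch below uses a two-row toy tibble (pulse_toy, an illustrative stand-in for pulse).

```r
library(dplyr)

# Toy stand-in for the pulse tibble
pulse_toy <- tibble(pulse1 = c(86, 82), pulse2 = c(88, 150))

# Without assignment the result is only printed; pulse_toy itself is unchanged
mutate(pulse_toy, averagePulse = (pulse1 + pulse2) / 2)
has_column_before <- "averagePulse" %in% names(pulse_toy)

# With assignment the modified copy replaces the original
pulse_toy <- mutate(pulse_toy, averagePulse = (pulse1 + pulse2) / 2)
has_column_after <- "averagePulse" %in% names(pulse_toy)

c(before = has_column_before, after = has_column_after)
```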


Take as another example the variable BMI: \[BMI=\frac{weight_{kg}}{{height_m}^2}\]

Note that the BMI definition requires weight in kilograms and height in metres. In the pulse dataset weight is given in kilograms but height is in centimetres. We can first create a new variable height_metre containing the height in metres and then calculate BMI:

pulse_bmi <- mutate(pulse, height_metre=height/100) # convert centimetres to metres
pulse_bmi
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year height_metre
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>        <dbl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993         1.73
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993         1.79
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993         1.67
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993         1.95
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993         1.73
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993         1.84
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993         1.62
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993         1.69
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993         1.64
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993         1.68
# ℹ 100 more rows

The pulse_bmi tibble now has the height in metres, so we can calculate BMI:

pulse_bmi <- mutate(pulse_bmi, BMI=weight/(height_metre^2)) 
pulse_bmi
# A tibble: 110 × 15
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year height_metre   BMI
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>        <dbl> <dbl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993         1.73  19.0
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993         1.79  18.1
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993         1.67  22.2
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993         1.95  22.1
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993         1.73  21.4
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993         1.84  21.9
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993         1.62  21.7
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993         1.69  19.3
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993         1.64  20.8
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993         1.68  21.3
# ℹ 100 more rows

Alternatively, you may skip the creation of height_metre and calculate BMI directly from the pulse tibble:

pulse_bmi <- mutate(pulse, BMI=weight/((height/100)^2)) 
pulse_bmi
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year   BMI
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl> <dbl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993  19.0
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993  18.1
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993  22.2
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993  22.1
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993  21.4
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993  21.9
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993  21.7
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993  19.3
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993  20.8
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993  21.3
# ℹ 100 more rows
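A third option keeps height_metre but avoids the two separate steps: within a single mutate call, later expressions can refer to columns created earlier in the same call. The sketch below assumes a toy tibble (pulse_toy) made from the first three rows shown above.

```r
library(dplyr)

# Toy stand-in for the pulse tibble (first three rows)
pulse_toy <- tibble(
  name   = c("Bonnie", "Melanie", "Consuelo"),
  height = c(173, 179, 167),  # centimetres
  weight = c(57, 58, 62)      # kilograms
)

# height_metre is created first and then used immediately to compute BMI
pulse_bmi_toy <- mutate(
  pulse_toy,
  height_metre = height / 100,
  BMI          = weight / height_metre^2
)
pulse_bmi_toy
```

The order of the arguments matters here: BMI must come after height_metre.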

Update variables

In the examples above we added a new variable to our dataset, but you can also update an existing variable. For example, let’s say we want to have the age expressed (roughly) in days instead of years:

mutate(pulse, age=age*365)
# A tibble: 110 × 13
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>
 1 1993_A Bonnie       173     57  6570 female no     yes     moderate sat       86     88  1993
 2 1993_B Melanie      179     58  6935 female no     yes     moderate ran       82    150  1993
 3 1993_C Consuelo     167     62  6570 female no     yes     high     ran       96    176  1993
 4 1993_D Travis       195     84  6570 male   no     yes     high     sat       71     73  1993
 5 1993_E Lauri        173     64  6570 female no     yes     low      sat       90     88  1993
 6 1993_F George       184     74  8030 male   no     yes     low      ran       78    141  1993
 7 1993_G Cherry       162     57  7300 female no     yes     moderate sat       68     72  1993
 8 1993_H Francesca    169     55  6570 female no     yes     moderate sat       71     77  1993
 9 1993_I Sonja        164     56  6935 female no     yes     high     sat       68     68  1993
10 1993_J Troy         168     60  8395 male   no     yes     moderate ran       88    150  1993
# ℹ 100 more rows

Here we keep the variable age but change its unit from years to days.

Another example is converting height and weight from metric to imperial units, using 1 kg ≈ 2.2 lbs and 1 inch = 2.54 cm:

mutate(pulse, height=height/2.54, weight=weight*2.2)
# A tibble: 110 × 13
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>
 1 1993_A Bonnie      68.1   125.    18 female no     yes     moderate sat       86     88  1993
 2 1993_B Melanie     70.5   128.    19 female no     yes     moderate ran       82    150  1993
 3 1993_C Consuelo    65.7   136.    18 female no     yes     high     ran       96    176  1993
 4 1993_D Travis      76.8   185.    18 male   no     yes     high     sat       71     73  1993
 5 1993_E Lauri       68.1   141.    18 female no     yes     low      sat       90     88  1993
 6 1993_F George      72.4   163.    22 male   no     yes     low      ran       78    141  1993
 7 1993_G Cherry      63.8   125.    20 female no     yes     moderate sat       68     72  1993
 8 1993_H Francesca   66.5   121     18 female no     yes     moderate sat       71     77  1993
 9 1993_I Sonja       64.6   123.    19 female no     yes     high     sat       68     68  1993
10 1993_J Troy        66.1   132     23 male   no     yes     moderate ran       88    150  1993
# ℹ 100 more rows
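Updating a variable in place discards its original values. If both unit systems are needed, one option is to store the converted values under new names instead. In the sketch below, height_in and weight_lb are hypothetical column names chosen for illustration, and pulse_toy is a toy stand-in for the pulse tibble.

```r
library(dplyr)

# Toy stand-in for the pulse tibble (first three rows)
pulse_toy <- tibble(
  name   = c("Bonnie", "Melanie", "Consuelo"),
  height = c(173, 179, 167),  # centimetres
  weight = c(57, 58, 62)      # kilograms
)

# Keep the metric columns and add imperial versions next to them
pulse_units <- mutate(
  pulse_toy,
  height_in = height / 2.54,  # 1 inch = 2.54 cm
  weight_lb = weight * 2.2    # 1 kg is roughly 2.2 lbs
)
pulse_units
```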

if_else(condition, true, false, …)

In the previous examples we updated or added variables with simple arithmetic using mutate, and every value went through the same calculation. However, there are situations where we would like to treat values conditionally. This is possible with the helper function if_else.
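One practical difference from base R's ifelse is that if_else checks that the true and false values have compatible types. The sketch below illustrates this with a made-up numeric vector x; the error is captured with tryCatch so the script keeps running, and the replacement message is our own placeholder, not dplyr's wording.

```r
library(dplyr)

x <- c(86, 150, 68)  # made-up pulse values

# Base R ifelse() silently coerces the mixed result to character
base_result <- ifelse(x > 100, "high", x)

# if_else() refuses to mix character and numeric values; capture the error
strict_result <- tryCatch(
  if_else(x > 100, "high", x),
  error = function(e) "error: incompatible types"
)

# Making both branches character resolves the problem explicitly
fixed_result <- if_else(x > 100, "high", as.character(x))
```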

Examples:

Add a new variable max_pulse reporting the higher pulse rate of the two measurements pulse1 and pulse2 for each observation:

mutate(pulse, max_pulse=if_else(pulse1 < pulse2, pulse2, pulse1))
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year max_pulse
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>     <dbl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993        88
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993       150
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993       176
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993        73
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993        90
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993       141
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993        72
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993        77
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993        68
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993       150
# ℹ 100 more rows
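For this particular task, base R's pmax (the element-wise maximum) gives the same result without an explicit condition. The sketch below compares both approaches on a toy tibble (pulse_toy) built from four of the rows shown above.

```r
library(dplyr)

# Toy stand-in for the pulse tibble; Lauri is included because pulse1 > pulse2
pulse_toy <- tibble(
  name   = c("Bonnie", "Melanie", "Consuelo", "Lauri"),
  pulse1 = c(86, 82, 96, 90),
  pulse2 = c(88, 150, 176, 88)
)

pulse_max <- mutate(
  pulse_toy,
  max_if_else = if_else(pulse1 < pulse2, pulse2, pulse1),
  max_pmax    = pmax(pulse1, pulse2)  # element-wise maximum of the two columns
)
pulse_max
```

Note that max(pulse1, pulse2) would not work here: max collapses everything into a single number, whereas pmax works row by row.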

Add a logical variable adult which is TRUE if age >= 18 and FALSE otherwise:

mutate(pulse, adult=if_else(age >= 18 , TRUE, FALSE))
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year adult
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl> <lgl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993 TRUE 
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993 TRUE 
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993 TRUE 
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993 TRUE 
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993 TRUE 
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993 TRUE 
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993 TRUE 
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993 TRUE 
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993 TRUE 
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993 TRUE 
# ℹ 100 more rows
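Because the comparison age >= 18 already yields TRUE or FALSE, the if_else wrapper is optional in this case. The sketch below shows both forms side by side on toy data; the row "Alex" (age 17) is hypothetical and was added only so that both outcomes appear.

```r
library(dplyr)

# Toy data; "Alex" is a hypothetical row, not part of the pulse dataset
people <- tibble(
  name = c("Bonnie", "George", "Alex"),
  age  = c(18, 22, 17)
)

people <- mutate(
  people,
  adult_if_else = if_else(age >= 18, TRUE, FALSE),
  adult_direct  = age >= 18  # the comparison is already a logical vector
)
people
```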

Convert gender values, female to f and male to m:

mutate(pulse, gender = if_else(gender == 'female' , 'f', 'm'))
# A tibble: 110 × 13
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>
 1 1993_A Bonnie       173     57    18 f      no     yes     moderate sat       86     88  1993
 2 1993_B Melanie      179     58    19 f      no     yes     moderate ran       82    150  1993
 3 1993_C Consuelo     167     62    18 f      no     yes     high     ran       96    176  1993
 4 1993_D Travis       195     84    18 m      no     yes     high     sat       71     73  1993
 5 1993_E Lauri        173     64    18 f      no     yes     low      sat       90     88  1993
 6 1993_F George       184     74    22 m      no     yes     low      ran       78    141  1993
 7 1993_G Cherry       162     57    20 f      no     yes     moderate sat       68     72  1993
 8 1993_H Francesca    169     55    18 f      no     yes     moderate sat       71     77  1993
 9 1993_I Sonja        164     56    19 f      no     yes     high     sat       68     68  1993
10 1993_J Troy         168     60    23 m      no     yes     moderate ran       88    150  1993
# ℹ 100 more rows
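One caveat with this recoding: if_else sends every value that is not 'female' to 'm', and missing values stay missing unless the missing argument is set. The sketch below uses a made-up gender vector (including a hypothetical "other" value and an NA) to show the behaviour, together with case_when as a stricter alternative that leaves unmatched values as NA.

```r
library(dplyr)

# Made-up values for illustration; "other" and NA may not occur in the pulse data
toy <- tibble(gender = c("female", "male", "other", NA))

toy <- mutate(
  toy,
  # everything that is not "female" becomes "m"; NA stays NA
  two_way      = if_else(gender == "female", "f", "m"),
  # same, but NA is replaced by an explicit label
  with_missing = if_else(gender == "female", "f", "m", missing = "unknown"),
  # only the listed values are recoded; anything else becomes NA
  strict       = case_when(gender == "female" ~ "f", gender == "male" ~ "m")
)
toy
```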


Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC