Use mutate to add new variables or modify the existing ones.

Add new variables

For example, the pulse dataset has two pulse measurements. Suppose we are interested in the average pulse and want it to be available as a separate variable, e.g. averagePulse, in the pulse tibble. We can do this with:

mutate(pulse, averagePulse = (pulse1+pulse2)/2)
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year avera…¹
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>   <dbl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993     87 
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993    116 
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993    136 
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993     72 
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993     89 
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993    110.
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993     70 
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993     74 
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993     68 
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993    119 
# … with 100 more rows, and abbreviated variable name ¹​averagePulse

By default the new column is added at the last position in the tibble.
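
If the default position is inconvenient, recent versions of dplyr (1.0.0 and later, as far as we are aware) let mutate place the new column elsewhere through its .before and .after arguments. The following is a minimal sketch on a small made-up tibble; toy and toy_avg are hypothetical names and only stand in for the pulse dataset:

```r
library(dplyr)

# Hypothetical three-row stand-in for the pulse tibble
toy <- tibble(
  id     = c("a", "b", "c"),
  pulse1 = c(86, 82, 96),
  pulse2 = c(88, 150, 176),
  year   = c(1993, 1993, 1993)
)

# Place averagePulse right after pulse2 instead of at the end
toy_avg <- mutate(toy, averagePulse = (pulse1 + pulse2) / 2, .after = pulse2)
names(toy_avg)
```

Placing a derived column next to the columns it was computed from can make the printed tibble easier to check by eye.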

Does the pulse tibble now contain the variable averagePulse?

No. mutate returns a modified copy and leaves the original tibble untouched. If you want to keep the new variable averagePulse, you need to use assignment ‘<-’ to replace the original pulse tibble with the newly modified version:

pulse <- mutate(pulse, averagePulse = (pulse1+pulse2)/2)
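
The difference between calling mutate with and without assignment can be checked on a small made-up tibble. In this sketch toy, has_before and has_after are hypothetical names used only for illustration:

```r
library(dplyr)

# Hypothetical two-row stand-in for the pulse tibble
toy <- tibble(pulse1 = c(86, 82), pulse2 = c(88, 150))

# Without assignment the result is only printed; toy itself is unchanged
mutate(toy, averagePulse = (pulse1 + pulse2) / 2)
has_before <- "averagePulse" %in% names(toy)
has_before   # FALSE

# With assignment, toy is replaced by the modified version
toy <- mutate(toy, averagePulse = (pulse1 + pulse2) / 2)
has_after <- "averagePulse" %in% names(toy)
has_after    # TRUE
```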


Take as another example the variable BMI: \[BMI=\frac{weight_{kg}}{{height_m}^2}\]

Note that the definition of BMI requires weight in kilograms and height in metres. In the pulse dataset, weight is given in kilograms but height is in centimetres. We can therefore first create a new variable height_metre containing the height in metres and then calculate BMI:

pulse_bmi <- mutate(pulse, height_metre=height/100) # convert centimetres to metres
pulse_bmi
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year heigh…¹
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>   <dbl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993    1.73
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993    1.79
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993    1.67
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993    1.95
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993    1.73
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993    1.84
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993    1.62
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993    1.69
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993    1.64
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993    1.68
# … with 100 more rows, and abbreviated variable name ¹​height_metre

The pulse_bmi tibble now contains the height in metres, so we can calculate BMI:

pulse_bmi <- mutate(pulse_bmi, BMI=weight/(height_metre^2)) 
pulse_bmi
# A tibble: 110 × 15
   id    name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2  year heigh…²   BMI
   <chr> <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl> <dbl>   <dbl> <dbl>
 1 1993… Bonn…    173     57    18 female no     yes     modera… sat       86     88  1993    1.73  19.0
 2 1993… Mela…    179     58    19 female no     yes     modera… ran       82    150  1993    1.79  18.1
 3 1993… Cons…    167     62    18 female no     yes     high    ran       96    176  1993    1.67  22.2
 4 1993… Trav…    195     84    18 male   no     yes     high    sat       71     73  1993    1.95  22.1
 5 1993… Lauri    173     64    18 female no     yes     low     sat       90     88  1993    1.73  21.4
 6 1993… Geor…    184     74    22 male   no     yes     low     ran       78    141  1993    1.84  21.9
 7 1993… Cher…    162     57    20 female no     yes     modera… sat       68     72  1993    1.62  21.7
 8 1993… Fran…    169     55    18 female no     yes     modera… sat       71     77  1993    1.69  19.3
 9 1993… Sonja    164     56    19 female no     yes     high    sat       68     68  1993    1.64  20.8
10 1993… Troy     168     60    23 male   no     yes     modera… ran       88    150  1993    1.68  21.3
# … with 100 more rows, and abbreviated variable names ¹​exercise, ²​height_metre

Alternatively, you may skip the creation of height_metre and calculate BMI directly from the pulse tibble:

pulse_bmi <- mutate(pulse, BMI=weight/((height/100)^2)) 
pulse_bmi
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year   BMI
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl> <dbl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993  19.0
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993  18.1
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993  22.2
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993  22.1
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993  21.4
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993  21.9
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993  21.7
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993  19.3
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993  20.8
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993  21.3
# … with 100 more rows
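
A third option combines both steps in a single mutate call. dplyr evaluates the expressions in order, so a variable created earlier in the call can be used by a later one. The sketch below uses a made-up two-row tibble (toy and toy_bmi are hypothetical names) whose values are copied from the first two rows of the output above:

```r
library(dplyr)

# Hypothetical stand-in: height in centimetres, weight in kilograms
toy <- tibble(height = c(173, 179), weight = c(57, 58))

# height_metre is created first and is then available to the BMI expression
toy_bmi <- mutate(toy,
                  height_metre = height / 100,
                  BMI = weight / height_metre^2)
toy_bmi
```

This keeps the intermediate height_metre column in the result, which is useful for checking the conversion; use the direct formula if the intermediate column is not wanted.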

Update variables

In the examples above we added a new variable to our dataset, but you can also update an existing variable. For example, let’s say we want to have the age expressed (roughly) in days instead of years:

mutate(pulse, age=age*365)
# A tibble: 110 × 13
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>
 1 1993_A Bonnie       173     57  6570 female no     yes     moderate sat       86     88  1993
 2 1993_B Melanie      179     58  6935 female no     yes     moderate ran       82    150  1993
 3 1993_C Consuelo     167     62  6570 female no     yes     high     ran       96    176  1993
 4 1993_D Travis       195     84  6570 male   no     yes     high     sat       71     73  1993
 5 1993_E Lauri        173     64  6570 female no     yes     low      sat       90     88  1993
 6 1993_F George       184     74  8030 male   no     yes     low      ran       78    141  1993
 7 1993_G Cherry       162     57  7300 female no     yes     moderate sat       68     72  1993
 8 1993_H Francesca    169     55  6570 female no     yes     moderate sat       71     77  1993
 9 1993_I Sonja        164     56  6935 female no     yes     high     sat       68     68  1993
10 1993_J Troy         168     60  8395 male   no     yes     moderate ran       88    150  1993
# … with 100 more rows

Here we keep the variable age but change its unit from years to days.

Another example is converting the height and weight from metric to imperial units, using 1 kg ≈ 2.2 lbs and 1 inch = 2.54 cm:

mutate(pulse, height=height/2.54, weight=weight*2.2)
# A tibble: 110 × 13
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>
 1 1993_A Bonnie      68.1   125.    18 female no     yes     moderate sat       86     88  1993
 2 1993_B Melanie     70.5   128.    19 female no     yes     moderate ran       82    150  1993
 3 1993_C Consuelo    65.7   136.    18 female no     yes     high     ran       96    176  1993
 4 1993_D Travis      76.8   185.    18 male   no     yes     high     sat       71     73  1993
 5 1993_E Lauri       68.1   141.    18 female no     yes     low      sat       90     88  1993
 6 1993_F George      72.4   163.    22 male   no     yes     low      ran       78    141  1993
 7 1993_G Cherry      63.8   125.    20 female no     yes     moderate sat       68     72  1993
 8 1993_H Francesca   66.5   121     18 female no     yes     moderate sat       71     77  1993
 9 1993_I Sonja       64.6   123.    19 female no     yes     high     sat       68     68  1993
10 1993_J Troy        66.1   132     23 male   no     yes     moderate ran       88    150  1993
# … with 100 more rows

if_else(condition, true, false, …)

In the previous examples we updated or added variables with simple arithmetic using mutate, and all values went through the same calculation. However, there are situations where we would like to treat values conditionally. This is possible with the helper function if_else.
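
Before using if_else inside mutate, it may help to see how it behaves on a plain vector. The sketch below uses a made-up vector x and made-up labels (x, labels and labels_filled are hypothetical names). It shows that the condition is tested element by element, and that a missing value in the condition yields NA unless the missing argument supplies a replacement:

```r
library(dplyr)

# Made-up values, including one missing value
x <- c(5, 12, NA, 8)

# Element-wise: each value is tested against the condition separately
labels <- if_else(x > 10, "high", "low")
labels

# An NA in the condition gives NA by default; `missing` supplies a replacement
labels_filled <- if_else(x > 10, "high", "low", missing = "unknown")
labels_filled
```

Note that if_else expects the true and false values to have compatible types, so mixing, say, a character and a number results in an error.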

Examples:

Add a new variable max_pulse reporting the higher pulse rate of the two measurements pulse1 and pulse2 for each observation:

mutate(pulse, max_pulse=if_else(pulse1 < pulse2, pulse2, pulse1))
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year max_p…¹
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>   <dbl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993      88
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993     150
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993     176
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993      73
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993      90
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993     141
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993      72
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993      77
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993      68
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993     150
# … with 100 more rows, and abbreviated variable name ¹​max_pulse

Add a logical variable adult which is TRUE if age >= 18 and FALSE otherwise:

mutate(pulse, adult=if_else(age >= 18 , TRUE, FALSE))
# A tibble: 110 × 14
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year adult
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl> <lgl>
 1 1993_A Bonnie       173     57    18 female no     yes     moderate sat       86     88  1993 TRUE 
 2 1993_B Melanie      179     58    19 female no     yes     moderate ran       82    150  1993 TRUE 
 3 1993_C Consuelo     167     62    18 female no     yes     high     ran       96    176  1993 TRUE 
 4 1993_D Travis       195     84    18 male   no     yes     high     sat       71     73  1993 TRUE 
 5 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993 TRUE 
 6 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993 TRUE 
 7 1993_G Cherry       162     57    20 female no     yes     moderate sat       68     72  1993 TRUE 
 8 1993_H Francesca    169     55    18 female no     yes     moderate sat       71     77  1993 TRUE 
 9 1993_I Sonja        164     56    19 female no     yes     high     sat       68     68  1993 TRUE 
10 1993_J Troy         168     60    23 male   no     yes     moderate ran       88    150  1993 TRUE 
# … with 100 more rows
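
Since the comparison age >= 18 already returns a logical value for each row, the same variable can also be created without if_else. A minimal sketch on a made-up tibble (toy, toy_adult, adult_ifelse and adult_direct are hypothetical names):

```r
library(dplyr)

# Made-up ages on both sides of the threshold
toy <- tibble(age = c(17, 18, 23))

toy_adult <- mutate(toy,
                    adult_ifelse = if_else(age >= 18, TRUE, FALSE),
                    adult_direct = age >= 18)

# Both columns hold exactly the same logical values
identical(toy_adult$adult_ifelse, toy_adult$adult_direct)
```

The if_else form becomes necessary only when the two outcomes are something other than TRUE and FALSE.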

Convert gender values, female to f and male to m:

mutate(pulse, gender = if_else(gender == 'female' , 'f', 'm'))
# A tibble: 110 × 13
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>
 1 1993_A Bonnie       173     57    18 f      no     yes     moderate sat       86     88  1993
 2 1993_B Melanie      179     58    19 f      no     yes     moderate ran       82    150  1993
 3 1993_C Consuelo     167     62    18 f      no     yes     high     ran       96    176  1993
 4 1993_D Travis       195     84    18 m      no     yes     high     sat       71     73  1993
 5 1993_E Lauri        173     64    18 f      no     yes     low      sat       90     88  1993
 6 1993_F George       184     74    22 m      no     yes     low      ran       78    141  1993
 7 1993_G Cherry       162     57    20 f      no     yes     moderate sat       68     72  1993
 8 1993_H Francesca    169     55    18 f      no     yes     moderate sat       71     77  1993
 9 1993_I Sonja        164     56    19 f      no     yes     high     sat       68     68  1993
10 1993_J Troy         168     60    23 m      no     yes     moderate ran       88    150  1993
# … with 100 more rows
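
One thing to keep in mind with this kind of recoding is that a single if_else sends every value that is not 'female' to 'm'. That is fine if gender only contains 'female' and 'male', which we assume here for the pulse dataset. If other values could occur, a nested if_else keeps them as they are. The sketch below uses a made-up tibble (toy, toy_single and toy_recoded are hypothetical names):

```r
library(dplyr)

# Made-up values, including one unexpected category and one missing value
toy <- tibble(gender = c("female", "male", "other", NA))

# Single if_else: every non-"female" value, including "other", becomes "m"
toy_single <- mutate(toy, gender = if_else(gender == "female", "f", "m"))
toy_single$gender

# Nested if_else: only "male" becomes "m"; any other value is kept as it is
toy_recoded <- mutate(toy,
                      gender = if_else(gender == "female", "f",
                                       if_else(gender == "male", "m", gender)))
toy_recoded$gender
```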


Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC