Often we want to repeat a certain series of steps in an analysis, each time with subtle changes to the variables involved. This leads to repeated code in the script, which makes it difficult to maintain or share with others. To address this, R provides the function construct, which lets you develop your own functions with your own defined arguments. The following scenario motivates the use of user-defined functions.

Motivation

Here is a scenario based on the pulse dataset; we will develop the analysis step by step. The initial problem is to calculate the mean height of individuals weighing at least 50 kg and less than 60 kg.

pulse %>% filter(weight >= 50 & weight < 60) %>% 
    summarise(meanHeight=mean(height))
# A tibble: 1 × 1
  meanHeight
       <dbl>
1       167.

Now, for the same individuals, how many of them had an increased pulse (pulse1 lower than pulse2)?

pulse %>% filter(weight >= 50 & weight < 60) %>% 
    summarise(meanHeight=mean(height))
# A tibble: 1 × 1
  meanHeight
       <dbl>
1       167.
pulse %>% filter(weight >= 50 & weight < 60) %>% 
    summarise(increasedPulse=sum(pulse1<pulse2))
# A tibble: 1 × 1
  increasedPulse
           <int>
1             23

Let’s remove the repetition by computing both summaries in a single call:

pulse %>% filter(weight >= 50 & weight < 60) %>% 
    summarise(meanHeight=mean(height), 
              increasedPulse=sum(pulse1<pulse2))
# A tibble: 1 × 2
  meanHeight increasedPulse
       <dbl>          <int>
1       167.             23

We may want to produce the same summary for other weight ranges, e.g. 60-70 and 70-80:

pulse %>% filter(weight >= 50 & weight < 60) %>% 
  summarise(meanHeight=mean(height), 
            increasedPulse=sum(pulse1<pulse2))
# A tibble: 1 × 2
  meanHeight increasedPulse
       <dbl>          <int>
1       167.             23

pulse %>% filter(weight >= 60 & weight < 70) %>% 
  summarise(meanHeight=mean(height), 
            increasedPulse=sum(pulse1<pulse2))
# A tibble: 1 × 2
  meanHeight increasedPulse
       <dbl>          <int>
1       169.             NA

pulse %>% filter(weight >= 70 & weight < 80) %>% 
  summarise(meanHeight=mean(height),
            increasedPulse=sum(pulse1<pulse2))
# A tibble: 1 × 2
  meanHeight increasedPulse
       <dbl>          <int>
1       176.             14

Now include the number of individuals in each range:

pulse %>% filter(weight >= 50 & weight < 60) %>% 
  summarise(meanHeight=mean(height),
            increasedPulse=sum(pulse1<pulse2), individuals=n())
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       167.             23          31

pulse %>% filter(weight >= 60 & weight < 70) %>% 
  summarise(meanHeight=mean(height),
            increasedPulse=sum(pulse1<pulse2), individuals=n())
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       169.             NA          30

pulse %>% filter(weight >= 70 & weight < 80) %>% 
  summarise(meanHeight=mean(height),
            increasedPulse=sum(pulse1<pulse2), individuals=n())
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       176.             14          19

weightRangeSummary function

Define the function weightRangeSummary, which takes the pulse dataset along with the bounds of the weight interval and produces the summary:

weightRangeSummary(pulse,50,60)
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       167.             23          31
weightRangeSummary(pulse,60,70)
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       169.             NA          30
weightRangeSummary(pulse,70,80)
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       176.             14          19

User-defined functions in R

Functions are constructs that encapsulate a series of statements, so you do not have to repeat the same statements every time they are needed.

functionName <- function(...) {
  statement
  ...
  <value>  # the result of the function, alternatively: return(<value>)
} 
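As a minimal sketch of the template above, the two forms of returning a value can be compared side by side. The function names double_implicit and double_explicit are hypothetical and chosen only for illustration:

```r
# Implicit result: the value of the last evaluated expression is returned.
double_implicit <- function(x) {
  x * 2
}

# Explicit result: return() hands back the value and exits the function.
double_explicit <- function(x) {
  return(x * 2)
}

double_implicit(4)
double_explicit(4)

# Both forms give the same result.
identical(double_implicit(4), double_explicit(4))
```

Both styles behave the same here; the implicit form is the one used in the examples that follow.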

Example 1

We define the following function, add_one. It takes the value of the argument x and adds 1 to it. The sum is the result of the function:

add_one <- function(x) {
    x + 1
}

This function can now be called with different values of the argument x:

add_one(2)
[1] 3
add_one(-1)
[1] 0

Type the function name without any () to see the body (definition) of the function:

add_one
function(x) {
    x + 1
}
<environment: 0x55de9af54010>

What is the class of add_one?

class( add_one )
[1] "function"

Example 2

Now let’s define a function to calculate \(x^2 + 1\):

square_add_one <- function(x) {
    result <- x^2    # choose a variable for temporary result
    result + 1
}

square_add_one(2) # 2^2 + 1 = 5
[1] 5
square_add_one(-1) # (-1)^2 + 1 = 2
[1] 2

The following versions of this function are all equivalent:

# (v1) One operation per line.
square_add_one <- function(x) {
    result <- x^2
    result <- result + 1
    result    # The last statement is returned as the value.
}
# (v2) No additional 'result' variable needed 
square_add_one <- function(x) {
    x^2 + 1
}
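One way to convince yourself that the two versions are equivalent is to call both on the same inputs and compare the results. In this sketch the versions are given distinct, hypothetical names (square_add_one_v1 and square_add_one_v2) so that both exist at the same time:

```r
# (v1) One operation per line, using a temporary variable.
square_add_one_v1 <- function(x) {
  result <- x^2
  result <- result + 1
  result
}

# (v2) The same calculation as a single expression.
square_add_one_v2 <- function(x) {
  x^2 + 1
}

# A few test inputs, chosen arbitrarily for illustration.
inputs <- c(-1, 0, 2, 3.5)

square_add_one_v1(inputs)
square_add_one_v2(inputs)

# TRUE when both versions return exactly the same values.
identical(square_add_one_v1(inputs), square_add_one_v2(inputs))
```

Because the arithmetic operators in R work element-wise, both versions also accept a whole vector of values in one call.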

weightRangeSummary function definition

weightRangeSummary <- function(x, lowerBound,upperBound) {
  x %>% filter(weight >= lowerBound & weight < upperBound) %>% 
    summarise(meanHeight=mean(height),
              increasedPulse=sum(pulse1<pulse2), individuals=n())
}

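The 60-70 range returned NA for increasedPulse because sum() yields NA when any of its inputs is missing. Account for the missing value with na.rm = TRUE: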

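weightRangeSummary <- function(x, lowerBound, upperBound) {
  # Keep individuals with lowerBound <= weight < upperBound
  x %>% filter(weight >= lowerBound & weight < upperBound) %>% 
    summarise(meanHeight=mean(height),
              # na.rm = TRUE skips comparisons where pulse1 or pulse2 is missing
              increasedPulse=sum(pulse1<pulse2, na.rm = TRUE), individuals=n())
}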
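The effect of na.rm = TRUE can be tried on a small, made-up table. The sketch below assumes the dplyr package is installed; the toy data frame and its values are hypothetical, and only the column names follow those used above:

```r
library(dplyr)

# Hypothetical toy data; one pulse2 value is missing on purpose.
toy <- tibble(
  weight = c(52, 55, 58, 63, 67),
  height = c(160, 165, 170, 172, 168),
  pulse1 = c(70, 80, 75, 66, 90),
  pulse2 = c(85, 78, NA, 70, 95)
)

# A comparison with NA gives NA, and sum() then gives NA as well.
sum(c(TRUE, FALSE, NA))                 # NA
sum(c(TRUE, FALSE, NA), na.rm = TRUE)   # 1

# Same definition as in the text, with na.rm = TRUE.
weightRangeSummary <- function(x, lowerBound, upperBound) {
  x %>%
    filter(weight >= lowerBound & weight < upperBound) %>%
    summarise(meanHeight = mean(height),
              increasedPulse = sum(pulse1 < pulse2, na.rm = TRUE),
              individuals = n())
}

res <- weightRangeSummary(toy, 50, 60)
res
```

In this toy example three individuals fall in the 50-60 range and one of them has an increased pulse. Note that n() still counts the individual whose pulse2 is missing, so individuals and increasedPulse are not based on the same number of complete observations.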

Some observations



Copyright © 2024 Biomedical Data Sciences (BDS) | LUMC