Often we want to repeat a certain series of steps in an analysis, each
time with subtle changes to the variables involved. Copying and adapting
the code each time leads to repetition in the script, which makes it
difficult to maintain or share with others. To avoid this, R provides the
function construct, which lets you develop your own functions with your
own arguments. The following scenario motivates the use of user-defined
functions.
Motivation
Here is a scenario based on the pulse dataset; we will develop the analysis step by step. The initial problem is to calculate the mean height of individuals weighing at least 50 kg and less than 60 kg.
pulse %>% filter(weight >= 50 & weight < 60) %>%
  summarise(meanHeight = mean(height))
# A tibble: 1 × 1
  meanHeight
       <dbl>
1       167.
Now, for the same individuals, how many had an increased pulse (pulse1 < pulse2)?
pulse %>% filter(weight >= 50 & weight < 60) %>%
  summarise(meanHeight = mean(height))
# A tibble: 1 × 1
  meanHeight
       <dbl>
1       167.
pulse %>% filter(weight >= 50 & weight < 60) %>%
  summarise(increasedPulse = sum(pulse1 < pulse2))
# A tibble: 1 × 1
  increasedPulse
           <int>
1             23
Let’s remove the repetition by computing both summaries in a single summarise call:
pulse %>% filter(weight >= 50 & weight < 60) %>%
  summarise(meanHeight = mean(height),
            increasedPulse = sum(pulse1 < pulse2))
# A tibble: 1 × 2
  meanHeight increasedPulse
       <dbl>          <int>
1       167.             23
We may want the same summary for other weight ranges, e.g. 60-70 and 70-80:
pulse %>% filter(weight >= 50 & weight < 60) %>%
  summarise(meanHeight = mean(height),
            increasedPulse = sum(pulse1 < pulse2))
# A tibble: 1 × 2
  meanHeight increasedPulse
       <dbl>          <int>
1       167.             23
pulse %>% filter(weight >= 60 & weight < 70) %>%
  summarise(meanHeight = mean(height),
            increasedPulse = sum(pulse1 < pulse2))
# A tibble: 1 × 2
  meanHeight increasedPulse
       <dbl>          <int>
1       169.             NA
pulse %>% filter(weight >= 70 & weight < 80) %>%
  summarise(meanHeight = mean(height),
            increasedPulse = sum(pulse1 < pulse2))
# A tibble: 1 × 2
  meanHeight increasedPulse
       <dbl>          <int>
1       176.             14
Now also include the number of individuals in each range:
pulse %>% filter(weight >= 50 & weight < 60) %>%
  summarise(meanHeight = mean(height),
            increasedPulse = sum(pulse1 < pulse2), individuals = n())
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       167.             23          31
pulse %>% filter(weight >= 60 & weight < 70) %>%
  summarise(meanHeight = mean(height),
            increasedPulse = sum(pulse1 < pulse2), individuals = n())
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       169.             NA          30
pulse %>% filter(weight >= 70 & weight < 80) %>%
  summarise(meanHeight = mean(height),
            increasedPulse = sum(pulse1 < pulse2), individuals = n())
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       176.             14          19
weightRangeSummary function
We want a function weightRangeSummary that takes the pulse dataset along with the bounds of a weight interval and produces the summary (its definition is given later in this section):
weightRangeSummary(pulse, 50, 60)
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       167.             23          31
weightRangeSummary(pulse, 60, 70)
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       169.             NA          30
weightRangeSummary(pulse, 70, 80)
# A tibble: 1 × 3
  meanHeight increasedPulse individuals
       <dbl>          <int>       <int>
1       176.             14          19
Functions are constructs that encapsulate a series of statements, so you do not have to repeat the same statements each time they are needed.
functionName <- function(...) {
  statement
  ...
  <value> # the result of the function, alternatively: return(<value>)
}
We define the following function add_one. It takes the value of an argument x and adds the number 1 to it. The sum is the result of the function:
add_one <- function(x) {
  x + 1
}
This function can now be called with different values of the argument x:
add_one(2)
[1] 3
add_one(-1)
[1] 0
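As a small illustration that goes slightly beyond the two calls above: arithmetic in R works element-wise, so the same function also accepts a vector of values. The sketch repeats the definition of add_one from the text so that it runs on its own.

```r
# add_one as defined in the text
add_one <- function(x) {
  x + 1
}

# `+` is element-wise, so a vector argument gives a vector result
add_one(c(1, 2, 3))
```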
Type the function name without any () to see the body (definition) of the function:
add_one
function(x) {
  x + 1
}
<environment: 0x55de9af54010>
What is the class of add_one?
class( add_one )
[1] "function"
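Since a function is an ordinary R object, it can be inspected like any other. The following sketch uses the base R helpers formals(), body() and is.function() on add_one; it is only one possible way to look inside a function, shown here to complement printing the function by name.

```r
# add_one as defined in the text
add_one <- function(x) {
  x + 1
}

names(formals(add_one))  # names of the arguments
body(add_one)            # the body, as an unevaluated expression
is.function(add_one)     # checks whether the object is a function
```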
Now let’s define a function to calculate \(x^2 + 1\):
square_add_one <- function(x) {
  result <- x^2 # choose a variable for temporary result
  result + 1
}
square_add_one(2) # 2^2 + 1 = 5
[1] 5
square_add_one(-1) # (-1)^2 + 1 = 2
[1] 2
The following versions of this function are all equivalent:
# (v1) One operation per line.
square_add_one <- function(x) {
  result <- x^2
  result <- result + 1
  result # The value of the last evaluated expression is returned.
}
# (v2) No additional 'result' variable needed
square_add_one <- function(x) {
  x^2 + 1
}
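The syntax template earlier mentions return(<value>) as an alternative to ending the function with the value itself. As a sketch, the code below adds such a variant and checks that all versions give the same results. The suffixes _v1, _v2 and _v3 are hypothetical names, introduced here only so that the versions can coexist in one session.

```r
# (v1) from the text: the last expression is the result
square_add_one_v1 <- function(x) {
  result <- x^2
  result <- result + 1
  result
}

# (v2) from the text: no temporary variable
square_add_one_v2 <- function(x) {
  x^2 + 1
}

# (v3) same computation with an explicit return()
square_add_one_v3 <- function(x) {
  return(x^2 + 1)
}

# Compare the versions on a few inputs
inputs <- c(-1, 0, 2)
identical(square_add_one_v1(inputs), square_add_one_v2(inputs))
identical(square_add_one_v2(inputs), square_add_one_v3(inputs))
```

Both comparisons evaluate to TRUE. Which style to use is a matter of taste; an explicit return() is mostly useful for leaving a function early.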
weightRangeSummary function definition
weightRangeSummary <- function(x, lowerBound, upperBound) {
  x %>% filter(weight >= lowerBound & weight < upperBound) %>%
    summarise(meanHeight = mean(height),
              increasedPulse = sum(pulse1 < pulse2), individuals = n())
}
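For the 60-70 range, increasedPulse is NA: when pulse1 or pulse2 is missing, the comparison pulse1 < pulse2 yields NA, and sum() returns NA as soon as one of its inputs is NA. Account for missing values with na.rm = TRUE:
weightRangeSummary <- function(x, lowerBound, upperBound) {
  x %>% filter(weight >= lowerBound & weight < upperBound) %>%
    summarise(meanHeight = mean(height),
              increasedPulse = sum(pulse1 < pulse2, na.rm = TRUE), individuals = n())
}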
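To see the effect of na.rm = TRUE in isolation, here is a minimal sketch on a small made-up data frame. The data frame toy and its values are hypothetical and are not taken from the pulse dataset; the sketch also assumes the dplyr package is installed.

```r
library(dplyr)

# sum() propagates NA unless told to drop missing values
sum(c(TRUE, NA, FALSE))                # NA
sum(c(TRUE, NA, FALSE), na.rm = TRUE)  # 1

# Hypothetical toy data, NOT the pulse dataset from the text
toy <- data.frame(
  weight = c(52, 58, 61, 65, 72),
  height = c(160, 170, 168, 172, 180),
  pulse1 = c(70, 80, NA, 75, 90),
  pulse2 = c(75, 78, 88, 80, 85)
)

# Same definition as in the text, with na.rm = TRUE
weightRangeSummary <- function(x, lowerBound, upperBound) {
  x %>% filter(weight >= lowerBound & weight < upperBound) %>%
    summarise(meanHeight = mean(height),
              increasedPulse = sum(pulse1 < pulse2, na.rm = TRUE),
              individuals = n())
}

# The 60-70 range contains one row with a missing pulse1;
# that comparison is dropped instead of turning the sum into NA
weightRangeSummary(toy, 60, 70)
```

Note that individuals still counts every row in the range, including the one with a missing pulse value, because n() counts rows regardless of missing values.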
Copyright © 2024 Biomedical Data Sciences (BDS) | LUMC