(ADV) Fitting multiple simple linear regressions to parts of a table.
Load the pulse.csv dataset into the pulse variable.
Let’s define the following goal: for each gender separately, calculate a linear fit of weight as a function of height.
First split the pulse table by gender into the pulseByGender variable (the %>% pipe used below requires the magrittr or dplyr package to be loaded).
pulseByGender <- pulse %>% split( .$gender )
Now pulseByGender is a list, and its elements can be accessed with the [[...]] operator.
Write the code to perform a linear fit of weight as a function of height for females only (taken from the pulseByGender list).
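To see what split produces without needing pulse.csv, here is a minimal sketch on a small made-up table. The names toy and toyByGender and all values are illustrative assumptions, not rows from the real dataset. It calls base R split directly, so no pipe is required.

```r
# Made-up stand-in for the pulse table (values are illustrative only)
toy <- data.frame(
  gender = c("female", "male", "female", "male"),
  height = c(165, 180, 170, 175),
  weight = c(60, 80, 65, 72)
)

# split() returns a named list holding one data frame per gender
toyByGender <- split(toy, toy$gender)

names(toyByGender)               # names of the list elements
toyByGender[["female"]]          # [[...]] extracts a single data frame
class(toyByGender[["female"]])   # the extracted element is a data.frame
```

Note that single brackets, toyByGender["female"], would return a one-element list rather than the data frame itself, which is why [[...]] is used for the fit.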
lm( weight ~ height, data = pulseByGender[[ "female" ]] )
Call:
lm(formula = weight ~ height, data = pulseByGender[["female"]])
Coefficients:
(Intercept) height
-22.691 0.482
The next goal is to perform the above fit multiple times, separately for each gender.
Define a variable genders with all genders present in the split object.
genders <- names( pulseByGender )
genders
[1] "female" "male"
Now a loop is needed.
A variable gender should iterate over all genders.
For each value of gender the respective linear fit should be performed.
Understand and try the following code:
for( gender in genders ) {
  lm( weight ~ height, data = pulseByGender[[ gender ]] )
}
The above code calculates all the needed fits but does not show or store the results anywhere.
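The reason nothing appears is that R only auto-prints values of expressions typed at the top level, not values computed inside a for loop. The following minimal sketch illustrates this with a trivial computation in place of lm; the variable x is purely illustrative.

```r
# Inside a loop, a bare expression is evaluated but not displayed
for (x in c(1, 2)) {
  x * 10
}

# An explicit print() is needed to display each value
for (x in c(1, 2)) {
  print(x * 10)
}
```

Printing inside the loop only displays the fits. To use them later they still have to be stored.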
The results need to be stored, for example in a list.
Understand/modify the code as follows:
fitByGender <- list()
for( gender in genders ) {
  fitByGender[[gender]] <- lm( weight ~ height, data = pulseByGender[[ gender ]] )
}
fitByGender
$female
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
$male
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
17.3348 0.3231
fitByGender[[ "female" ]]
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
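Once the fits are stored in a named list, a quantity of interest can be pulled out of each one. The sketch below is one way to do this with coef and sapply; it builds its own made-up table (toy, toyByGender and toyFits are illustrative names, and the values are not from pulse.csv) so that it runs on its own.

```r
# Made-up stand-in for the pulse table (values are illustrative only)
toy <- data.frame(
  gender = rep(c("female", "male"), each = 4),
  height = c(160, 165, 170, 175, 170, 175, 180, 185),
  weight = c(55, 59, 62, 68, 68, 74, 77, 85)
)
toyByGender <- split(toy, toy$gender)

# Same pattern as above: one fit per gender, stored under the gender name
toyFits <- list()
for (gender in names(toyByGender)) {
  toyFits[[gender]] <- lm(weight ~ height, data = toyByGender[[gender]])
}

# coef() returns the intercept and slope of a single fit as a named vector
coef(toyFits[["female"]])

# sapply() collects the coefficients of all fits into a matrix,
# with one column per gender and one row per coefficient
coefByGender <- sapply(toyFits, coef)
coefByGender
```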
The lapply function allows the above loop to be written differently.
In R this is the preferred notation (despite being less intuitive at first glance).
Rewrite the for loop with lapply as follows:
fitByGender <- lapply( genders, function( gender ) {
  lm( weight ~ height, data = pulseByGender[[ gender ]] )
} )
fitByGender
[[1]]
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
[[2]]
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
17.3348 0.3231
names( fitByGender )
NULL
As you can see, fitByGender is a list, but its elements are not named.
This is because lapply gives the elements of its result the same names as the elements of the input it iterates over, and genders has no names.
Check the names of genders (you will see NULL).
genders
[1] "female" "male"
names( genders )
NULL
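The name propagation can be checked in isolation. This minimal sketch applies toupper to an unnamed and to a named character vector; the vectors unnamed and named and the short names f and m are illustrative assumptions.

```r
unnamed <- c("female", "male")
named <- c(f = "female", m = "male")

# lapply() copies the names of its input onto the result
resUnnamed <- lapply(unnamed, toupper)
resNamed <- lapply(named, toupper)

names(resUnnamed)   # NULL, because the input had no names
names(resNamed)     # "f" "m", taken from the input
```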
To fix this, set the names of the genders elements equal to the element values.
names( genders ) <- genders
genders
female male
"female" "male"
Now, let’s repeat the lapply call and check the result.
Check (by eye) that fitByGender[[ 'female' ]] shows the same coefficients as the fit for females at the top of this example.
fitByGender <- lapply( genders, function( gender ) {
  lm( weight ~ height, data = pulseByGender[[ gender ]] )
} )
fitByGender
$female
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
$male
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
17.3348 0.3231
fitByGender[[ 'female' ]] # compare to lm( weight ~ height, data = pulseByGender[[ "female" ]] )
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
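As a possible shortcut, lapply can also iterate over the split list itself. Because that list is already named by gender, the result is named without any extra step. The sketch below uses a made-up table (toy, toyByGender, toyFits and the argument name d are illustrative, and the values are not from pulse.csv).

```r
# Made-up stand-in for the pulse table (values are illustrative only)
toy <- data.frame(
  gender = rep(c("female", "male"), each = 4),
  height = c(160, 165, 170, 175, 170, 175, 180, 185),
  weight = c(55, 59, 62, 68, 68, 74, 77, 85)
)
toyByGender <- split(toy, toy$gender)

# Iterate over the data frames directly; d is one gender's data frame
toyFits <- lapply(toyByGender, function(d) {
  lm(weight ~ height, data = d)
})

names(toyFits)        # names are inherited from toyByGender
toyFits[["female"]]   # the fit for one gender
```

With this variant the Call line in the printed fit shows data = d instead of the gender-indexed expression, while the coefficients are the same as those obtained by iterating over the gender names.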