(ADV) Fitting multiple simple linear regressions to parts of a table.
Load the pulse.csv dataset into the pulse variable.
Let’s define the following goal: separately for each gender, calculate a linear fit of weight as a function of height.
First, split the pulse table by gender into the pulseByGender variable.
pulseByGender <- pulse %>% split( .$gender )
Now pulseByGender is a list, and its elements can be accessed with the [[...]] operator.
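The difference between [[...]] and [...] on such a list can be checked on a small example. This is only a sketch: the toy table and its values are invented for illustration and are not the pulse data, and base split() is called directly instead of through %>% so that the snippet runs without extra packages.

```r
# Hypothetical toy table standing in for pulse (all values are made up)
toy <- data.frame(
  gender = c( "female", "male", "female", "male", "female", "male" ),
  height = c( 160, 175, 165, 180, 170, 185 ),
  weight = c( 55, 72, 58, 80, 63, 83 )
)

# Same operation as pulse %>% split( .$gender ), written without the pipe
toyByGender <- split( toy, toy$gender )

names( toyByGender )                       # one element per gender
toyFemale  <- toyByGender[[ "female" ]]    # [[...]] extracts the data frame itself
toySubList <- toyByGender[ "female" ]      # [...] returns a shorter list instead
class( toyFemale )
class( toySubList )
```

For fitting, the [[...]] form is the one needed, because lm expects a data frame in its data argument.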
Write the code to perform a linear fit of weight as a function of height for females only (taken from the pulseByGender list).
lm( weight ~ height, data = pulseByGender[[ "female" ]] )
Call:
lm(formula = weight ~ height, data = pulseByGender[["female"]])
Coefficients:
(Intercept) height
-22.691 0.482
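The printed coefficients can also be extracted as plain numbers with coef(), which becomes useful once several fits have to be compared. A minimal sketch on invented data follows; toyFemale is a hypothetical stand-in for pulseByGender[[ "female" ]], so the numbers differ from the output above.

```r
# Hypothetical data for one group (values are made up)
toyFemale <- data.frame(
  height = c( 150, 160, 170, 180 ),
  weight = c( 50, 56, 59, 65 )
)

toyFit  <- lm( weight ~ height, data = toyFemale )
toyCoef <- coef( toyFit )        # named numeric vector of fitted coefficients

toyCoef[[ "(Intercept)" ]]       # intercept of the fitted line
toyCoef[[ "height" ]]            # slope: change in weight per unit of height
```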
The next goal is to perform the above fit multiple times, separately for each gender.
Define a variable genders with all genders present in the split object.
genders <- names( pulseByGender )
genders
[1] "female" "male"
Now a loop is needed.
A variable gender should iterate over all genders.
For each value of gender, the respective linear fit should be performed.
Understand and try the following code:
for( gender in genders ) {
  lm( weight ~ height, data = pulseByGender[[ gender ]] )
}
The loop over genders calculates all needed fits but does not show or store the results anywhere.
The results need to be stored, for example in a list.
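The reason nothing is shown is that R prints the value of a bare expression automatically only at the top level, not inside a for loop. The sketch below demonstrates this with simple numbers instead of fits; the variable names are invented for illustration, and capture.output() is used only to record what would appear on the console.

```r
values <- c( 1, 2, 3 )

# Inside a loop a bare expression is evaluated, but its value is discarded
silent <- capture.output( for( v in values ) { v^2 } )

# An explicit print() makes each value visible
printed <- capture.output( for( v in values ) { print( v^2 ) } )

length( silent )    # number of lines the first loop produced
printed             # lines produced by the second loop
```

Printing helps for a quick look, but storing the fits keeps them available for later steps.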
Understand and modify the loop over genders as follows:
fitByGender <- list()
for( gender in genders ) {
  fitByGender[[gender]] <- lm( weight ~ height, data = pulseByGender[[ gender ]] )
}
fitByGender
$female
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
$male
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
17.3348 0.3231
fitByGender[[ "female" ]]
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
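Once the fits are stored in a named list, their coefficients can be gathered into one table for comparison. The following is a sketch under assumptions: the toy table is invented (not the pulse data), and toyFits and coefTable are hypothetical names.

```r
# Hypothetical toy table standing in for pulse (all values are made up)
toy <- data.frame(
  gender = c( "female", "male", "female", "male", "female", "male" ),
  height = c( 160, 175, 165, 180, 170, 185 ),
  weight = c( 55, 72, 58, 80, 63, 83 )
)
toyByGender <- split( toy, toy$gender )

# Store one fit per gender, as in the loop above
toyFits <- list()
for( g in names( toyByGender ) ) {
  toyFits[[ g ]] <- lm( weight ~ height, data = toyByGender[[ g ]] )
}

# sapply simplifies the per-fit coefficient vectors into a matrix:
# one row per coefficient, one column per gender
coefTable <- sapply( toyFits, coef )
coefTable
coefTable[ "height", "female" ]   # slope of the female fit
```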
A special function, lapply, allows the above code to be written differently.
In R this is the preferred notation (despite being less intuitive at first glance).
Rewrite the for loop using lapply as follows:
fitByGender <- lapply( genders, function( gender ) {
  lm( weight ~ height, data = pulseByGender[[ gender ]] )
} )
fitByGender
[[1]]
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
[[2]]
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
17.3348 0.3231
names( fitByGender )
NULL
As you can see, fitByGender is a list, but its elements are not named.
This is because lapply names the elements of its result as they were named in the iterated input.
Check names of genders (you will see NULL names).
genders
[1] "female" "male"
names( genders )
NULL
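How lapply carries names over from its input can be seen in isolation on two small character vectors. This is a minimal sketch; the vectors and the short names f and m are invented for illustration.

```r
unnamedInput <- c( "female", "male" )
namedInput   <- c( f = "female", m = "male" )

# nchar is applied to each element; only the names of the results matter here
resUnnamed <- lapply( unnamedInput, nchar )
resNamed   <- lapply( namedInput, nchar )

names( resUnnamed )   # no names in the input, so none in the result
names( resNamed )     # names of the input are reused for the result
```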
Consequently, set the names of the genders elements to be equal to the element values.
names( genders ) <- genders
genders
female male
"female" "male"
Now, let’s repeat the lapply loop and check the result.
Check (by eye) that fitByGender[[ 'female' ]] shows the same result as at the top of this example.
fitByGender <- lapply( genders, function( gender ) {
  lm( weight ~ height, data = pulseByGender[[ gender ]] )
} )
fitByGender
$female
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
$male
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
17.3348 0.3231
fitByGender[[ 'female' ]] # compare to lm( weight ~ height, data = pulseByGender[[ "female" ]] )
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
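Two shorter variants of the same idea may be worth knowing; they are shown here as a sketch on an invented toy table, not as part of the exercise. The first names the vector in one step with setNames(). The second skips the genders vector and iterates directly over the split list, whose elements are already named. Note that in the second variant the recorded call shows data = d rather than pulseByGender[[gender]].

```r
# Hypothetical toy table standing in for pulse (all values are made up)
toy <- data.frame(
  gender = c( "female", "male", "female", "male", "female", "male" ),
  height = c( 160, 175, 165, 180, 170, 185 ),
  weight = c( 55, 72, 58, 80, 63, 83 )
)
toyByGender <- split( toy, toy$gender )

# Variant 1: name the vector of genders in a single step
toyGenders <- setNames( names( toyByGender ), names( toyByGender ) )
fitsByName <- lapply( toyGenders, function( g ) {
  lm( weight ~ height, data = toyByGender[[ g ]] )
} )

# Variant 2: iterate over the data frames themselves; the list names carry over
fitsDirect <- lapply( toyByGender, function( d ) {
  lm( weight ~ height, data = d )
} )

names( fitsByName )
names( fitsDirect )
```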