(ADV) Fitting multiple simple linear regressions to parts of a
table.
Load the pulse.csv dataset into the pulse variable.
Let’s define the following goal: separately for each gender, calculate a linear fit of weight as a function of height.
First split the pulse table by gender into the pulseByGender variable.
pulseByGender <- pulse %>% split( .$gender )   # %>% comes from magrittr (also attached by dplyr/tidyverse)
Now, pulseByGender is a list; its elements can be accessed with the [[...]] operator.
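To make the list structure concrete, here is a minimal sketch on a made-up table. The toy data frame and the names toy, toyByGender and females are assumptions for illustration only, not part of the pulse dataset; base-R split() is used so that no extra package is needed.

```r
# Toy stand-in for the pulse table (values are made up for illustration).
toy <- data.frame(
  gender = c("female", "male", "female", "male"),
  height = c(160, 180, 165, 175),
  weight = c(55, 80, 60, 72)
)

# split() returns a list with one data frame per gender.
toyByGender <- split(toy, toy$gender)

class(toyByGender)   # "list"
names(toyByGender)   # "female" "male"

# [[...]] extracts a single element (here a data frame) by name or by position.
females <- toyByGender[["female"]]
nrow(females)        # 2
```

Single brackets, as in toyByGender["female"], would instead return a list of length one that still wraps the data frame.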
Write the code to perform a linear fit of weight as a function of height for females only (obtained from the pulseByGender list).
lm( weight ~ height, data = pulseByGender[[ "female" ]] )
Call:
lm(formula = weight ~ height, data = pulseByGender[["female"]])
Coefficients:
(Intercept) height
-22.691 0.482
The next goal is to perform the above fit multiple times, separately
for each gender.
Define a variable genders with all genders present in the split object.
genders <- names( pulseByGender )
genders
[1] "female" "male"
Now a loop is needed.
A variable gender should iterate over all genders.
For each value of gender the respective linear fit should be performed.
Understand and try the following code:
for( gender in genders ) {
  lm( weight ~ height, data = pulseByGender[[ gender ]] )
}
The above code calculates all needed fits but does not show or store the results anywhere.
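The loop is silent because R auto-prints the value of an expression only at the top level, not inside a for loop. A minimal sketch on made-up data (toy and toyByGender are illustrative names, not the pulse table) shows the difference an explicit print() makes:

```r
# Toy stand-in for the pulse table (values are made up for illustration).
toy <- data.frame(
  gender = rep(c("female", "male"), each = 3),
  height = c(160, 165, 170, 172, 178, 185),
  weight = c(54, 59, 65, 70, 76, 84)
)
toyByGender <- split(toy, toy$gender)

# Inside a for loop the value of an expression is not auto-printed ...
for (gender in names(toyByGender)) {
  lm(weight ~ height, data = toyByGender[[gender]])
}

# ... so an explicit print() is needed to see each fit.
for (gender in names(toyByGender)) {
  print(lm(weight ~ height, data = toyByGender[[gender]]))
}
```

Printing only displays the fits; to reuse them later they still have to be stored.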
The results need to be stored, for example in a list.
Understand/modify the code as follows:
fitByGender <- list()
for( gender in genders ) {
  fitByGender[[gender]] <- lm( weight ~ height, data = pulseByGender[[ gender ]] )
}
fitByGender
$female
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
$male
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
17.3348 0.3231
fitByGender[[ "female" ]]
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
A special function, lapply, allows the above code to be written differently.
In R this is the preferred notation (despite being less intuitive at first glance).
Rewrite the for loop using lapply as follows:
fitByGender <- lapply( genders, function( gender ) {
  lm( weight ~ height, data = pulseByGender[[ gender ]] )
} )
fitByGender
[[1]]
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
[[2]]
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
17.3348 0.3231
names( fitByGender )
NULL
As you can see, fitByGender is a list but its elements are not named.
This is because lapply gives the result the same names as the input it iterates over, and genders has none.
Check the names of genders (you will see NULL).
genders
[1] "female" "male"
names( genders )
NULL
Consequently, set the names of the genders elements to be equal to the element values.
names( genders ) <- genders
genders
female male
"female" "male"
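The effect of input names on lapply can be checked in isolation. The sketch below uses toupper as a stand-in for the fitting function and shows setNames() as a one-step alternative for naming a vector after its own values; the variable names unnamed, namedGenders and named are illustrative assumptions.

```r
genders <- c("female", "male")

# Without names on the input, lapply() returns an unnamed list.
unnamed <- lapply(genders, toupper)
names(unnamed)         # NULL

# setNames() attaches names in one step; each element is named after its own value.
namedGenders <- setNames(genders, genders)

# With a named input, lapply() carries the names over to the result.
named <- lapply(namedGenders, toupper)
names(named)           # "female" "male"
named[["female"]]      # "FEMALE"
```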
Now, let’s repeat the lapply loop and check the result.
Check (by eye) that fitByGender[[ 'female' ]] shows the same coefficients as the fit at the top of this example (only the Call line differs).
fitByGender <- lapply( genders, function( gender ) {
  lm( weight ~ height, data = pulseByGender[[ gender ]] )
} )
fitByGender
$female
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
$male
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
17.3348 0.3231
fitByGender[[ 'female' ]] # compare to lm( weight ~ height, data = pulseByGender[[ "female" ]] )
Call:
lm(formula = weight ~ height, data = pulseByGender[[gender]])
Coefficients:
(Intercept) height
-22.691 0.482
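As a possible variation, lapply can also iterate over the split list itself, which is already named, so the separate genders vector is not needed. This is a sketch on made-up data rather than the exercise's intended solution; toy, toyByGender, toyFits and directFemale are illustrative names.

```r
# Toy stand-in for the pulse table (values are made up for illustration).
toy <- data.frame(
  gender = rep(c("female", "male"), each = 4),
  height = c(158, 163, 168, 172, 170, 176, 181, 187),
  weight = c(52, 57, 61, 66, 68, 74, 79, 88)
)
toyByGender <- split(toy, toy$gender)

# Iterate over the data frames directly; the list names carry over to the result.
toyFits <- lapply(toyByGender, function(d) {
  lm(weight ~ height, data = d)
})
names(toyFits)   # "female" "male"

# Compare with a fit done directly on the female subset.
directFemale <- lm(weight ~ height, data = toyByGender[["female"]])
all.equal(coef(toyFits[["female"]]), coef(directFemale))   # TRUE
```

Iterating over names, as in the exercise, keeps the current gender available as a string inside the function; iterating over the list is shorter when only the data are needed.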