Primary exercises
- For this exercise we will first split the `survey` dataset into two separate tables in order to join them again. Call these `df1` and `df2`. They will have disjoint sets of variables except `name` and `age`; the variables `name` and `age` combined are unique across all observations and will be used later for joining. Take, for example, all variables related to arm or hand in `df1` and the rest in `df2`:
df1 : "name" "span1" "span2" "hand" "fold" "clap" "age"
df2 : "name" "gender" "pulse" "exercise" "smokes" "height" "m.i" "age"
(a) Join `df1` and `df2` by `name` and `age` such that you obtain the original `survey` table.
(b) In exercise (a), does it make any difference whether you choose `inner_join`, `left_join` or `full_join`? Hint: compare two tables with the function `all_equal`.
(c) Is the pair `name` and `age` also a good candidate for the key, i.e. does the combination of `name` and `age` uniquely identify each observation in the survey data? What about the combination of `name` with `span1` or `span2`?
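
The mechanics of the split, the join and the key check can be sketched on a small made-up table. This is a minimal illustration, not a solution for the real data: `survey_toy`, its values, and the helpers `same_table()` and `is_key()` are hypothetical names introduced here, and only a few of the variables listed above are included. The comparison uses base R's `all.equal` after sorting rows and columns, as one possible alternative to the `all_equal` hint.

```r
library(dplyr)

# Hypothetical toy stand-in for the survey table (all values are made up).
survey_toy <- tibble(
  name   = c("Ann", "Bob", "Cid", "Ann"),
  age    = c(18, 21, 19, 20),
  span1  = c(18.5, 19.5, 18.5, 20.0),
  hand   = c("Right", "Left", "Right", "Right"),
  gender = c("Female", "Male", "Male", "Female"),
  pulse  = c(92, 104, 87, 75)
)

# Split into two tables that share only the key columns name and age.
df1 <- survey_toy %>% select(name, span1, hand, age)
df2 <- survey_toy %>% select(name, gender, pulse, age)

# Join the two tables again on the composite key.
joined_inner <- inner_join(df1, df2, by = c("name", "age"))
joined_left  <- left_join(df1, df2, by = c("name", "age"))
joined_full  <- full_join(df1, df2, by = c("name", "age"))

# Hypothetical helper: compare two tables while ignoring row and column order.
same_table <- function(x, y) {
  x <- x %>% select(all_of(sort(names(x)))) %>% arrange(across(everything()))
  y <- y %>% select(all_of(sort(names(y)))) %>% arrange(across(everything()))
  isTRUE(all.equal(as.data.frame(x), as.data.frame(y)))
}

# Hypothetical helper: TRUE if the given columns identify every row uniquely.
is_key <- function(data, ...) {
  nrow(distinct(data, ...)) == nrow(data)
}

same_table(joined_inner, survey_toy)  # does the join recover the toy table?
same_table(joined_left, joined_full)  # do the join types agree on the toy data?
is_key(survey_toy, name, age)         # is (name, age) a key of the toy table?
is_key(survey_toy, name)              # is name alone a key of the toy table?
```

The results on the toy table say nothing about the real `survey` data; the same calls would have to be run on `df1`, `df2` and `survey` themselves to answer the exercises.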