Primary exercises
Use the pipe operator %>% as much as possible in your solutions.
In the survey dataset:
- Right-handedness
- How many individuals are right handed?
survey %>% filter(hand=="right") %>% nrow()
[1] 216
- Are there more right-handed males than females?
# right-handed females
rh_females <- survey %>%
filter(hand=="right" & gender=="female") %>%
nrow()
# right-handed males
rh_males <- survey %>%
filter(hand=="right" & gender=="male") %>%
nrow()
# Test whether there are strictly more right-handed males than females.
#
# FALSE => the answer is no
# TRUE => the answer is yes
rh_males > rh_females
[1] FALSE
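The two counts above can also be obtained in a single pipeline with dplyr's count(). The sketch below is only an illustration: `toy` is a small made-up data frame standing in for survey (it is not the real dataset), and `rh_counts` is a hypothetical name.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey` (not the real dataset)
toy <- data.frame(
  hand   = c("right", "right", "left", "right", "right"),
  gender = c("female", "male", "male", "male", "male")
)

# Keep the right-handed rows, then tally the rows per gender
rh_counts <- toy %>%
  filter(hand == "right") %>%
  count(gender)

rh_counts
```

The result has one row per gender with the count in column n, so both numbers can be compared side by side without storing two separate variables.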
- Count the number of females who never smoke. Do the same for males.
female_nonsmokers <- survey %>% filter(smokes=="never" & gender=="female") %>% nrow()
female_nonsmokers
[1] 98
male_nonsmokers <- survey %>% filter(smokes=="never" & gender=="male") %>% nrow()
male_nonsmokers
[1] 88
- Produce the percentages of female and male non-smokers.
tot_females <- survey %>% filter(gender=="female") %>% nrow() # total females
tot_males <- survey %>% filter(gender=="male") %>% nrow() # total males
# Take female_nonsmokers and male_nonsmokers from the previous exercise
(female_nonsmokers/tot_females)*100 # % female non-smokers
[1] 83.76068
(male_nonsmokers/tot_males)*100 # % male non-smokers
[1] 75.86207
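As a possible alternative, the percentages of non-smokers per gender can be computed in one step with group_by() and summarise(). This is a minimal sketch on a made-up data frame `toy` (not the real survey data); `pct_never` is a hypothetical name. The denominator is n(), the number of rows in each gender group, which mirrors the totals used in the solution above.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey` (not the real dataset)
toy <- data.frame(
  gender = c("female", "female", "female", "female", "male", "male"),
  smokes = c("never", "never", "never", "regul", "never", "occas")
)

# For each gender: never-smokers divided by all rows of that gender, times 100
pct_never <- toy %>%
  group_by(gender) %>%
  summarise(pct_never = sum(smokes == "never", na.rm = TRUE) / n() * 100)

pct_never
```

Note that rows with a missing gender would form their own group here, whereas filter(gender=="female") silently drops them.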
- The variables pulse, height and m.i have missing values (NA).
- For each variable, count the number of missing values (NA). Use the function is.na (ref) as a condition to find the missing values.
pulse_NAs <- survey %>% filter(is.na(pulse)) %>% nrow() # number of NAs in pulse
pulse_NAs
[1] 45
height_NAs <- survey %>% filter(is.na(height)) %>% nrow() # number of NAs in height
height_NAs
[1] 27
mi_NAs <- survey %>% filter(is.na(m.i)) %>% nrow() # number of NAs in m.i
mi_NAs
[1] 27
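The three counts can also be produced in a single call by applying the same is.na test to several columns with across(). The sketch below assumes dplyr 1.0.0 or later (where across() is available) and uses a made-up data frame `toy` in place of survey; `na_counts` is a hypothetical name.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey` (not the real dataset)
toy <- data.frame(
  pulse  = c(70, NA, 80, NA, 65),
  height = c(165, NA, 180, 175, NA),
  m.i    = c("metric", NA, "imperial", "metric", NA)
)

# Count the NAs in each of the three columns; the result is a one-row data frame
na_counts <- toy %>%
  summarise(across(c(pulse, height, m.i), ~ sum(is.na(.x))))

na_counts
```

Summing the logical vector returned by is.na() counts the TRUE values, so no filter() and nrow() pair is needed per variable.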
- The variables height and m.i have the same number of missing values (NA). Are these missing values in the same observations (rows), i.e. whenever height is missing, is m.i also missing in the same row?
# From the first part of this exercise we know that 'height' and 'm.i' each have 27
# missing values (NA). We only need to count how many observations in survey are
# missing both height and m.i. If that count is also 27, then the answer is
# yes.
# number of observations with both height and m.i missing, compared with height_NAs
( survey %>% filter(is.na(height) & is.na(m.i)) %>% nrow() ) == height_NAs
[1] TRUE
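A more direct check, which does not rely on the two NA counts being equal, compares the missingness of both variables row by row. This is a minimal sketch on a made-up data frame `toy` (not the real survey data); `same_rows` and `n_mismatch` are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey` (not the real dataset)
toy <- data.frame(
  height = c(165, NA, 180, NA),
  m.i    = c("metric", NA, "imperial", NA)
)

# TRUE only if, in every row, height and m.i are either both missing or both present
same_rows <- toy %>%
  summarise(same = all(is.na(height) == is.na(m.i))) %>%
  pull(same)

# Number of rows where exactly one of the two variables is missing
n_mismatch <- toy %>%
  filter(is.na(height) != is.na(m.i)) %>%
  nrow()

same_rows
n_mismatch
```

If `n_mismatch` is greater than zero, the same filter() call without nrow() lists the rows where the two variables disagree.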