Use the pipe operator %>% as much as possible in your solutions.

Primary exercises

In the survey dataset:

  1. Right-handedness
survey %>%  filter(hand=="right") %>% nrow()
[1] 216
# right-handed females
rh_females <- survey %>%  
              filter(hand=="right" & gender=="female")  %>% 
              nrow() 
# right-handed males
rh_males <- survey %>%  
            filter(hand=="right" & gender=="male") %>% 
            nrow()

# Test whether there are strictly more right-handed males than females. 
#
# FALSE => the answer is no 
# TRUE => answer is yes
rh_males > rh_females  
[1] FALSE
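The two separate filter-and-count steps above can also be written as one grouped count. The following is only a sketch, assuming the dplyr package is installed; `toy_survey` is a made-up stand-in that borrows the column names of the survey data, not the real dataset.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror the survey dataset
toy_survey <- tibble(
  gender = c("female", "female", "male",  "male",  "male", "female"),
  hand   = c("right",  "left",   "right", "right", "left", "right")
)

# Keep the right-handed rows, then count them per gender in one pipeline
rh_counts <- toy_survey %>%
  filter(hand == "right") %>%
  count(gender)

rh_counts
```

With the real data, replacing `toy_survey` by `survey` should give both right-handed counts in a single table, which can then be compared by eye or with a further step.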
  1. Count the number of females who never smoke. Do the same for males.
# Note: despite their names, female_smokers and male_smokers count people who never smoke
female_smokers <- survey %>% filter(smokes=="never" & gender=="female") %>% nrow()
female_smokers
[1] 98
male_smokers <- survey %>% filter(smokes=="never" & gender=="male") %>%  nrow()
male_smokers
[1] 88
  1. Produce the percentages of female and male non-smokers.
tot_females <- survey %>% filter(gender=="female") %>%  nrow() # total females
tot_males <- survey %>% filter(gender=="male") %>%  nrow()     # total males

# Take female_smokers and male_smokers (the never-smoker counts) from the previous exercise
(female_smokers/tot_females)*100 # % female non-smokers
[1] 83.76068
(male_smokers/tot_males)*100     # % male non-smokers
[1] 75.86207
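The same percentages can be computed per gender in one pipeline with group_by() and summarise(). This is a sketch under assumptions: dplyr is installed, and `toy_survey` with its smoking categories is invented for illustration. As in the solution above, the denominator is the total number of people of that gender, including those with a missing smoking status.

```r
library(dplyr)

# Hypothetical toy data; the values other than "never" are made up
toy_survey <- tibble(
  gender = c("female", "female", "female", "male",  "male",  "male",  "male"),
  smokes = c("never",  "never",  NA,       "never", "occas", "never", "heavy")
)

# Percentage of never-smokers within each gender
pct_by_gender <- toy_survey %>%
  group_by(gender) %>%
  summarise(
    never_count = sum(smokes == "never", na.rm = TRUE),  # NA is not counted as "never"
    total       = n(),                                   # all rows of that gender
    pct_never   = never_count / total * 100
  )

pct_by_gender
```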
  1. The variables pulse, height and m.i have missing values (NA).
pulse_NAs <- survey %>% filter(is.na(pulse)) %>% nrow()    # number of NAs in pulse
pulse_NAs
[1] 45
height_NAs <- survey %>% filter(is.na(height)) %>% nrow()  # number of NAs in height
height_NAs
[1] 27
mi_NAs <- survey %>% filter(is.na(m.i)) %>% nrow()         # number of NAs in m.i
mi_NAs
[1] 27
# From the first part of this exercise we know that 'height' and 'm.i' each have 27
# missing values (NA). We only need to count how many observations in survey are
# missing height and m.i at the same time. If that count is also 27, then the answer
# is yes.

# number of observations for which both height and m.i are NA
( survey %>% filter(is.na(height) & is.na(m.i))  %>% nrow() ) == height_NAs
[1] TRUE
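The three NA counts can also be obtained in one step with summarise() and across(), and the question whether height and m.i are missing in the same rows can be checked by comparing the two missingness patterns directly. A minimal sketch, assuming dplyr 1.0 or later; `toy_survey` and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror the survey dataset
toy_survey <- tibble(
  pulse  = c(70, NA, 65, NA, 80),
  height = c(170, NA, 165, 180, NA),
  m.i    = c("metric", NA, "imperial", "metric", NA)
)

# Number of NAs in each of the three columns, as a one-row tibble
na_counts <- toy_survey %>%
  summarise(across(c(pulse, height, m.i), ~ sum(is.na(.x))))

# TRUE only if height and m.i are missing in exactly the same rows
same_rows <- toy_survey %>%
  summarise(same = all(is.na(height) == is.na(m.i))) %>%
  pull(same)

na_counts
same_rows
```

Comparing the patterns row by row does not rely on the two NA counts being equal beforehand, so it works as a standalone check.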

Extra exercises

  1. What is the maximum female height, in inches, in the survey dataset?
# The 'height' variable has missing values, i.e. 'NA'. Most summary functions in R,
# e.g. sum, max, min, median and mean, return NA when applied to a vector with
# missing values, and rightly so. However, these functions have an additional
# argument 'na.rm': set na.rm = TRUE to run the function on the non-missing part of the
# vector. Beware that max(weig) 

survey %>% filter(gender=="female") %>%  
          mutate(height_inch=height * 0.393701) %>% 
          pull(height_inch) %>%  max(na.rm = TRUE)
[1] 70.07878
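The effect of na.rm can be seen on a small vector. This sketch uses base R only; `heights_cm` is a made-up vector, and the conversion factor is the one used in the solution above.

```r
# Hypothetical heights in cm, with one missing value
heights_cm <- c(160, NA, 178)

max_with_na    <- max(heights_cm)                # NA: the missing value propagates
max_without_na <- max(heights_cm, na.rm = TRUE)  # 178: NA is dropped first

# Convert to inches before taking the maximum, as in the pipeline above
cm_to_inch <- 0.393701
max_inch   <- max(heights_cm * cm_to_inch, na.rm = TRUE)

max_with_na
max_without_na
max_inch
```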
  1. What is the percentage of right-handed people among females who frequently exercise? What about males?
# 1) First we create a tibble 'female_exercise_freq' for females with exercise=="freq"
female_exercise_freq <- survey %>% 
  filter(gender=="female" & exercise=="freq") 

# 2) From the group in 'female_exercise_freq' calculate the total ('female_freq_count') 
# and the nr. of right-handed ones ('female_freq_rh_count').
female_freq_count <- female_exercise_freq %>% 
  nrow()  # calculate total nr. of females who frequently exercise
female_freq_rh_count <- female_exercise_freq %>% 
  filter(hand=="right") %>% nrow() #  nr. of right-handed ones

# 3) calculate the fraction and the percentage.
(female_freq_rh_count / female_freq_count)*100  # % 
[1] 93.75
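The steps above cover females only. One possible way to answer the question for both genders at once is to group by gender after filtering on exercise. This is a sketch, assuming dplyr is installed; `toy_survey` and its values are invented, so the percentages below are not the survey results.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror the survey dataset
toy_survey <- tibble(
  gender   = c("female", "female", "male",  "male",  "male",  "male", "female"),
  exercise = c("freq",   "freq",   "freq",  "freq",  "freq",  "some", "none"),
  hand     = c("right",  "left",   "right", "right", "right", "left", "right")
)

# Among frequent exercisers: total, right-handed count and percentage per gender
rh_pct <- toy_survey %>%
  filter(exercise == "freq") %>%
  group_by(gender) %>%
  summarise(
    total  = n(),
    rh     = sum(hand == "right", na.rm = TRUE),
    pct_rh = rh / total * 100
  )

rh_pct
```

Running the same pipeline on `survey` instead of `toy_survey` should reproduce the female percentage computed above and give the male one alongside it.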


Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC