Primary exercises

In the survey dataset:

  1. Select teenagers; assume ages from 10 up to, but not including, 20 (i.e. 10 to 19 completed years).
filter(survey, 10<=age & age<20)
# A tibble: 169 × 13
   name    gender span1 span2 hand  fold  pulse clap    exercise smokes height m.i        age
   <chr>   <chr>  <dbl> <dbl> <chr> <chr> <dbl> <chr>   <chr>    <chr>   <dbl> <chr>    <dbl>
 1 Alyson  female  18.5  18   right right    92 left    some     never    173  metric    18.2
 2 Todd    male    19.5  20.5 left  right   104 left    none     regul    178. imperial  17.6
 3 Gerald  male    18    13.3 right left     87 neither none     occas     NA  <NA>      16.9
 4 Andre   male    17.7  17.7 right left     83 right   freq     never    183. imperial  18.8
 5 Edward  male    20    19.5 right right    72 right   some     never    175  metric    19  
 6 Alfred  male    21    21   right right    68 left    freq     never     NA  <NA>      18.2
 7 Bernice female  16    16   right left     NA right   some     never    155  metric    18.8
 8 Velma   female  19.5  20.2 right left     66 neither some     never    155  metric    17.5
 9 Eddie   male    16    15.5 right right    60 right   some     never     NA  <NA>      17.2
10 Fern    female  17.5  17   right right    NA right   freq     never    156  metric    17.2
# ℹ 159 more rows
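Age is recorded in fractional years, so someone aged 19 and a half has `age` 19.5. The upper bound `age < 20` keeps such people, whereas `age <= 19` would drop them. A minimal base-R sketch on a made-up vector of ages (hypothetical values, not the survey data):

```r
# Made-up ages in fractional years (hypothetical, not taken from survey)
ages <- c(9.9, 10, 16.75, 19, 19.5, 20)

# Upper bound "< 20": keeps everyone aged 10 up to (but not including) 20
teen_lt20 <- ages[10 <= ages & ages < 20]

# Upper bound "<= 19": silently drops 19.5, who is still 19 years old
teen_le19 <- ages[10 <= ages & ages <= 19]

teen_lt20
teen_le19
```

Note that dplyr's `between(age, 10, 20)` is inclusive at both ends, so it would also keep an age of exactly 20.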
  2. Select all females with pulse equal to 60
filter(survey, pulse==60 & gender=="female")
# A tibble: 4 × 13
  name    gender span1 span2 hand  fold  pulse clap    exercise smokes height m.i        age
  <chr>   <chr>  <dbl> <dbl> <chr> <chr> <dbl> <chr>   <chr>    <chr>   <dbl> <chr>    <dbl>
1 Elnora  female  18    17.6 right right    60 right   some     occas    168  metric    18.4
2 Lavonne female  17.5  17.5 right right    60 right   freq     never    166. metric    23.2
3 Dianna  female  16    15.5 right left     60 left    freq     never    163. imperial  17.4
4 Patrica female  16.5  16.9 right right    60 neither freq     occas    169. metric    29.1
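`filter()` also accepts several conditions as separate arguments and combines them with 'and', so the call above can be written either way. A sketch on a small made-up tibble (assumes dplyr is installed; all values are hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for `survey`
demo <- tibble(
  name   = c("Elnora", "Bob", "Dianna", "Ann"),
  gender = c("female", "male", "female", "female"),
  pulse  = c(60, 60, 60, NA)
)

# Conditions joined explicitly with &
with_and <- filter(demo, pulse == 60 & gender == "female")

# The same conditions as separate arguments (filter() ANDs them together)
with_comma <- filter(demo, pulse == 60, gender == "female")

identical(with_and, with_comma)
```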
  3. Select all male teenagers with pulse above 60.
filter(survey, pulse>60 & gender=="male" & (10<=age & age<20) )
# A tibble: 59 × 13
   name    gender span1 span2 hand  fold  pulse clap    exercise smokes height m.i        age
   <chr>   <chr>  <dbl> <dbl> <chr> <chr> <dbl> <chr>   <chr>    <chr>   <dbl> <chr>    <dbl>
 1 Todd    male    19.5  20.5 left  right   104 left    none     regul    178. imperial  17.6
 2 Gerald  male    18    13.3 right left     87 neither none     occas     NA  <NA>      16.9
 3 Andre   male    17.7  17.7 right left     83 right   freq     never    183. imperial  18.8
 4 Edward  male    20    19.5 right right    72 right   some     never    175  metric    19  
 5 Alfred  male    21    21   right right    68 left    freq     never     NA  <NA>      18.2
 6 Virgil  male    19.4  19.2 left  right    74 right   some     never    183. imperial  18.3
 7 Richard male    21    20.9 right right    78 right   freq     never    177  metric    17.9
 8 Virgil  male    21.5  22   right right    72 left    freq     never    190. imperial  17.9
 9 Troy    male    20.1  20.7 right left     72 right   freq     never    180. imperial  18.2
10 Charlie male    18.5  18   right left     64 right   freq     never    180. imperial  17.8
# ℹ 49 more rows
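Rows with a missing pulse do not appear in this result: `NA > 60` evaluates to `NA`, and `filter()` keeps only rows where the condition is `TRUE`. Plain bracket indexing behaves differently, which is easy to trip over. A base-R sketch on a made-up data frame; `subset()` drops `NA` conditions the same way `filter()` does:

```r
# Hypothetical mini data set, not the real survey data
demo <- data.frame(name  = c("Todd", "Gerald", "Eddie", "Troy"),
                   pulse = c(104, NA, 60, 72),
                   stringsAsFactors = FALSE)

cond <- demo$pulse > 60   # TRUE NA FALSE TRUE

# Bracket indexing turns the NA condition into a row full of NAs
by_bracket <- demo[cond, ]

# subset() treats an NA condition as FALSE, like dplyr::filter()
by_subset <- subset(demo, pulse > 60)

nrow(by_bracket)  # 3 (Todd, an all-NA row, Troy)
nrow(by_subset)   # 2 (Todd, Troy)
```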
  4. How many males smoke and never exercise?
# The conditions are 'smokes' and 'never exercises'. 'Smokes' covers the categories {regul, occas, heavy}, so these are combined with the 'or' (|) connective; 'never exercises' is the category 'none'. Since both conditions must hold, they are combined with the 'and' (&) connective.

nrow( filter(survey, (smokes=="regul" | smokes=="occas" | smokes=="heavy") & exercise=="none" & gender=="male" ) )
[1] 4
# Alternatively, a shorter condition for 'smokes' is smokes!="never": it accepts every value of 'smokes' except "never", which here is exactly {regul, occas, heavy}.

nrow( filter(survey, smokes!="never" & exercise=="none" & gender=="male" ) )
[1] 4
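The two conditions can be compared element by element. On a made-up set of answers (hypothetical values; the last person has a missing 'smokes'), the long `|` chain and the short `!=` form produce the same logical vector, including the `NA`, and `sum(..., na.rm = TRUE)` counts the `TRUE`s without building a filtered tibble:

```r
# Hypothetical answers for six males; the last 'smokes' value is missing
smokes   <- c("never", "regul", "occas", "heavy", "never", NA)
exercise <- c("none",  "none",  "some",  "none",  "freq",  "none")
gender   <- rep("male", 6)

long_form  <- (smokes == "regul" | smokes == "occas" | smokes == "heavy") &
  exercise == "none" & gender == "male"
short_form <- smokes != "never" & exercise == "none" & gender == "male"

identical(long_form, short_form)  # TRUE
sum(short_form, na.rm = TRUE)     # 2, the count nrow(filter(...)) would give
```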
  5. How many females never smoke and frequently exercise?
nrow( filter(survey, smokes=="never" & exercise=="freq" & gender=="female") )
[1] 38
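Instead of one `filter()` call per gender, a cross-tabulation with base R's `table()` returns the counts for every gender/exercise combination at once. A sketch on a made-up data frame (hypothetical values, not the survey results):

```r
demo <- data.frame(
  gender   = c("female", "female", "male", "female", "male"),
  smokes   = c("never",  "never",  "never", "occas",  "never"),
  exercise = c("freq",   "some",   "freq",  "freq",   "freq"),
  stringsAsFactors = FALSE
)

# Keep never-smokers (subset() drops NA conditions), then cross-tabulate
never_smokers <- subset(demo, smokes == "never")
counts <- table(never_smokers$gender, never_smokers$exercise)

counts["female", "freq"]  # never-smoking females who exercise frequently
```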
  6. Produce the following tibbles:

    6.1 Personal information {Name,Age,Gender,Height} of all teenagers.

    filter(select(survey, Name=name, Age=age, Gender=gender, Height=height ), 10<=Age & Age<20)
    # A tibble: 169 × 4
       Name      Age Gender Height
       <chr>   <dbl> <chr>   <dbl>
     1 Alyson   18.2 female   173 
     2 Todd     17.6 male     178.
     3 Gerald   16.9 male      NA 
     4 Andre    18.8 male     183.
     5 Edward   19   male     175 
     6 Alfred   18.2 male      NA 
     7 Bernice  18.8 female   155 
     8 Velma    17.5 female   155 
     9 Eddie    17.2 male      NA 
    10 Fern     17.2 female   156 
    # ℹ 159 more rows
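
    The nested call above reads inside out; the same steps can be written as a pipeline with dplyr's `%>%`. After `select()` renames the columns, `filter()` must use the new names (`Age`, not `age`). A sketch on a hypothetical three-row tibble (made-up values):

    ```r
    library(dplyr)

    # Hypothetical stand-in for `survey`
    survey_demo <- tibble(
      name   = c("Alyson", "Todd", "Nina"),
      gender = c("female", "male", "female"),
      height = c(173, 178, 165),
      age    = c(18.2, 17.6, 23.2)
    )

    teen_info <- survey_demo %>%
      select(Name = name, Age = age, Gender = gender, Height = height) %>%
      filter(10 <= Age & Age < 20)   # renamed column, so Age rather than age

    teen_info
    ```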

    6.2 Personal information of males with a height between 170 and 180, inclusive.

    filter(select(survey, Name=name, Age=age, Gender=gender, Height=height),
            170<=Height & Height<=180 & Gender=="male")
    # A tibble: 45 × 4
       Name      Age Gender Height
       <chr>   <dbl> <chr>   <dbl>
     1 Todd     17.6 male     178.
     2 Edward   19   male     175 
     3 Richard  17.9 male     177 
     4 Joe      17.5 male     173.
     5 Floyd    18.1 male     175.
     6 Russell  17.5 male     180 
     7 George   17.2 male     180 
     8 Mathew   19.9 male     171 
     9 Willard  18.9 male     180 
    10 Andrew   19.4 male     170 
    # ℹ 35 more rows
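
    dplyr's `between(x, left, right)` is shorthand for `x >= left & x <= right`, so it matches the inclusive bounds of 6.2. Like the explicit comparison, it yields `NA` for a missing height, and `filter()` drops that row. A sketch with made-up values:

    ```r
    library(dplyr)

    # Hypothetical stand-in for `survey`; Cal has a missing height
    survey_demo <- tibble(
      name   = c("Todd", "Ann", "Bob", "Cal", "Dan"),
      age    = c(17.6, 18.0, 21.3, 19.2, 18.5),
      gender = c("male", "female", "male", "male", "male"),
      height = c(178, 172, 185, NA, 170)
    )

    males_170_180 <- survey_demo %>%
      select(Name = name, Age = age, Gender = gender, Height = height) %>%
      filter(between(Height, 170, 180), Gender == "male")  # 170 and 180 included

    males_170_180
    ```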
  7. Does the survey data have missing pulse values? If so, how many? What about age?

nrow(filter(survey, is.na(pulse))) # number of missing 'pulse' values.
[1] 45
filter(survey, is.na(age)) # empty result set, no missing values in 'age'.
# A tibble: 0 × 13
# ℹ 13 variables: name <chr>, gender <chr>, span1 <dbl>, span2 <dbl>, hand <chr>, fold <chr>, pulse <dbl>, clap <chr>, exercise <chr>, smokes <chr>, height <dbl>,
#   m.i <chr>, age <dbl>
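Two base-R shortcuts answer the same questions without `filter()`: `anyNA()` says whether a column has any missing value, and `sum(is.na(...))` counts them, since `TRUE` counts as 1. Sketched on a made-up data frame (hypothetical values):

```r
demo <- data.frame(
  pulse = c(92, NA, 87, NA, 60),
  age   = c(18.2, 17.6, 16.9, 18.8, 19.0)
)

anyNA(demo$pulse)       # TRUE: at least one pulse is missing
sum(is.na(demo$pulse))  # 2 missing pulse values
anyNA(demo$age)         # FALSE: no missing ages
```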

Extra exercises

  1. What percentage of males never smoke and frequently exercise? Do the same for females.
# men
non_smoker_sportsmen <- nrow( filter(survey, smokes=="never"  &
                                             exercise=="freq" &
                                             gender=="male") )
total_men <- nrow( filter(survey, gender=="male") )
non_smoker_sportsmen / total_men     # proportion; multiply by 100 for a percentage (about 40.5%)
[1] 0.4051724
# women
non_smoker_sportswomen <- nrow( filter(survey, smokes=="never"  &
                                               exercise=="freq" &
                                               gender=="female") )
total_women <- nrow( filter(survey, gender=="female") )
non_smoker_sportswomen / total_women # proportion; multiply by 100 for a percentage (about 32.5%)
[1] 0.3247863
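Both genders can be handled in one pass with `group_by()` and `summarise()`. To mirror the calculation above, the numerator counts rows where the condition is `TRUE` (`na.rm = TRUE` skips a missing answer, as `filter()` does) and the denominator `n()` counts every row of that gender. A sketch on a hypothetical tibble (made-up values):

```r
library(dplyr)

# Hypothetical stand-in for `survey`; one female has a missing 'smokes'
survey_demo <- tibble(
  gender   = c("male", "male", "male", "female", "female"),
  smokes   = c("never", "never", "regul", "never", NA),
  exercise = c("freq", "some", "freq", "freq", "freq")
)

pct_table <- survey_demo %>%
  group_by(gender) %>%
  summarise(
    # TRUE counts as 1; na.rm = TRUE skips rows where 'smokes' is missing
    pct = 100 * sum(smokes == "never" & exercise == "freq", na.rm = TRUE) / n()
  )

pct_table
```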
  2. What is the age range of teenagers? You may use the range function (?range).
teenagers <- filter(survey, 10<=age & age<20) 
range(teenagers[["age"]])
[1] 16.750 19.917
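`range()` returns a length-2 vector `c(min, max)`, and `diff()` turns it into the width of the interval. If the column contained `NA`, `range()` would return `NA`s unless `na.rm = TRUE` is set. A base-R sketch on made-up ages:

```r
# Hypothetical ages, not the full survey column
ages <- c(18.2, 17.6, 16.9, 19.917, 16.75)

age_range <- range(ages)      # smallest and largest value
age_width <- diff(age_range)  # max minus min

age_range
age_width
range(c(ages, NA))                # NA NA
range(c(ages, NA), na.rm = TRUE)  # same as age_range
```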
  3. How many males smoke and never exercise? Use the ‘%in%’ operator (see ?match for more details).
nrow( filter(survey, (smokes %in% c("regul","occas", "heavy")) & exercise=="none" & gender=="male" ) )
[1] 4
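R documents `%in%` as `match(x, table, nomatch = 0) > 0`, so unlike `==` it never returns `NA`: a missing value is reported as not in the set. Inside `filter()` both forms drop the missing row, so the count is the same. A base-R sketch on made-up answers:

```r
smokes <- c("never", "regul", NA, "heavy", "occas")
smoker_levels <- c("regul", "occas", "heavy")

via_in    <- smokes %in% smoker_levels
via_equal <- smokes == "regul" | smokes == "occas" | smokes == "heavy"
via_match <- match(smokes, smoker_levels, nomatch = 0) > 0  # definition of %in%

via_in     # FALSE TRUE FALSE TRUE TRUE
via_equal  # FALSE TRUE NA TRUE TRUE
```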
  4. Recall the where helper function from the select extra exercises. Hint: see the anyNA function (?anyNA for the manual).

    4.1 Select all variables in the survey data that have missing values.

    select(survey, where(anyNA))
    # A tibble: 233 × 3
       pulse height m.i     
       <dbl>  <dbl> <chr>   
     1    92   173  metric  
     2   104   178. imperial
     3    87    NA  <NA>    
     4    NA   160  metric  
     5    35   165  metric  
     6    64   173. imperial
     7    83   183. imperial
     8    74   157  metric  
     9    72   175  metric  
    10    90   167  metric  
    # ℹ 223 more rows
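
    `where()` applies a predicate to each column and keeps those for which it returns `TRUE`; with `anyNA` that means columns containing at least one missing value. The same selection can be written in base R with `vapply()`. A sketch on a hypothetical tibble (made-up values):

    ```r
    library(dplyr)

    # Hypothetical stand-in for `survey`
    survey_demo <- tibble(
      name   = c("Alyson", "Todd", "Gerald"),
      pulse  = c(92, 104, NA),
      height = c(173, NA, NA),
      age    = c(18.2, 17.6, 16.9)
    )

    # Keep only the columns for which anyNA() returns TRUE
    with_na <- select(survey_demo, where(anyNA))

    # Base-R equivalent: one TRUE/FALSE per column, then column indexing
    has_na  <- vapply(survey_demo, anyNA, logical(1))
    base_na <- survey_demo[, has_na, drop = FALSE]

    names(with_na)
    ```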

    4.2 Count the number of missing values for each variable you found in 4.1.

    nrow(filter(survey, is.na(pulse)))     # pulse 
    [1] 45
    nrow(filter(survey, is.na(height)))    # height  
    [1] 27
    nrow(filter(survey, is.na(m.i)))       # m.i
    [1] 27
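
    The three counts can also be obtained in one call: `is.na()` on a data frame gives a logical matrix, and `colSums()` adds up the `TRUE`s per column. A base-R sketch on made-up data:

    ```r
    # Made-up columns; m.i is missing in the same rows as height
    demo <- data.frame(
      pulse  = c(92, NA, 87, 60),
      height = c(173, NA, NA, 183),
      m.i    = c("metric", NA, NA, "imperial"),
      stringsAsFactors = FALSE
    )

    # is.na() returns a logical matrix; colSums() counts TRUEs per column
    na_counts <- colSums(is.na(demo))
    na_counts
    ```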

    4.3 In (4.2) you found that height and m.i had the same number of missing values, n=27. How can you confirm or refute that missing values of height and m.i occur in the same observations?

    # If 'height' and 'm.i' are missing in exactly the same rows, the number of rows
    # where both are missing equals 27 (the count for each variable). TRUE confirms
    # that the missing values occur in the same observations; FALSE refutes it.
    nrow( filter(survey, is.na(height) & is.na(m.i)) ) == 27
    [1] TRUE
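
    A more direct check compares the missingness patterns themselves: if `is.na(height)` and `is.na(m.i)` are identical vectors, the values are missing in exactly the same rows, and this works without knowing the count in advance. A base-R sketch with made-up columns; `m.i_shifted` is a hypothetical counterexample with the same number of `NA`s in different rows:

    ```r
    demo <- data.frame(
      height      = c(173, NA, NA, 183),
      m.i         = c("metric", NA, NA, "imperial"),
      m.i_shifted = c(NA, "metric", NA, "imperial"),
      stringsAsFactors = FALSE
    )

    identical(is.na(demo$height), is.na(demo$m.i))          # TRUE
    identical(is.na(demo$height), is.na(demo$m.i_shifted))  # FALSE
    ```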


Copyright © 2024 Biomedical Data Sciences (BDS) | LUMC