Primary exercises

  1. Manually created factor.
    In a study, participants were asked whether their sport activity is none, oncePerWeek, severalPerWeek or daily.
    Build a proper factor for the responses below and store it in a variable w; the ? stands for a missing response.
    Print the factor.
    Write the code to count the number of occurrences of each level and print the counts.
severalPerWeek, none, none, oncePerWeek, oncePerWeek, oncePerWeek, oncePerWeek, ?, none, none
library( tidyverse )   # loads forcats, which provides fct_count()
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek    oncePerWeek   
 [7] oncePerWeek    <NA>           none           none          
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
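One reason to pass levels explicitly is that any value not listed there is silently turned into NA. The following is a minimal base-R sketch of that behaviour; the vector x and its misspelled value "oncePerweek" are made up for illustration and are not part of the study data.

```r
# Hypothetical responses; "oncePerweek" is a deliberate typo for illustration.
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
x <- c( "none", "oncePerweek", "daily" )
f <- factor( x, levels = lvs )

# The typo matches no level, so the second element becomes NA.
is.na( f )

# Unused levels are kept: all four levels are still present.
nlevels( f )

# Values of x that are not valid levels (handy for spotting typos before building the factor).
setdiff( x, lvs )
```

Checking setdiff() before calling factor() is one possible way to catch such typos early; other checks would work as well.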
  2. A factor with random content.
    Read the help for the function sample.
    Then study and try the following lines of code to understand the results.
    Next, understand why an error is generated and use the replace argument to generate a vector with 100 samples.
    Store this vector in a variable v and build a factor w from it.
    Finally, count the number of occurrences of each level in w.
    Ensure that the levels are in the order provided in the variable lvs.
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
sample( lvs, 3 )
[1] "severalPerWeek" "none"           "daily"         
sample( lvs, 3 )
[1] "oncePerWeek"    "none"           "severalPerWeek"
sample( lvs, 3 )
[1] "severalPerWeek" "none"           "oncePerWeek"   
sample( lvs, 100 )
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
v <- sample( lvs, 100, replace = TRUE )
w <- factor( v, levels = lvs )
w
  [1] severalPerWeek oncePerWeek    none           oncePerWeek    none           severalPerWeek
  [7] none           oncePerWeek    none           severalPerWeek severalPerWeek daily         
 [13] oncePerWeek    oncePerWeek    oncePerWeek    daily          daily          none          
 [19] daily          none           oncePerWeek    none           severalPerWeek none          
 [25] severalPerWeek daily          none           oncePerWeek    oncePerWeek    daily         
 [31] none           severalPerWeek severalPerWeek none           none           severalPerWeek
 [37] oncePerWeek    daily          daily          oncePerWeek    oncePerWeek    none          
 [43] severalPerWeek oncePerWeek    none           oncePerWeek    none           daily         
 [49] daily          severalPerWeek none           daily          none           oncePerWeek   
 [55] daily          none           daily          daily          oncePerWeek    daily         
 [61] daily          daily          daily          none           none           severalPerWeek
 [67] none           severalPerWeek daily          severalPerWeek daily          oncePerWeek   
 [73] oncePerWeek    daily          none           severalPerWeek daily          severalPerWeek
 [79] none           daily          oncePerWeek    daily          severalPerWeek daily         
 [85] severalPerWeek oncePerWeek    oncePerWeek    oncePerWeek    none           daily         
 [91] none           none           severalPerWeek oncePerWeek    none           severalPerWeek
 [97] none           daily          none           oncePerWeek   
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 none              29
2 oncePerWeek       24
3 severalPerWeek    20
4 daily             27
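Because sample draws at random, the counts above will differ from run to run. A minimal sketch of how a run could be made repeatable with set.seed follows; the seed value 1 and the names v1 and v2 are arbitrary choices for this illustration.

```r
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )

# Fixing the seed before each draw makes the random draw repeatable.
set.seed( 1 )   # the seed value is arbitrary
v1 <- sample( lvs, 100, replace = TRUE )
set.seed( 1 )
v2 <- sample( lvs, 100, replace = TRUE )
identical( v1, v2 )   # TRUE: same seed, same draw

# Without replacement at most length( lvs ) elements can be drawn;
# drawing exactly that many returns a random permutation of lvs.
p <- sample( lvs, length( lvs ) )
p
```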
  3. Reordering factor levels.
    When a factor is shown on an axis of a plot, the order is given by its levels.
    The factor w from the previous exercise will then be shown in this order: none, oncePerWeek, severalPerWeek, daily.
    But for a figure in a manuscript the following order might be needed: daily, severalPerWeek, oncePerWeek, none.
    Apply one of the fct_ functions from the tidyverse library to w to produce a factor w2 with the requested order.
    Show the levels of w2.
    Show the number of elements of each level in w2 again and compare it with the table from the previous exercise.
w2 <- fct_relevel( w, c( "daily", "severalPerWeek", "oncePerWeek", "none" ) )
levels( w2 )
[1] "daily"          "severalPerWeek" "oncePerWeek"    "none"          
fct_count( w2 )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 daily             27
2 severalPerWeek    20
3 oncePerWeek       24
4 none              29
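Reordering changes only the levels, not the values stored in the factor. The sketch below illustrates this on a small made-up factor (not the 100 random samples above) and also shows that, since the requested order happens to be the exact reverse, fct_rev would give the same levels here.

```r
library( forcats )

lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
# Toy data for illustration only.
w <- factor( c( "daily", "none", "none", "oncePerWeek" ), levels = lvs )

# Explicit reordering, and plain reversal of the existing level order.
w2 <- fct_relevel( w, "daily", "severalPerWeek", "oncePerWeek", "none" )
w2b <- fct_rev( w )

identical( levels( w2 ), levels( w2b ) )              # TRUE
identical( as.character( w ), as.character( w2 ) )    # TRUE: the values are unchanged

# Only the underlying integer codes differ, because they index into the levels.
as.integer( w )
as.integer( w2 )
```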

Extra exercises

  1. Counting with table(); getting counts for single levels.
    fct_count() is a tidyverse/forcats function for counting factor elements; it returns the result in the form of a table (a tibble object).
    The table() function from base R provides similar functionality but returns the result in another format.
    Reuse the factor w from the first primary exercise.
    Try table( w ) and compare its output with fct_count( w ).
    Store the counts as follows: cnts <- table( w ). Use square brackets on cnts to get the count of oncePerWeek.
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek    oncePerWeek   
 [7] oncePerWeek    <NA>           none           none          
Levels: none oncePerWeek severalPerWeek daily
table( w )
w
          none    oncePerWeek severalPerWeek          daily 
             4              4              1              0 
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
cnts <- table( w )
cnts[ "oncePerWeek" ]
oncePerWeek 
          4 
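Note that table( w ) above omits the missing value that fct_count( w ) reports. The following sketch shows some base-R options around this, using the same w; the name cnts_na is introduced here only for illustration.

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

# By default table() drops NA; useNA = "ifany" adds an <NA> entry when missing values exist.
cnts_na <- table( w, useNA = "ifany" )
cnts_na

# Double brackets return the bare number, without the level name attached.
cnts <- table( w )
cnts[[ "oncePerWeek" ]]

# Single brackets also accept several level names at once.
cnts[ c( "none", "daily" ) ]

# The number of missing responses can be counted directly as well.
sum( is.na( w ) )
```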
  2. Special ordering of levels.
    ➡️ Go to the forcats cheat sheet to find out how to order the levels of a factor by their frequency of occurrence.
    Reuse w from the previous exercise and construct a factor w3 with the same values and with the levels sorted by descending number of occurrences.
    Count the occurrences to demonstrate correctness.
    Now find a way to sort the levels in increasing order of occurrences.
w3 <- fct_infreq( w )
fct_count( w3 )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
fct_count( fct_rev( w3 ) )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 daily              0
2 severalPerWeek     1
3 oncePerWeek        4
4 none               4
5 <NA>               1
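For comparison, the same kind of ordering can be sketched in base R without forcats. This is only one possible approach, not the intended solution of the exercise; the names ord, w3_base and w4_base are introduced here for illustration.

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

# Count per level (NA is not counted), then find the order of decreasing counts.
cnts <- table( w )
ord <- order( cnts, decreasing = TRUE )

# Rebuild the factor with the levels in that order; the values stay the same.
w3_base <- factor( w, levels = names( cnts )[ ord ] )
table( w3_base )

# Reversing the levels gives the increasing order.
w4_base <- factor( w, levels = rev( levels( w3_base ) ) )
table( w4_base )
```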


Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC