Primary exercises

  1. Manually created factor.
    In a study, participants were asked whether their sport activity is none, oncePerWeek, severalPerWeek or daily.
    Build a proper factor for the responses below (? denotes a missing response) and store it in a variable w.
    Print the factor.
    Write the code to count the number of occurrences of each level and print the counts.
severalPerWeek, none, none, oncePerWeek, oncePerWeek, oncePerWeek, oncePerWeek, ?, none, none
library( tidyverse )   # provides the fct_ functions (forcats), e.g. fct_count()
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek    oncePerWeek    oncePerWeek    <NA>           none           none          
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
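A few base-R checks can help confirm what the factor above contains. This is a minimal sketch rather than part of the original solution; it assumes only the v and w definitions from the exercise, and the names n_levels, n_missing and codes are introduced here for illustration.

```r
# Rebuild the factor from the exercise (the "?" response is coded as NA).
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

# All four levels are kept, even though "daily" never occurs in the data.
n_levels <- nlevels( w )

# The missing response is stored as NA, not as a fifth level.
n_missing <- sum( is.na( w ) )

# Internally a factor is a vector of integer codes that index into levels( w ).
codes <- as.integer( w )

print( n_levels )
print( n_missing )
print( codes )
```

This also explains the fct_count() output above: daily appears with a count of 0 because it is a declared level, while the NA row counts the single missing response.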
  2. A factor with random content.
    Read the help page for the function sample.
    Then study and try the following lines of code to understand the results.
    Next, work out why sample( lvs, 100 ) generates an error and use the replace argument to generate a vector with 100 samples.
    Store this vector in a variable v and build a factor w from it.
    Finally, count the number of occurrences of each level in w.
    Ensure that the levels are in the order given in the variable lvs.
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
sample( lvs, 3 )
[1] "daily"       "none"        "oncePerWeek"
sample( lvs, 3 )
[1] "none"        "daily"       "oncePerWeek"
sample( lvs, 3 )
[1] "daily"          "severalPerWeek" "none"          
sample( lvs, 100 )
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
v <- sample( lvs, 100, replace = TRUE )
w <- factor( v, levels = lvs )
w
  [1] severalPerWeek none           daily          none           oncePerWeek    severalPerWeek none           oncePerWeek    oncePerWeek    daily         
 [11] severalPerWeek daily          oncePerWeek    none           oncePerWeek    daily          severalPerWeek daily          none           daily         
 [21] none           none           daily          none           none           daily          none           severalPerWeek oncePerWeek    oncePerWeek   
 [31] severalPerWeek none           none           none           severalPerWeek oncePerWeek    none           none           daily          daily         
 [41] none           severalPerWeek severalPerWeek daily          oncePerWeek    none           none           daily          severalPerWeek daily         
 [51] daily          none           none           oncePerWeek    none           severalPerWeek severalPerWeek severalPerWeek severalPerWeek none          
 [61] severalPerWeek severalPerWeek oncePerWeek    none           oncePerWeek    oncePerWeek    oncePerWeek    oncePerWeek    daily          severalPerWeek
 [71] severalPerWeek daily          severalPerWeek severalPerWeek oncePerWeek    oncePerWeek    severalPerWeek daily          daily          severalPerWeek
 [81] severalPerWeek daily          none           daily          daily          none           severalPerWeek daily          severalPerWeek daily         
 [91] daily          daily          daily          none           severalPerWeek severalPerWeek daily          daily          none           none          
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 none              28
2 oncePerWeek       17
3 severalPerWeek    27
4 daily             28
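Because sample() draws at random, the counts shown above will differ on every run. The following minimal sketch shows one way to make the draw reproducible with set.seed(). The seed value 123 is an arbitrary choice for illustration, the names v1 and v2 are introduced here, and table() is used as a base-R stand-in for fct_count().

```r
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )

# Fix the state of the random number generator (123 is an arbitrary seed).
set.seed( 123 )
v1 <- sample( lvs, 100, replace = TRUE )

# Resetting the same seed reproduces exactly the same draw.
set.seed( 123 )
v2 <- sample( lvs, 100, replace = TRUE )

# Passing levels = lvs keeps the level order independent of the random draw.
w <- factor( v1, levels = lvs )
cnts <- table( w )

print( identical( v1, v2 ) )
print( cnts )
```

Without the levels argument, factor() would sort the levels alphabetically, so the explicit lvs is what guarantees the requested order.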
  3. Reordering factor levels.
    When a factor is shown on an axis of a plot, the order is given by its levels.
    The factor w from the previous exercise will therefore be shown in this order: none, oncePerWeek, severalPerWeek, daily.
    But for a figure in a manuscript the following order might be needed: daily, severalPerWeek, oncePerWeek, none.
    Apply one of the fct_ functions from the tidyverse library to w to produce a factor w2 with the requested order.
    Show the levels of w2.
    Again show the number of elements in each level of w2 and compare it with the table from the previous exercise.
w2 <- fct_relevel( w, c( "daily", "severalPerWeek", "oncePerWeek", "none" ) )
levels( w2 )
[1] "daily"          "severalPerWeek" "oncePerWeek"    "none"          
fct_count( w2 )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 daily             28
2 severalPerWeek    27
3 oncePerWeek       17
4 none              28
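The requested order happens to be the exact reverse of the original one, so reversing the levels is enough. The sketch below does this in base R as an alternative to the fct_ functions; the short factor w used here is a hypothetical stand-in for the 100-element factor from the previous exercise.

```r
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )

# A small hypothetical factor standing in for w from the previous exercise.
w <- factor( c( "none", "daily", "daily", "oncePerWeek", "none", "daily" ),
             levels = lvs )

# Re-create the factor with the levels reversed; the values are not changed.
w2 <- factor( w, levels = rev( levels( w ) ) )

print( levels( w2 ) )
print( table( w ) )
print( table( w2 ) )
```

Only the order of the levels differs between w and w2, which is why the counts per level are the same and merely appear in reversed order.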

Extra exercises

  1. Counting with table(); getting counts for single levels.
    fct_count() is a tidyverse/forcats function that counts factor elements and returns the result as a table (a tibble object).
    The table() function from base R provides similar functionality but returns the result in another format.
    Reuse the factor w from the first primary exercise.
    Try table( w ) and compare its output with fct_count( w ).
    Store the counts as follows: cnts <- table( w ). Use square brackets on cnts to get the count of oncePerWeek.
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek    oncePerWeek    oncePerWeek    <NA>           none           none          
Levels: none oncePerWeek severalPerWeek daily
table( w )
w
          none    oncePerWeek severalPerWeek          daily 
             4              4              1              0 
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
cnts <- table( w )
cnts[ "oncePerWeek" ]
oncePerWeek 
          4 
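One visible difference in the outputs above is that table( w ) omits the missing value while fct_count( w ) reports it. The sketch below, offered as one possible way to explore this, uses the useNA argument of table() and shows double brackets for extracting a bare number; the names cnts_na, once and some are introduced here for illustration.

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

# By default table() drops missing values; useNA = "ifany" adds an NA entry.
cnts_na <- table( w, useNA = "ifany" )

cnts <- table( w )

# Single brackets keep the level name; double brackets return the bare number.
once <- cnts[[ "oncePerWeek" ]]

# Several counts can be selected at once by name.
some <- cnts[ c( "none", "daily" ) ]

print( cnts_na )
print( once )
print( some )
```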
  2. Special ordering of levels.
    ➡️ Go to the forcats cheat sheet to find out how to order the factor levels by frequency of occurrence.
    Reuse w from the previous exercise and construct a factor w3 with the same values and with the levels sorted by descending number of occurrences.
    Count the occurrences to demonstrate correctness.
    Now, find a way to sort the levels in increasing order.
w3 <- fct_infreq( w )
fct_count( w3 )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
fct_count( fct_rev( w3 ) )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 daily              0
2 severalPerWeek     1
3 oncePerWeek        4
4 none               4
5 <NA>               1
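For this particular w the original level order is already descending by count (4, 4, 1, 0), so the first table above looks unchanged. As a cross-check, the same ordering can be sketched in base R by sorting the level names by their counts. This is an illustrative alternative, not the forcats implementation; lv_desc and w4 are names introduced here.

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

# Count each level, then order the level names by descending count.
cnts <- table( w )
lv_desc <- names( cnts )[ order( as.integer( cnts ), decreasing = TRUE ) ]

# Same values as w, levels sorted by descending frequency.
w3 <- factor( w, levels = lv_desc )

# Reversing the level vector gives the increasing order.
w4 <- factor( w, levels = rev( lv_desc ) )

print( table( w3 ) )
print( table( w4 ) )
```

How ties such as none and oncePerWeek are ordered depends on the sorting method, so the position of tied levels should not be relied upon.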


Copyright © 2024 Biomedical Data Sciences (BDS) | LUMC