Primary exercises

  1. Manually created factor.
    In a study, participants were asked whether their sports activity was none, oncePerWeek, severalPerWeek, or daily.
    Build a proper factor for the responses below (? marks a missing answer) and store it in the variable w.
    Print the factor.
    Write the code to count the number of occurrences of each level and print the counts.
severalPerWeek, none, none, oncePerWeek, oncePerWeek, oncePerWeek, oncePerWeek, ?, none, none
library( tidyverse )  # attaches forcats, which provides fct_count() and the other fct_ functions
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek    oncePerWeek    oncePerWeek    <NA>           none           none          
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
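To see why the levels argument matters, it can help to compare with a factor built without it. In this minimal base-R sketch using the responses from above, factor() takes its levels only from the distinct values present in v, sorted alphabetically. As a result, the unobserved answer daily is missing entirely. Any answer not listed in levels (here a hypothetical misspelling "Daily") silently becomes NA.

```r
# Responses as in the exercise; the "?" answer is coded as NA
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )

# Without levels: only observed values become levels, sorted alphabetically
w_default <- factor( v )
levels( w_default )   # "none" "oncePerWeek" "severalPerWeek" (no "daily")

# With levels: all four answers are levels, in the intended order
w <- factor( v, levels = lvs )
levels( w )

# A value not listed in levels (hypothetical typo "Daily") turns into NA
factor( c( "daily", "Daily" ), levels = lvs )
```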
  2. A factor with random content.
    Read the help page for the function sample (?sample).
    Then study and run the following lines of code to understand the results.
    Next, understand why an error is generated, and use the replace argument to generate a vector of 100 samples.
    Store this vector in a variable v and build a factor w from it.
    Finally, count the number of occurrences of each level in w.
    Ensure that the levels are in the order given by the variable lvs.
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
sample( lvs, 3 )
[1] "severalPerWeek" "daily"          "none"          
sample( lvs, 3 )
[1] "oncePerWeek" "none"        "daily"      
sample( lvs, 3 )
[1] "severalPerWeek" "oncePerWeek"    "daily"         
sample( lvs, 100 )
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
v <- sample( lvs, 100, replace = TRUE )
w <- factor( v, levels = lvs )
w
  [1] severalPerWeek oncePerWeek    oncePerWeek    none           daily          oncePerWeek    severalPerWeek oncePerWeek    severalPerWeek oncePerWeek   
 [11] none           severalPerWeek severalPerWeek oncePerWeek    daily          severalPerWeek oncePerWeek    none           none           none          
 [21] none           severalPerWeek oncePerWeek    daily          severalPerWeek daily          severalPerWeek daily          severalPerWeek oncePerWeek   
 [31] severalPerWeek daily          none           oncePerWeek    severalPerWeek severalPerWeek daily          none           severalPerWeek none          
 [41] oncePerWeek    oncePerWeek    oncePerWeek    none           none           severalPerWeek daily          oncePerWeek    oncePerWeek    severalPerWeek
 [51] oncePerWeek    oncePerWeek    severalPerWeek oncePerWeek    severalPerWeek none           none           severalPerWeek none           oncePerWeek   
 [61] oncePerWeek    none           none           severalPerWeek daily          none           severalPerWeek daily          severalPerWeek none          
 [71] severalPerWeek none           daily          daily          oncePerWeek    oncePerWeek    severalPerWeek severalPerWeek severalPerWeek oncePerWeek   
 [81] none           daily          severalPerWeek none           none           severalPerWeek severalPerWeek none           oncePerWeek    severalPerWeek
 [91] oncePerWeek    none           daily          none           severalPerWeek none           none           daily          daily          daily         
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 none              27
2 oncePerWeek       25
3 severalPerWeek    31
4 daily             17
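Because sample() draws at random, the printed factor and the counts above will differ on every run. If reproducible output is wanted, one option is to call set.seed() from base R before sampling. This fixes the state of the random number generator, so the same seed always yields the same draws. The seed value 42 below is arbitrary.

```r
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )

# Fix the random number generator state (42 is an arbitrary seed)
set.seed( 42 )
v1 <- sample( lvs, 100, replace = TRUE )

# Resetting the same seed reproduces exactly the same draws
set.seed( 42 )
v2 <- sample( lvs, 100, replace = TRUE )
identical( v1, v2 )   # TRUE

# Build the factor with the levels in the order given by lvs and count them
w <- factor( v1, levels = lvs )
table( w )
```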
  3. Reordering factor levels.
    When a factor is shown on an axis of a plot, the order is given by its levels.
    The factor w from the previous exercise will therefore be shown in this order: none, oncePerWeek, severalPerWeek, daily.
    For a figure in a manuscript, however, the following order might be needed: daily, severalPerWeek, oncePerWeek, none.
    Apply one of the fct_ functions from the tidyverse (forcats package) to w to produce a factor w2 with the requested order.
    Show the levels of w2.
    Again, show the number of elements of each level in w2 and compare it with the table from the previous exercise.
w2 <- fct_relevel( w, c( "daily", "severalPerWeek", "oncePerWeek", "none" ) )
levels( w2 )
[1] "daily"          "severalPerWeek" "oncePerWeek"    "none"          
fct_count( w2 )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 daily             17
2 severalPerWeek    31
3 oncePerWeek       25
4 none              27
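The requested order is exactly the reverse of the original one. The same result could therefore also be obtained with forcats::fct_rev( w ), or without forcats by passing the reversed levels to factor(). The base-R sketch below checks that reordering changes only the order of the levels, not the values or their counts. It regenerates a random w with an arbitrary seed, so its counts will differ from those shown above.

```r
set.seed( 1 )   # arbitrary seed, only to make the sketch reproducible
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
w <- factor( sample( lvs, 100, replace = TRUE ), levels = lvs )

# Base-R equivalent of reversing the level order
w2 <- factor( w, levels = rev( levels( w ) ) )
levels( w2 )   # "daily" "severalPerWeek" "oncePerWeek" "none"

# The values themselves are unchanged ...
identical( as.character( w2 ), as.character( w ) )   # TRUE

# ... and so are the counts per level, only listed in a different order
all( table( w2 )[ lvs ] == table( w )[ lvs ] )   # TRUE
```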

Extra exercises

  1. Counting with table(); getting counts for single levels.
    fct_count() is a tidyverse (forcats) function for counting factor elements; it returns the result as a table (a tibble object).
    The table() function from base R provides similar functionality but returns the result in a different format.
    Reuse the factor w from the first primary exercise.
    Try table( w ) and compare its output with fct_count( w ).
    Store the counts as follows: cnts <- table( w ). Use square brackets on cnts to get the count of oncePerWeek.
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek    oncePerWeek    oncePerWeek    <NA>           none           none          
Levels: none oncePerWeek severalPerWeek daily
table( w )
w
          none    oncePerWeek severalPerWeek          daily 
             4              4              1              0 
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
cnts <- table( w )
cnts[ "oncePerWeek" ]
oncePerWeek 
          4 
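Note that table( w ) omits the missing answer, whereas fct_count( w ) reports it as <NA>. This minimal base-R sketch shows two related details. First, the useNA argument of table() adds a count for missing values. Second, double brackets extract a single count as a plain number without its name.

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

# Include a count for NA, similar to the last row of fct_count( w )
cnts_na <- table( w, useNA = "ifany" )
cnts_na

cnts <- table( w )
cnts[ "oncePerWeek" ]     # named count, as in the exercise
cnts[[ "oncePerWeek" ]]   # plain number without name: 4
cnts[ 2 ]                 # the same level selected by position
```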
  2. Special ordering of levels.
    ➡️ Go to the forcats cheat sheet to find out how to order a factor by the frequency of occurrences.
    Reuse w from the previous exercise and construct a factor w3 with the same values and with the levels sorted by descending number of occurrences.
    Count the occurrences to demonstrate correctness.
    Now, find a way to sort the levels in increasing order.
w3 <- fct_infreq( w )
fct_count( w3 )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
fct_count( fct_rev( w3 ) )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 daily              0
2 severalPerWeek     1
3 oncePerWeek        4
4 none               4
5 <NA>               1
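For comparison, here is a base-R sketch of frequency ordering that does not rely on forcats. It sorts the counts from table() and uses their names as the new levels. Here w is already in descending frequency order. The sketch therefore checks the property itself (counts never increase, or never decrease) rather than one specific level order. How ties such as none and oncePerWeek (4 each) are arranged may differ from fct_infreq().

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

# Levels sorted by descending frequency (NA is not a level and stays NA)
cnts <- table( w )
w3 <- factor( w, levels = names( sort( cnts, decreasing = TRUE ) ) )
table( w3 )

# Levels sorted by increasing frequency
w4 <- factor( w, levels = names( sort( cnts ) ) )
table( w4 )
```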


Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC