Primary exercises
- Create and investigate a list.
 Three students received different sets of grades (Amy: 1,6,7,9,10; Bob:
6,7,4,3,5,2,2,1,4; Dan: 9,9,10).
 In a variable scores create a list (the names
of the list elements should be the names of the students and the values
should be the corresponding grades).
 Print the list, then its class, length and
structure (str).
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
scores
$Amy
[1]  1  6  7  9 10
$Bob
[1] 6 7 4 3 5 2 2 1 4
$Dan
[1]  9  9 10
class( scores )
[1] "list"
length( scores )
[1] 3
str( scores )
List of 3
 $ Amy: num [1:5] 1 6 7 9 10
 $ Bob: num [1:9] 6 7 4 3 5 2 2 1 4
 $ Dan: num [1:3] 9 9 10
- Add an element, change an element.
 Reuse scores from the previous exercise.
 Add grades for Eve (7,3,5,8,8,9) to it and print the list.
 Then, for Dan, merge new grades (8,8,6,7) with the existing grades (hint:
use the combine function c to combine Dan’s existing grades
with the new grades, then put the result back into scores; do
not type 9,9,10 again).
scores[[ 'Eve' ]] <- c(7,3,5,8,8,9)
scores
$Amy
[1]  1  6  7  9 10
$Bob
[1] 6 7 4 3 5 2 2 1 4
$Dan
[1]  9  9 10
$Eve
[1] 7 3 5 8 8 9
scores[[ "Dan" ]] <- c( scores[[ "Dan" ]], c(8,8,6,7) )
scores
$Amy
[1]  1  6  7  9 10
$Bob
[1] 6 7 4 3 5 2 2 1 4
$Dan
[1]  9  9 10  8  8  6  7
$Eve
[1] 7 3 5 8 8 9
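The same two operations can also be sketched with the dollar operator for adding an element. The list s below is a made-up toy list (not the scores list), used only so that the snippet runs on its own.

```r
# Toy list, invented for this illustration only
s <- list( Amy = c( 1, 6 ), Dan = c( 9, 9, 10 ) )

s$Eve <- c( 7, 3 )                       # add a new element with the dollar operator
s[[ "Dan" ]] <- c( s[[ "Dan" ]], 8, 8 )  # append to Dan without retyping 9,9,10

names( s )              # "Amy" "Dan" "Eve"
length( s[[ "Dan" ]] )  # 5
```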
- Single and double bracket operators.
 Reuse scores from the previous exercises.
 Investigate the difference between scores[[ "Bob" ]] and scores[ "Bob" ].
 Look at what is printed and what the class of each result is.
 Then compare scores[[ c( "Amy", "Bob" ) ]] with scores[ c( "Amy", "Bob" ) ].
 Understand why the error is reported.
scores[[ "Bob" ]]             # Returns the value of Bob element (vector)
[1] 6 7 4 3 5 2 2 1 4
scores[ "Bob" ]               # Creates a new list with only Bob there (list)
$Bob
[1] 6 7 4 3 5 2 2 1 4
class( scores[[ "Bob" ]] )
[1] "numeric"
class( scores[ "Bob" ] )
[1] "list"
scores[[ c( "Amy", "Bob" ) ]] # [[ ]] returns one element; a list ([ ]) is needed for two
Error in scores[[c("Amy", "Bob")]]: subscript out of bounds
scores[ c( "Amy", "Bob" ) ]   # This creates a list, so many elements are ok
$Amy
[1]  1  6  7  9 10
$Bob
[1] 6 7 4 3 5 2 2 1 4
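One way to see where the error comes from: a vector of length two inside double brackets is applied recursively, one level at a time. The sketch below uses a shortened copy of scores; the tryCatch wrapper is only there so that the snippet keeps running after the error.

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 )
)

# Two indices inside [[ ]] mean: take element 1, then element 2 of that
a <- scores[[ c( 1, 2 ) ]]
b <- scores[[ 1 ]][[ 2 ]]      # second grade of Amy
identical( a, b )              # TRUE

# So scores[[ c( "Amy", "Bob" ) ]] looks for "Bob" inside Amy's grades and fails
res <- tryCatch( scores[[ c( "Amy", "Bob" ) ]], error = function( e ) "error" )
res                            # "error"
```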
- Dollar operator.
 Reuse scores from the previous exercises.
 Investigate the (lack of) difference between scores$Bob and scores[[ "Bob" ]].
 Look at what is printed and what the class of each result is.
 Then compare scores$Bo with scores[[ "Bo" ]].
 Understand why NULL is returned.
scores$Bob        # another way to access Bob
[1] 6 7 4 3 5 2 2 1 4
scores[[ "Bob" ]] # get an element with exact name Bob
[1] 6 7 4 3 5 2 2 1 4
class( scores$Bob )
[1] "numeric"
class( scores[[ "Bob" ]] )
[1] "numeric"
scores$Bo         # partial matching of names, it still finds Bob
[1] 6 7 4 3 5 2 2 1 4
scores[[ "Bo" ]]  # there is no "Bo" so NULL is returned
NULL
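The difference comes from partial name matching: the dollar operator accepts a unique prefix of a name, while double brackets match exactly by default. As a small sketch (on a shortened copy of scores), the exact argument of the double bracket operator switches between the two behaviours.

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 )
)

scores$Bo                        # partial matching: "Bo" is a unique prefix of "Bob"
scores[[ "Bo" ]]                 # exact matching (the default): NULL
scores[[ "Bo", exact = FALSE ]]  # partial matching switched on: Bob's grades
```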
 
Multitopic exercises
- (ADV) Mean grades for each student. (Call a function for each
element. Collect the results of the calls into a list.)
 Consider the scores list from the first exercise (also
copied below).
 Calculate the mean grade for each student.
 
 Use lapply to apply the mean function to each
element of scores.
 Also, replace lapply with sapply and compare
the results.
 Try to explain what lapply/sapply do.
 Note: the names of the list elements in scores are
preserved in the result.
 
 
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
lapply( scores, mean ) # the result is a list
$Amy
[1] 6.6
$Bob
[1] 3.777778
$Dan
[1] 9.333333
sapply( scores, mean ) # the result is converted to a vector
     Amy      Bob      Dan 
6.600000 3.777778 9.333333 
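To help explain what lapply does, here is a rough sketch of the same computation written as an explicit loop. The variable res is introduced only for this illustration. vapply is shown as a stricter relative of sapply, where the expected type of each result is stated up front.

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)

# Roughly what lapply( scores, mean ) does: call mean once per element
res <- list()
for( nm in names( scores ) ) {
  res[[ nm ]] <- mean( scores[[ nm ]] )   # store the result under the same name
}
all.equal( res, lapply( scores, mean ) )  # TRUE

# vapply works like sapply, but checks that each result is a single number
vapply( scores, mean, numeric( 1 ) )
```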
- (ADV) Simulate grades. (Define your own function and call it for
each element.)
 Consider the scores list from the previous exercise.
 Let’s assume that the grades are not known yet and we need to simulate
them.
 
 A vector nms with several (e.g. 12, see below) unique names
of students is provided.
 Each student should have a random number of grades (between 5 and
14).
 The grades should be sampled from the range 1:10.
 Grades 1-4, 9, 10 are usually rare compared to 6-8, so the probabilities of grades should not be uniform
(e.g. the ratios should be 1:1:1:1:2:10:20:20:2:1 for grades
1…10).
 For each student, the grades should be sorted in ascending order.
 The final list should have the same structure as scores (i.e. the names of the list elements should be the names of the students
and the values should be the corresponding grades).
 
 Hints:
- Use sample to generate a random number: how many
grades a student should have.
- Use sample with the prob and replace arguments to draw grades with non-uniform
probabilities.
- Put the above into a function genGrades that generates
grades for a single student.
- Use lapply to apply the function to each element of nms. Note that the function does not use the nm argument (but it still needs to be present, because lapply passes each element as the first argument).
 
- Use setNames to assign names to the list elements (or,
better, name the elements of nms before lapply).
 
nms <- c( "Amy", "Bob", "Carl", "Dany", "Ewa", "Frank", "Greg", "Holy", "Ian", "Jan", "Kees", "Leon" )
genGrades <- function(nm) { # nm is a single name, not used in the function
  gradesNum <- sample( 5:14, 1 )
  grades <- sort( sample( 1:10, size = gradesNum, prob = c(1,1,1,1,2,10,20,20,2,1), replace = TRUE ) )
  return( grades )
}
lapply( setNames( nm = nms ), genGrades ) # calls genGrades for each element of nms
$Amy
 [1] 6 6 7 7 7 7 7 8 8 8 8 8 9
$Bob
 [1] 1 6 6 7 7 7 8 8 8 8
$Carl
 [1] 4 5 6 6 7 7 7 7 8 8 8
$Dany
 [1] 1 2 5 5 6 6 6 7 7 7 8 8 8 9
$Ewa
 [1] 6 6 6 7 7 8 8 8 8 8 8 8 8
$Frank
[1] 6 8 8 8 8 8 8 9
$Greg
 [1] 1 3 5 6 6 7 7 7 7 8 8 8 8 8
$Holy
 [1] 6 6 7 7 7 7 7 8 8 8 9
$Ian
 [1] 3 7 7 7 7 7 7 7 7 8 8 8 8
$Jan
[1]  7  7  7  8  8  8  8 10
$Kees
 [1] 6 6 7 7 7 7 8 8 8 8 9
$Leon
 [1]  1  5  6  6  6  7  7  7  8  8  8  9 10
# Note: if elements of nms have names, the result has the same names
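Because the grades are random, every run gives a different list. A minimal sketch of how the simulation could be made repeatable and checked against the requirements, assuming a shortened nms and an arbitrary seed value (both chosen only for this illustration):

```r
nms <- c( "Amy", "Bob", "Carl" )   # shortened list of names, for illustration
genGrades <- function( nm ) {      # nm is not used, but lapply passes it
  gradesNum <- sample( 5:14, 1 )
  sort( sample( 1:10, size = gradesNum, prob = c(1,1,1,1,2,10,20,20,2,1), replace = TRUE ) )
}

set.seed( 1 )                      # arbitrary seed; the same seed gives the same grades
sim1 <- lapply( setNames( nm = nms ), genGrades )
set.seed( 1 )
sim2 <- lapply( setNames( nm = nms ), genGrades )
identical( sim1, sim2 )            # TRUE

names( sim1 )                          # "Amy" "Bob" "Carl"
range( lengths( sim1 ) )               # should lie within 5..14
all( !sapply( sim1, is.unsorted ) )    # TRUE: every vector is sorted
```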
- (ADV) Plot scores given in a list. (Convert a list to a long tibble.
Plot it.)
 Plotting functions usually require a table with data in a long
format.
 Convert the scores list from the first exercise to a long
table, with two columns name and score (each
grade should be a separate row).
 Use ggplot to plot the grades from the long table.
 
 Hints:
- Write a function which converts a single element of
scores to a tibble with two columns name and score.
- Use lapply to apply the function to each element of scores (you will get a list of tibbles).
- Use bind_rows to combine the results into a single
table (you will get a single, merged tibble).
- Use ggplot to plot the table. The example below uses geom_dotplot to plot the grades. You may use geom_point instead.
 
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
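library( tidyverse )  # provides %>%, tibble, bind_rows and ggplot
d <- names( scores ) %>% 
  lapply( function( nm ) tibble( name=nm, score=scores[[nm]] ) ) %>%  # one tibble per student
  bind_rows()                                                         # merge into a single long table
p <- ggplot( d ) +
  aes( x=name, y=score ) +
  geom_dotplot( binaxis="y", stackdir="center", binwidth=0.5 ) +
  theme_bw() +
  scale_y_continuous( limits=c(1,10), breaks=1:10 )
p                     # print the plot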
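As a cross-check of the long table, base R offers stack, which performs a similar list-to-long conversion without the tidyverse. This is a sketch of an alternative, not the approach from the hints. The name d2 is introduced only here.

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)

d2 <- stack( scores )                 # data frame with columns "values" and "ind"
names( d2 ) <- c( "score", "name" )   # rename to match the exercise
d2$name <- as.character( d2$name )    # stack() stores the names as a factor

nrow( d2 )                            # 17: one row per grade
table( d2$name )                      # number of grades per student
```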

- (ADV) Split a table into a list of tables by a column factor;
merge back.
 Some functions might require an input to be provided as a list of
tables.
 Let’s assume that the pulse table should be split into a
list of table parts based on the exercise column.
 Load the pulse.csv data into the variable pulse.
 Try l <- pulse %>% split( .$exercise ) and
investigate the class, length and names of the result l.
 Use double square brackets to extract the part where exercise is low.
 Finally, check that with bind_rows applied to l you can recreate the pulse table (but with a
different order of rows).
pulse <- read_csv( "pulse.csv" )    # load the data, as asked in the exercise
l <- pulse %>% split( .$exercise )  # . represents the object on the left side of %>%
class( l )
[1] "list"
length( l )
[1] 3
names( l )
[1] "high"     "low"      "moderate"
l[[ "low" ]]
# A tibble: 37 × 13
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>
 1 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993
 2 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993
 3 1993_L Frederick    178     58    19 male   no     no      low      sat       74     76  1993
 4 1993_P Mathew       185    110    22 male   no     yes     low      sat       77     73  1993
 5 1993_Q Leslie       170     56    19 male   no     no      low      sat       64     63  1993
 6 1993_U Jerome       175     60    19 male   no     no      low      sat       88     86  1993
 7 1993_V Arlene       140     50    34 female no     no      low      ran       70     98  1993
 8 1993_W Glenna       163     55    20 female no     no      low      sat       78     74  1993
 9 1995_B Olga         172     60    21 female no     no      low      sat       81     79  1995
10 1995_H Eliza        164     66    23 female no     no      low      ran       74    168  1995
# ℹ 27 more rows
recreatedPulse <- bind_rows( l )
dim( pulse )
[1] 110  13
dim( recreatedPulse )
[1] 110  13
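Equal dimensions do not yet show that the rows are the same. The sketch below illustrates a stricter check on a small made-up table (toy, invented for this example). It uses base R only, with do.call( rbind, ... ) standing in for bind_rows.

```r
# Made-up table standing in for pulse
toy <- data.frame(
  id       = c( "a", "b", "c", "d", "e" ),
  exercise = c( "low", "high", "low", "moderate", "high" ),
  pulse1   = c( 70, 82, 64, 90, 77 )
)

parts <- split( toy, toy$exercise )   # list of tables, one per exercise level
names( parts )                        # "high" "low" "moderate"

merged <- do.call( rbind, parts )     # base R stand-in for bind_rows
merged$id                             # "b" "e" "a" "c" "d": rows grouped by exercise

# After sorting by id, the merged table equals the original
sorted <- merged[ order( merged$id ), ]
rownames( sorted ) <- NULL
all.equal( sorted, toy )              # TRUE
```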
- (ADV) Split a table by a column and write each part to a
different file.
 Continue with the setup of the previous exercise.
 Study/type/execute the following example.
 Find the newly created files in your filesystem.
l <- pulse %>% split( .$exercise )
exercises <- names( l )                   # name in l of each table chunk
for( exercise in exercises ) {            # exercise will be a name of a single chunk
  fileName <- paste0( "pulse_", exercise, ".csv" )  # name of the file for the chunk
  message( "Writing file '", fileName, "'..." )
  write_csv( l[[ exercise ]], file = fileName )
}
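To confirm from within R that the files were written, they can be listed and read back. The sketch below repeats the loop on a small made-up table (toy), writes into a temporary directory, and uses base R write.csv/read.csv instead of write_csv. All names in it are chosen for illustration.

```r
# Made-up table standing in for pulse
toy <- data.frame(
  id       = c( "a", "b", "c", "d", "e" ),
  exercise = c( "low", "high", "low", "moderate", "high" ),
  pulse1   = c( 70, 82, 64, 90, 77 )
)
parts <- split( toy, toy$exercise )

outDir <- tempdir()                   # temporary directory, so nothing is left behind
for( exercise in names( parts ) ) {
  fileName <- file.path( outDir, paste0( "toy_", exercise, ".csv" ) )
  write.csv( parts[[ exercise ]], file = fileName, row.names = FALSE )
}

written <- list.files( outDir, pattern = "^toy_.*\\.csv$" )
written                               # the three files created above

low <- read.csv( file.path( outDir, "toy_low.csv" ) )
nrow( low )                           # 2: the two rows with exercise "low"
```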
 
Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC