Lists (solutions)

Primary exercises

Create and investigate a list.
Three students received different sets of grades (Amy: 1,6,7,9,10; Bob: 6,7,4,3,5,2,2,1,4; Dan: 9,9,10).
In a variable scores, create a list whose element names are the students' names and whose values are the corresponding grades.
Print the list, then its class, length and structure (str).

scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
scores

$Amy
[1]  1  6  7  9 10

$Bob
[1] 6 7 4 3 5 2 2 1 4

$Dan
[1]  9  9 10

class( scores )

[1] "list"

length( scores )

[1] 3

str( scores )

List of 3
 $ Amy: num [1:5] 1 6 7 9 10
 $ Bob: num [1:9] 6 7 4 3 5 2 2 1 4
 $ Dan: num [1:3] 9 9 10
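
As a small follow-up sketch (not part of the original exercise), the element names and the number of grades per student can be read directly from the list. It assumes only base R; `lengths` returns the length of each list element.

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
names( scores )    # names of the list elements: "Amy" "Bob" "Dan"
lengths( scores )  # number of grades per student: 5, 9 and 3
```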

Add an element, change an element.
Reuse scores from the previous exercise.
Add the grades for Eve (7,3,5,8,8,9) and print the list.
Then, for Dan, merge the new grades (8,8,6,7) with the existing ones (hint: use the combine function c to combine Dan’s existing grades with the new grades, then put the result back into scores; do not type 9,9,10 again).

scores[[ 'Eve' ]] <- c(7,3,5,8,8,9)
scores

$Amy
[1]  1  6  7  9 10

$Bob
[1] 6 7 4 3 5 2 2 1 4

$Dan
[1]  9  9 10

$Eve
[1] 7 3 5 8 8 9

scores[[ "Dan" ]] <- c( scores[[ "Dan" ]], c(8,8,6,7) )
scores

$Amy
[1]  1  6  7  9 10

$Bob
[1] 6 7 4 3 5 2 2 1 4

$Dan
[1]  9  9 10  8  8  6  7

$Eve
[1] 7 3 5 8 8 9
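
A possible variation, shown here only as a sketch on a shortened copy of the list: the same merge can be written with the `$` operator, and assigning NULL removes an element. The removal step goes slightly beyond what the exercise asks for.

```r
scores <- list( Amy = c( 1,6,7,9,10 ), Dan = c( 9,9,10 ) )  # shortened toy copy
scores$Dan <- c( scores$Dan, 8, 8, 6, 7 )  # same merge as above, written with $
scores$Dan                                 # 9 9 10 8 8 6 7
scores[[ "Amy" ]] <- NULL                  # assigning NULL removes the element
names( scores )                            # only "Dan" is left
```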

Single and double bracket operators.
Reuse scores from the previous exercises.
Investigate the difference between scores[[ "Bob" ]] and scores[ "Bob" ].
Look at what is printed and at the class of each result.
Then compare scores[[ c( "Amy", "Bob" ) ]] with scores[ c( "Amy", "Bob" ) ].
Understand why the error is reported.

scores[[ "Bob" ]]             # Returns the value of Bob element (vector)

[1] 6 7 4 3 5 2 2 1 4

scores[ "Bob" ]               # Creates a new list with only Bob there (list)

$Bob
[1] 6 7 4 3 5 2 2 1 4

class( scores[[ "Bob" ]] )

[1] "numeric"

class( scores[ "Bob" ] )

[1] "list"

scores[[ c( "Amy", "Bob" ) ]] # [[ ]] returns one element; a vector index is applied recursively

Error in scores[[c("Amy", "Bob")]]: subscript out of bounds

scores[ c( "Amy", "Bob" ) ]   # This creates a list, so many elements are ok

$Amy
[1]  1  6  7  9 10

$Bob
[1] 6 7 4 3 5 2 2 1 4
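
The error can be understood with a small sketch. With a vector index, `[[ ]]` indexes recursively: the first name selects an element and the second name is looked up inside it. The nested list below is a hypothetical toy example, not part of the exercise data.

```r
scores <- list( Amy = c( 1,6,7,9,10 ), Bob = c( 6,7,4,3,5,2,2,1,4 ) )
nested <- list( Amy = list( math = c( 1,6,7 ) ) )   # hypothetical nested list

nested[[ c( "Amy", "math" ) ]]   # same as nested[[ "Amy" ]][[ "math" ]]

# In scores, the Amy element is a plain numeric vector without names,
# so looking up "Bob" inside it fails.
result <- try( scores[[ c( "Amy", "Bob" ) ]], silent = TRUE )
class( result )                  # "try-error"
```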

Dollar operator.
Reuse scores from the previous exercises.
Investigate the (lack of) difference between scores$Bob and scores[[ "Bob" ]].
Look at what is printed and at the class of each result.
Then compare scores$Bo with scores[[ "Bo" ]].
Understand why NULL is returned.

scores$Bob        # another way to access Bob

[1] 6 7 4 3 5 2 2 1 4

scores[[ "Bob" ]] # get an element with exact name Bob

[1] 6 7 4 3 5 2 2 1 4

class( scores$Bob )

[1] "numeric"

class( scores[[ "Bob" ]] )

[1] "numeric"

scores$Bo         # $ matches names partially, so it still finds Bob

[1] 6 7 4 3 5 2 2 1 4

scores[[ "Bo" ]]  # [[ ]] matches names exactly; there is no "Bo", so NULL is returned

NULL
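
The following sketch illustrates the partial matching in a bit more detail. It relies on the `exact` argument of `[[ ]]` from base R; the list `both` is a hypothetical example added to show an ambiguous prefix.

```r
scores <- list( Amy = c( 1,6,7,9,10 ), Bob = c( 6,7,4,3,5,2,2,1,4 ) )
scores$Bo                         # "Bo" is a unique prefix of "Bob": Bob's grades
scores[[ "Bo" ]]                  # exact matching is the default: NULL
scores[[ "Bo", exact = FALSE ]]   # partial matching requested explicitly: Bob's grades

both <- list( Bob = 1, Bea = 2 )  # hypothetical list with two names starting with "B"
both$B                            # ambiguous prefix: NULL
```

Because partial matching can silently pick up an unintended element, `[[ ]]` with a full name is usually the safer choice in scripts.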

Extra exercises

A list returned by a function; test for association/correlation.
For this exercise we need two random numerical vectors.
Let’s create x and y, each of 30 elements sampled from the normal distribution: x <- rnorm( 30 ) and y <- rnorm( 30 ).
Print these vectors. You may also produce a scatter plot: plot( x, y ).

The function cor.test tests for association between corresponding elements of two vectors.
Use h <- cor.test( x, y ) and print h to see a report of the association test.
Internally h is stored as a list. Print names of the elements stored in h.
Now read the help page for cor.test. The Value section describes the elements of h.
Get directly the values of the elements estimate and p.value.

x <- rnorm( 30 )
y <- rnorm( 30 )
x

 [1] -2.720957077 -0.372663268  0.684618249  1.392614043 -0.991253368  0.297300096 -1.101879397
 [8] -0.511640901  0.263100005 -0.567268060  1.100591256 -0.976986395 -0.368414885 -0.209355565
[15]  0.690154102  1.882702183 -0.206012529 -1.137255756  0.237111371  0.338988872  1.451408479
[22]  1.333525402  1.842815049  0.439526488  1.963202714  0.511570622  0.008752512 -0.999833631
[29]  1.518302112  0.015168720

y

 [1] -0.13560269  1.78822543  0.23079145  1.76195061 -0.25540816  0.49524575 -1.52069196  1.53432428
 [9] -0.41753800 -0.31442038  1.76213001 -0.60441306 -0.01839972  0.50567653 -0.14846904  0.30299474
[17] -0.83285526  2.20474121 -0.26157659 -0.61074251  0.35503815  0.34309638  0.49344440  1.60647689
[25] -1.37200446  2.03864595  0.04269505 -0.81986750  0.52317465 -1.22120615

plot( x, y )

h <- cor.test( x, y )
h


    Pearson's product-moment correlation

data:  x and y
t = 0.82588, df = 28, p-value = 0.4159
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.2181819  0.4873997
sample estimates:
      cor 
0.1542088

names( h )

[1] "statistic"   "parameter"   "p.value"     "estimate"    "null.value"  "alternative" "method"     
[8] "data.name"   "conf.int"

h[[ 'estimate' ]]

      cor 
0.1542088

h[[ 'p.value' ]]

[1] 0.4158567
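
A further sketch of working with the returned list, assuming only base R. The seed value is arbitrary and is set only to make the sketch reproducible, so the numbers will differ from the output shown above.

```r
set.seed( 1 )                 # arbitrary seed, only for reproducibility of this sketch
x <- rnorm( 30 )
y <- rnorm( 30 )
h <- cor.test( x, y )

class( h )                    # "htest": the class that controls how h is printed
is.list( h )                  # TRUE: underneath, h is a list
h$conf.int                    # the confidence interval, also reachable with $
unname( h$estimate )          # the correlation without its "cor" name
```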

A nested list.
Let’s extend the concept of scores to describe various topics (see the code below).
Check class and str of scores.
Calculate how many students are in the scores list.
Get Dan’s scores in physics.

scores <- list(
  Amy = list(
    math = c( 1,6,7,9,10 ),
    biology = c( 7,6,8 )
  ),
  Bob = list(
    math = c( 6,7,4,3,5,2,2,1,4 ),
    physics = c( 8,7 )
  ),
  Dan = list(
    math = c( 9,9,10 ),
    physics = c( 10, 10, 10 ),
    biology = c( 3, 5, 7 )
  )
)

class( scores )

[1] "list"

str( scores )

List of 3
 $ Amy:List of 2
  ..$ math   : num [1:5] 1 6 7 9 10
  ..$ biology: num [1:3] 7 6 8
 $ Bob:List of 2
  ..$ math   : num [1:9] 6 7 4 3 5 2 2 1 4
  ..$ physics: num [1:2] 8 7
 $ Dan:List of 3
  ..$ math   : num [1:3] 9 9 10
  ..$ physics: num [1:3] 10 10 10
  ..$ biology: num [1:3] 3 5 7

length( scores )      # number of students

[1] 3

length( scores$Bob )  # number of topics for which Bob has scores

[1] 2

scores[[ "Dan" ]][[ "physics" ]]

[1] 10 10 10

scores$Dan$physics

[1] 10 10 10

scores$Dan[[ "physics" ]]

[1] 10 10 10
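
As an optional extension, the nested list can be summarised per student. This is a sketch using base R `lengths` and `sapply`; the anonymous function is illustrative only.

```r
scores <- list(
  Amy = list( math = c( 1,6,7,9,10 ), biology = c( 7,6,8 ) ),
  Bob = list( math = c( 6,7,4,3,5,2,2,1,4 ), physics = c( 8,7 ) ),
  Dan = list( math = c( 9,9,10 ), physics = c( 10, 10, 10 ), biology = c( 3, 5, 7 ) )
)
topicCounts <- lengths( scores )   # number of topics per student: 2, 2 and 3
topicCounts
# Which students have scores in physics?
hasPhysics <- sapply( scores, function( s ) "physics" %in% names( s ) )
hasPhysics                         # FALSE for Amy, TRUE for Bob and Dan
```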

Multitopic exercises

(ADV) Split a table into list of tables by a column factor; merge back.
Some functions might require an input to be provided as a list of tables.
Let’s assume that the pulse table should be split into a list of table parts based on the exercise column.
Load the pulse.csv data into the variable pulse.
Try l <- pulse %>% split( .$exercise ) and investigate the class, length and names of the result l.
Use the double square bracket operator to extract the part where exercise is low.
Finally, check that with bind_rows applied to l you can recreate the pulse table (but with a different order of rows).

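library( tidyverse )                # provides %>%, read_csv, bind_rows and write_csv
pulse <- read_csv( "pulse.csv" )    # assumes pulse.csv is in the working directory
l <- pulse %>% split( .$exercise )  # . represents the object on the left side of %>%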
class( l )

[1] "list"

length( l )

[1] 3

names( l )

[1] "high"     "low"      "moderate"

l[[ "low" ]]

# A tibble: 37 × 13
   id     name      height weight   age gender smokes alcohol exercise ran   pulse1 pulse2  year
   <chr>  <chr>      <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>    <chr>  <dbl>  <dbl> <dbl>
 1 1993_E Lauri        173     64    18 female no     yes     low      sat       90     88  1993
 2 1993_F George       184     74    22 male   no     yes     low      ran       78    141  1993
 3 1993_L Frederick    178     58    19 male   no     no      low      sat       74     76  1993
 4 1993_P Mathew       185    110    22 male   no     yes     low      sat       77     73  1993
 5 1993_Q Leslie       170     56    19 male   no     no      low      sat       64     63  1993
 6 1993_U Jerome       175     60    19 male   no     no      low      sat       88     86  1993
 7 1993_V Arlene       140     50    34 female no     no      low      ran       70     98  1993
 8 1993_W Glenna       163     55    20 female no     no      low      sat       78     74  1993
 9 1995_B Olga         172     60    21 female no     no      low      sat       81     79  1995
10 1995_H Eliza        164     66    23 female no     no      low      ran       74    168  1995
# … with 27 more rows

recreatedPulse <- bind_rows( l )
dim( pulse )

[1] 110  13

dim( recreatedPulse )

[1] 110  13
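
The split and merge steps can also be tried without the pulse data. The sketch below uses a small hypothetical data frame and base R only, with `do.call( rbind, ... )` standing in for `bind_rows`; it also shows why the row order changes.

```r
toy <- data.frame(                 # hypothetical toy table, not the pulse data
  id = c( "a", "b", "c", "d" ),
  exercise = c( "low", "high", "low", "moderate" ),
  pulse1 = c( 90, 78, 74, 77 ),
  stringsAsFactors = FALSE
)
parts <- split( toy, toy$exercise )   # list of data frames, one per exercise level
names( parts )                        # "high" "low" "moderate"
merged <- do.call( rbind, parts )     # base R counterpart of bind_rows( parts )
merged$id                             # "b" "a" "c" "d": rows are grouped by exercise
setequal( merged$id, toy$id )         # TRUE: same rows, different order
```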

(ADV) Split a table by a column and write each part to a different file.
Continue with the setup of the previous exercise.
Study, type and execute the following example.
Find the newly created files in your filesystem.

l <- pulse %>% split( .$exercise )
exercises <- names( l )                   # name in l of each table chunk
for( exercise in exercises ) {            # exercise will be a name of a single chunk
  fileName <- paste0( "pulse_", exercise, ".csv" )  # name of the file for the chunk
  message( "Writing file '", fileName, "'..." )
  write_csv( l[[ exercise ]], file = fileName )
}
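
To experiment without touching the working directory, the same loop can be sketched with base R on a hypothetical toy table, writing into a temporary directory. The names `toy`, `outDir` and the `toy_` file prefix are assumptions made for this sketch.

```r
toy <- data.frame(                      # hypothetical toy table, not the pulse data
  id = c( "a", "b", "c", "d" ),
  exercise = c( "low", "high", "low", "moderate" ),
  stringsAsFactors = FALSE
)
parts <- split( toy, toy$exercise )
outDir <- tempdir()                     # temporary directory of this R session
for( exercise in names( parts ) ) {
  fileName <- file.path( outDir, paste0( "toy_", exercise, ".csv" ) )
  write.csv( parts[[ exercise ]], file = fileName, row.names = FALSE )
}
written <- list.files( outDir, pattern = "^toy_.*\\.csv$" )
written                                 # one file per exercise level
```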
