Lists (practice)

Primary exercises

Create and investigate a list.
Three students received different sets of grades (Amy: 1,6,7,9,10; Bob: 6,7,4,3,5,2,2,1,4; Dan: 9,9,10).
In a variable scores create a list (the names of the list elements should be the names of the students and the values should be the corresponding grades).
Print the list, its class, length and structure (str) of scores.
Add an element, change an element.
Reuse scores from the previous exercise.
Add there grades for Eve (7,3,5,8,8,9) and print the list.
Then, for Dan merge new grades (8,8,6,7) with the existing grades (hint: use the combine function c to combine existing Dan’s grades with the new grades then put the result back to scores; do not type again 9,9,10).
Single and double bracket operators.
Reuse scores from the previous exercises.
Investigate the difference between scores[[ "Bob" ]] and scores[ "Bob" ].
Look at what is printed and what is the class of each result.
Then compare scores[[ c( "Amy", "Bob" ) ]] with scores[ c( "Amy", "Bob" ) ].
Understand, why the error is reported.
Dollar operator.
Reuse scores from the previous exercises.
Investigate the (lack of) difference between scores$Bob and scores[[ "Bob" ]].
Look at what is printed and what is the class of each result.
Then compare scores$Bo with scores[[ "Bo" ]].
Understand, why the NULL is returned.

Extra exercises

A list returned by a function; test for association/correlation.
For this exercise we need two random numerical vectors.
Let’s create x and y, each of 30 elements sampled from the normal distribution: x <- rnorm( 30 ) and y <- rnorm( 30 ).
Print these vectors. You may also produce a scatter plot: plot( x, y ).

The function cor.test tests for association between corresponding elements of two vectors.
Use h <- cor.test( x, y ) and print h to see a report of the association test.
Internally h is stored as a list. Print names of the elements stored in h.
Now, read Help for cor.test. In the section Value you will see the description of the h elements.
Get directy the values of elements estimate and p.value.
A nested list.
Let’s extend the concept of scores to describe various topics (see the code below).
Check class and str of scores.
Calculate how many students are in the scores list.
Get Dan’s scores in physics.

scores <- list(
  Amy = list(
    math = c( 1,6,7,9,10 ),
    biology = c( 7,6,8 )
  ),
  Bob = list(
    math = c( 6,7,4,3,5,2,2,1,4 ),
    physics = c( 8,7 )
  ),
  Dan = list(
    math = c( 9,9,10 ),
    physics = c( 10, 10, 10 ),
    biology = c( 3, 5, 7 )
  )
)

Multitopic exercises

(ADV) Mean grades for each student. (Call a function for each element. Collect calls’ results into list.)
Consider the scores list from the first exercise (also copied below).
Calculate the mean grade for each student.

Use lapply to apply the mean function to each element of scores.
Also, replace lapply with sapply and compare the results.
Try to explain what lapply/sapply do.
Note: the names of the list elements in scores are preserved in the result.

scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)

(ADV) Simulate grades. (Define an own function and call it for each element.)
Consider the scores list from the previous exercise.
Let’s assume that the grades are not known yet and we need to simulate them.

A vector nms with several (e.g. 12, see below) unique names of students is provided.
Each student should have a random number of grades (between 5 and 14).
The grades should be sampled from the range 1:10.
Grades 1-4,9,10 are usually rare compared to 6-8, so the probabilities of grades should not be uniform (e.g. the ratios should be 1:1:1:1:2:10:20:20:2:1 for grades 1…10).
For each student, the grades should be sorted in ascending order.
The final list should have the same structure as scores (i.e. the names of the list elements should be the names of the students and the values should be the corresponding grades).

Hints:
- Use sample to generate a random number - how many grades a student should have.
- Use sample with the prob and replace arguments - grades with non-uniform probabilities.
- Put above into a function genGrades that generates grades for a single student.
- Use lapply to apply the function to each element of nms. Note, that the function does not use the nm argument (but it still needs to be present).
- Use setNames to assign names to the list elements (or better name the elements of nms before lapply).

nms <- c( "Amy", "Bob", "Carl", "Dany", "Ewa", "Frank", "Greg", "Holy", "Ian", "Jan", "Kees", "Leon" )

(ADV) Plot scores given in a list. (Convert list to long tibble. Plot it.)
Plotting functions usually require a table with data in a long format.
Convert the scores list from the first exercise to a long table, with two columns name and score (each grade should be a separate row).
Use ggplot to plot the grades from the long table.

Hints:
- Write a function which converts a single element of scores to a tibble with two columns name and score.
- Use lapply to apply the function to each element of scores (you will get a list of tibbles).
- Use bind_rows to combine the results into a single table (you will get a single, merged tibble).
- Use ggplot to plot the table. The example below uses geom_dotplot to plot the grades. You may use geom_point instead.

(ADV) Split a table into list of tables by a column factor; merge back.
Some functions might require an input to be provided as a list of tables.
Let’s assume that the pulse table should be split into a list of table parts based on the exercise argument.
Load the pulse.csv data to variable pulse.
Try l <- pulse %>% split( .$exercise ) and investigate the class, length and names of the result l.
Use double square bracket to extract the part for exercise being low.
Finally, check that with bind_rows applied to l you can recreate the pulse table (but with a different order of rows).
(ADV) Split a table by a column and write each part to a different file.
Continue with the setup of the previous exercise.
Study/type/exectute the following example.
Find the newly created files in your filesystem.

l <- pulse %>% split( .$exercise )
exercises <- names( l )                   # name in l of each table chunk
for( exercise in exercises ) {            # exercise will be a name of a single chunk
  fileName <- paste0( "pulse_", exercise, ".csv" )  # name of the file for the chunk
  message( "Writing file '", fileName, "'..." )
  write_csv( l[[ exercise ]], file = fileName )
}

↑ Lecture ⇄ Solutions