Primary exercises
- Create and investigate a list.
Three students received different sets of grades (Amy: 1,6,7,9,10; Bob: 6,7,4,3,5,2,2,1,4; Dan: 9,9,10).
Create a list and store it in a variable scores (the names of the list elements should be the names of the students and the values should be the corresponding grades).
Print scores, as well as its class, length and structure (str).
scores <- list(
Amy = c( 1,6,7,9,10 ),
Bob = c( 6,7,4,3,5,2,2,1,4 ),
Dan = c( 9,9,10 )
)
scores
$Amy
[1] 1 6 7 9 10
$Bob
[1] 6 7 4 3 5 2 2 1 4
$Dan
[1] 9 9 10
class( scores )
[1] "list"
length( scores )
[1] 3
str( scores )
List of 3
$ Amy: num [1:5] 1 6 7 9 10
$ Bob: num [1:9] 6 7 4 3 5 2 2 1 4
$ Dan: num [1:3] 9 9 10
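Note that length() counts the elements of the list (here, students), not the grades inside them. Here is a minimal base R sketch showing how names() and lengths() complement the functions above:

```r
# the list from the exercise
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
names( scores )           # "Amy" "Bob" "Dan": names of the list elements
lengths( scores )         # number of grades per student: 5, 9 and 3
sum( lengths( scores ) )  # total number of grades: 17
```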
- Add an element, change an element.
Reuse scores from the previous exercise.
Add the grades for Eve (7,3,5,8,8,9) to it and print the list.
Then, for Dan, merge the new grades (8,8,6,7) with the existing ones (hint: use the combine function c to combine Dan’s existing grades with the new grades, then put the result back into scores; do not type 9,9,10 again).
scores[[ 'Eve' ]] <- c(7,3,5,8,8,9)
scores
$Amy
[1] 1 6 7 9 10
$Bob
[1] 6 7 4 3 5 2 2 1 4
$Dan
[1] 9 9 10
$Eve
[1] 7 3 5 8 8 9
scores[[ "Dan" ]] <- c( scores[[ "Dan" ]], c(8,8,6,7) )
scores
$Amy
[1] 1 6 7 9 10
$Bob
[1] 6 7 4 3 5 2 2 1 4
$Dan
[1] 9 9 10 8 8 6 7
$Eve
[1] 7 3 5 8 8 9
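The $ operator can also be used on the left side of an assignment. Assigning to a new name appends the element at the end of the list, and assigning to an existing name replaces that element in place. Here is a minimal sketch that repeats both steps of the exercise this way and checks the result with stopifnot:

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
scores$Eve <- c( 7,3,5,8,8,9 )               # same effect as scores[[ "Eve" ]] <- ...
scores$Dan <- c( scores$Dan, c( 8,8,6,7 ) )  # append the new grades to the existing ones
stopifnot(
  identical( names( scores ), c( "Amy", "Bob", "Dan", "Eve" ) ),  # Eve is added at the end
  identical( scores$Dan, c( 9,9,10,8,8,6,7 ) )                    # Dan keeps his position
)
```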
- Single and double bracket operators.
Reuse scores from the previous exercises.
Investigate the difference between scores[[ "Bob" ]] and scores[ "Bob" ].
Look at what is printed and what the class of each result is.
Then compare scores[[ c( "Amy", "Bob" ) ]] with scores[ c( "Amy", "Bob" ) ].
Try to understand why an error is reported.
scores[[ "Bob" ]] # Returns the value of Bob element (vector)
[1] 6 7 4 3 5 2 2 1 4
scores[ "Bob" ] # Creates a new list with only Bob there (list)
$Bob
[1] 6 7 4 3 5 2 2 1 4
class( scores[[ "Bob" ]] )
[1] "numeric"
class( scores[ "Bob" ] )
[1] "list"
scores[[ c( "Amy", "Bob" ) ]] # [[ ]] returns a single element; a vector index is applied recursively, i.e. scores[["Amy"]][["Bob"]]
Error in scores[[c("Amy", "Bob")]]: subscript out of bounds
scores[ c( "Amy", "Bob" ) ] # [ ] returns a sub-list, so several elements are fine
$Amy
[1] 1 6 7 9 10
$Bob
[1] 6 7 4 3 5 2 2 1 4
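The error occurs because [[ ]] with a vector index indexes recursively: scores[[ c( i, j ) ]] means scores[[ i ]][[ j ]]. Amy's grades have no element named "Bob", so the subscript is out of bounds. Here is a small sketch, assuming the three-student scores list, in which recursive indexing works because both levels are indexed by position:

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
scores[[ c( 2, 3 ) ]]  # same as scores[[ 2 ]][[ 3 ]]: the third grade of Bob (4)
scores[[ 2 ]][[ 3 ]]
# [ ] keeps the list wrapper; taking [[ 1 ]] of the result unwraps it again
identical( scores[ "Bob" ][[ 1 ]], scores[[ "Bob" ]] )  # TRUE
```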
- Dollar operator.
Reuse scores from the previous exercises.
Investigate the (lack of) difference between scores$Bob and scores[[ "Bob" ]].
Look at what is printed and what the class of each result is.
Then compare scores$Bo with scores[[ "Bo" ]].
Try to understand why NULL is returned.
scores$Bob # another way to access Bob
[1] 6 7 4 3 5 2 2 1 4
scores[[ "Bob" ]] # get an element with exact name Bob
[1] 6 7 4 3 5 2 2 1 4
class( scores$Bob )
[1] "numeric"
class( scores[[ "Bob" ]] )
[1] "numeric"
scores$Bo # $ matches names partially, so "Bo" still finds Bob
[1] 6 7 4 3 5 2 2 1 4
scores[[ "Bo" ]] # [[ ]] matches names exactly by default; there is no "Bo", so NULL is returned
NULL
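You can also ask [[ ]] to match names partially through its exact argument (documented in ?Extract). Here is a minimal sketch. Note that a partial name fitting several elements (here, a hypothetical list with Bob and Bea) matches none of them:

```r
scores <- list( Amy = c( 1,6,7,9,10 ), Bob = c( 6,7,4,3,5,2,2,1,4 ), Dan = c( 9,9,10 ) )
scores[[ "Bo", exact = FALSE ]]        # partial matching on request: returns Bob's grades, like scores$Bo
ambiguous <- list( Bob = 1, Bea = 2 )  # hypothetical list: two names start with "B"
ambiguous$B                            # NULL: "B" fits both Bob and Bea, so there is no unique match
```

Because silent partial matching can hide typos, setting options( warnPartialMatchDollar = TRUE ) makes R warn whenever $ falls back to a partial match.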
Multitopic exercises
- (ADV) Mean grades for each student. (Call a function for each element. Collect the results of the calls into a list.)
Consider the scores list from the first exercise (also copied below).
Calculate the mean grade for each student.
Use lapply to apply the mean function to each element of scores.
Also, replace lapply with sapply and compare the results.
Try to explain what lapply/sapply do.
Note: the names of the list elements in scores are preserved in the result.
scores <- list(
Amy = c( 1,6,7,9,10 ),
Bob = c( 6,7,4,3,5,2,2,1,4 ),
Dan = c( 9,9,10 )
)
lapply( scores, mean ) # the result is a list
$Amy
[1] 6.6
$Bob
[1] 3.777778
$Dan
[1] 9.333333
sapply( scores, mean ) # the result is converted to a vector
Amy Bob Dan
6.600000 3.777778 9.333333
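One way to see what lapply does is to write the same computation as an explicit for loop. Here is a minimal base R sketch that builds a named list element by element. It then checks that the result matches lapply, and that its unlisted form matches sapply:

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
res <- vector( "list", length( scores ) )  # empty list with one slot per student
names( res ) <- names( scores )            # keep the student names, as lapply does
for( nm in names( scores ) ) {
  res[[ nm ]] <- mean( scores[[ nm ]] )    # call the function on one element
}
identical( res, lapply( scores, mean ) )            # TRUE: lapply returns this list
identical( unlist( res ), sapply( scores, mean ) )  # TRUE: sapply simplifies it to a vector
```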
- (ADV) Simulate grades. (Define your own function and call it for each element.)
Consider the scores list from the previous exercise.
Let’s assume that the grades are not known yet and we need to simulate them.
A vector nms with several (e.g. 12, see below) unique student names is provided.
Each student should have a random number of grades (between 5 and 14).
The grades should be sampled from the range 1:10.
Grades 1-4, 9 and 10 are usually rare compared to 6-8, so the probabilities of the grades should not be uniform (e.g. the ratios should be 1:1:1:1:2:10:20:20:2:1 for grades 1…10).
For each student, the grades should be sorted in ascending order.
The final list should have the same structure as scores (i.e. the names of the list elements should be the names of the students and the values should be the corresponding grades).
Hints:
- Use sample to generate a random number: how many grades a student should have.
- Use sample with the prob and replace arguments to generate grades with non-uniform probabilities.
- Put the above into a function genGrades that generates grades for a single student.
- Use lapply to apply the function to each element of nms. Note that the function does not use the nm argument, but the argument still needs to be present, because lapply passes each element of nms to the function as its first argument.
- Use setNames to assign names to the list elements (or, better, name the elements of nms before calling lapply).
nms <- c( "Amy", "Bob", "Carl", "Dany", "Ewa", "Frank", "Greg", "Holy", "Ian", "Jan", "Kees", "Leon" )
genGrades <- function(nm) { # nm is a single name, not used in the function
gradesNum <- sample( 5:14, 1 )
grades <- sort( sample( 1:10, size = gradesNum, prob = c(1,1,1,1,2,10,20,20,2,1), replace = TRUE ) )
return( grades )
}
lapply( setNames( nm = nms ), genGrades ) # calls genGrades for each element of nms
$Amy
[1] 5 5 5 7 7 7 8 8 8 8 8
$Bob
[1] 3 7 7 8 8
$Carl
[1] 3 4 6 6 6 6 7 7 7 7 8 8 8
$Dany
[1] 8 8 8 8 9
$Ewa
[1] 5 6 6 6 6 7 7 7 8 8 8 8 10
$Frank
[1] 6 6 8 8 8
$Greg
[1] 3 7 7 7 7 7 7 7 8 8 8 8 9
$Holy
[1] 3 7 7 7 8 8 9
$Ian
[1] 6 6 6 7 7 8 8 10
$Jan
[1] 1 6 7 7 8 8 8 8
$Kees
[1] 4 6 7 7 7 9
$Leon
[1] 7 7 7 7 7 7 8 8 8 9
# if elements of nms have names, the result has the same names
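The output above differs on every run because no random seed is fixed. Here is a minimal sketch, using the first three names from nms, that fixes the seed with set.seed and checks the requirements of the exercise. It also compares the observed grade frequencies of a large sample with the intended probabilities prob / sum( prob ):

```r
set.seed( 1 )  # fix the random seed so the simulation can be repeated exactly
nms <- c( "Amy", "Bob", "Carl" )
probs <- c( 1,1,1,1,2,10,20,20,2,1 )
genGrades <- function( nm ) {  # nm is not used, but lapply passes it
  gradesNum <- sample( 5:14, 1 )
  sort( sample( 1:10, size = gradesNum, prob = probs, replace = TRUE ) )
}
sim <- lapply( setNames( nm = nms ), genGrades )
# check the requirements of the exercise
stopifnot(
  identical( names( sim ), nms ),
  all( lengths( sim ) >= 5 & lengths( sim ) <= 14 ),  # between 5 and 14 grades
  all( unlist( sim ) %in% 1:10 ),                     # grades from 1:10 only
  !any( sapply( sim, is.unsorted ) )                  # sorted in ascending order
)
# observed frequencies in a large sample should be close to the expected ones
big <- sample( 1:10, size = 1e5, prob = probs, replace = TRUE )
round( rbind( expected = probs / sum( probs ), observed = tabulate( big, nbins = 10 ) / 1e5 ), 3 )
```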
- (ADV) Plot scores given in a list. (Convert the list to a long tibble. Plot it.)
Plotting functions usually require a table with data in a long format.
Convert the scores list from the first exercise to a long table with two columns, name and score (each grade should be a separate row).
Use ggplot to plot the grades from the long table.
Hints:
- Write a function which converts a single element of scores to a tibble with two columns, name and score.
- Use lapply to apply the function to each element of scores (you will get a list of tibbles).
- Use bind_rows to combine the results into a single table (you will get a single, merged tibble).
- Use ggplot to plot the table. The example below uses geom_dotplot to plot the grades. You may use geom_point instead.
library( tidyverse ) # provides %>%, tibble, bind_rows and ggplot
scores <- list(
Amy = c( 1,6,7,9,10 ),
Bob = c( 6,7,4,3,5,2,2,1,4 ),
Dan = c( 9,9,10 )
)
d <- names(scores) %>%
lapply( function( nm ) tibble( name=nm, score=scores[[nm]] ) ) %>% # one tibble per student
bind_rows() # stack them into a single long tibble
p <- ggplot( d ) +
aes( x=name, y=score ) +
geom_dotplot( binaxis="y", stackdir="center", binwidth=0.5 ) +
theme_bw() +
scale_y_continuous( limits=c(1,10), breaks=1:10 )
p # print the plot
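The long table can also be built without a helper function. Each name is repeated as many times as that student has grades, and the grades are concatenated in the same order. Here is a minimal base R sketch, offered as an alternative to the lapply + bind_rows approach (wrap the result in as_tibble if a tibble is needed). It also checks the result against base R's stack, which produces the same long format with columns called values and ind:

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
# repeat each name as many times as that student has grades
long <- data.frame(
  name = rep( names( scores ), lengths( scores ) ),
  score = unlist( scores, use.names = FALSE )  # all grades, in the same order
)
head( long )
# stack() gives the same long table; ind is a factor holding the names
st <- stack( scores )
```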
- (ADV) Split a table into a list of tables by a column factor; merge back.
Some functions might require the input to be provided as a list of tables.
Let’s assume that the pulse table should be split into a list of table parts based on the exercise column.
Load the pulse.csv data into the variable pulse.
Try l <- pulse %>% split( .$exercise ) and investigate the class, length and names of the result l.
Use double square brackets to extract the part for which exercise is low.
Finally, check that by applying bind_rows to l you can recreate the pulse table (but with a different order of rows).
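library( tidyverse ) # provides %>%, read_csv and bind_rows
pulse <- read_csv( "pulse.csv" ) # assuming pulse.csv is in the working directory
l <- pulse %>% split( .$exercise ) # . represents the object on the left side of %>%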
class( l )
[1] "list"
length( l )
[1] 3
names( l )
[1] "high" "low" "moderate"
l[[ "low" ]]
# A tibble: 37 × 13
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993
2 1993_F George 184 74 22 male no yes low ran 78 141 1993
3 1993_L Frederick 178 58 19 male no no low sat 74 76 1993
4 1993_P Mathew 185 110 22 male no yes low sat 77 73 1993
5 1993_Q Leslie 170 56 19 male no no low sat 64 63 1993
6 1993_U Jerome 175 60 19 male no no low sat 88 86 1993
7 1993_V Arlene 140 50 34 female no no low ran 70 98 1993
8 1993_W Glenna 163 55 20 female no no low sat 78 74 1993
9 1995_B Olga 172 60 21 female no no low sat 81 79 1995
10 1995_H Eliza 164 66 23 female no no low ran 74 168 1995
# ℹ 27 more rows
recreatedPulse <- bind_rows( l )
dim( pulse )
[1] 110 13
dim( recreatedPulse )
[1] 110 13
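Because pulse.csv may not be at hand, here is a small sketch on a hypothetical toy data frame that shows the same round trip in base R. split sorts the parts alphabetically by exercise, do.call( rbind, l ) stacks them back (the role bind_rows plays above), and ordering by a row identifier recovers the original row order:

```r
# toy stand-in for pulse: hypothetical ids and exercise levels
df <- data.frame(
  id = 1:6,
  exercise = c( "low", "high", "moderate", "low", "high", "low" )
)
l <- split( df, df$exercise )   # base R equivalent of df %>% split( .$exercise )
names( l )                      # "high" "low" "moderate": parts are sorted alphabetically
merged <- do.call( rbind, l )   # stack the parts back; rows are now grouped by exercise
restored <- merged[ order( merged$id ), ]  # sort by id to get the original row order back
rownames( restored ) <- NULL               # drop the row names rbind builds from the part names
```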
- (ADV) Split a table by a column and write each part to a different file.
Continue with the setup of the previous exercise.
Study, type and execute the following example.
Find the newly created files in your file system (they are written to the working directory).
l <- pulse %>% split( .$exercise )
exercises <- names( l ) # name in l of each table chunk
for( exercise in exercises ) { # exercise will be a name of a single chunk
fileName <- paste0( "pulse_", exercise, ".csv" ) # name of the file for the chunk
message( "Writing file '", fileName, "'..." )
write_csv( l[[ exercise ]], file = fileName )
}
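The files written above can be read back and stacked into one table again. Here is a minimal base R sketch of the whole round trip on a hypothetical toy data frame. It writes to tempdir() rather than the working directory and uses write.csv and read.csv in place of readr's write_csv and read_csv:

```r
# toy stand-in for pulse (hypothetical values)
df <- data.frame(
  id = 1:6,
  exercise = c( "low", "high", "moderate", "low", "high", "low" ),
  pulse1 = c( 70, 80, 75, 65, 90, 85 )
)
l <- split( df, df$exercise )
outDir <- tempdir()  # temporary directory, so the working directory stays clean
files <- file.path( outDir, paste0( "pulse_", names( l ), ".csv" ) )
for( i in seq_along( l ) ) {
  write.csv( l[[ i ]], files[ i ], row.names = FALSE )  # one file per exercise level
}
# read all parts back and stack them into a single table again
back <- do.call( rbind, lapply( files, read.csv ) )
```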
Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC