Primary exercises
Create and investigate a list.
Three students received different sets of grades (Amy: 1,6,7,9,10; Bob:
6,7,4,3,5,2,2,1,4; Dan: 9,9,10).
In a variable scores, create a list (the names of the list elements should be the names of the students and the values should be the corresponding grades).
Print the list, and then its class, length and structure (str).
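One possible way to approach this exercise is sketched below. It is only an illustration, assuming the grades are stored as plain numeric vectors; other layouts would work too.

```r
# Named list: element names are the students, values are their grades
scores <- list(
  Amy = c( 1, 6, 7, 9, 10 ),
  Bob = c( 6, 7, 4, 3, 5, 2, 2, 1, 4 ),
  Dan = c( 9, 9, 10 )
)

print( scores )    # the whole list, element by element
class( scores )    # "list"
length( scores )   # 3, one element per student
str( scores )      # compact overview of names, types and values
```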
Add an element, change an element.
Reuse scores from the previous exercise.
Add grades for Eve (7,3,5,8,8,9) to it and print the list.
Then, for Dan, merge new grades (8,8,6,7) with the existing grades (hint: use the combine function c to combine Dan’s existing grades with the new grades, then put the result back into scores; do not retype 9,9,10).
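A minimal sketch of both steps follows. It uses the dollar operator for the assignments, which is one option among several (double brackets would work equally well).

```r
scores <- list(
  Amy = c( 1, 6, 7, 9, 10 ),
  Bob = c( 6, 7, 4, 3, 5, 2, 2, 1, 4 ),
  Dan = c( 9, 9, 10 )
)

# Assigning to a name that does not exist yet appends a new element
scores$Eve <- c( 7, 3, 5, 8, 8, 9 )
print( scores )

# Combine Dan's existing grades with the new ones and store the result back
scores$Dan <- c( scores$Dan, c( 8, 8, 6, 7 ) )
print( scores$Dan )
```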
Single and double bracket operators.
Reuse scores from the previous exercises.
Investigate the difference between scores[[ "Bob" ]] and scores[ "Bob" ].
Look at what is printed and at the class of each result.
Then compare scores[[ c( "Amy", "Bob" ) ]] with scores[ c( "Amy", "Bob" ) ].
Understand why an error is reported.
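The comparison can be explored with a sketch like the one below. The tryCatch wrapper is an addition made here so that the script keeps running after the error; in an interactive session you can simply type the expression and read the error.

```r
scores <- list(
  Amy = c( 1, 6, 7, 9, 10 ),
  Bob = c( 6, 7, 4, 3, 5, 2, 2, 1, 4 ),
  Dan = c( 9, 9, 10 )
)

bobValues <- scores[[ "Bob" ]]   # double brackets extract the element itself
bobList   <- scores[ "Bob" ]     # single brackets return a shorter list
class( bobValues )               # "numeric"
class( bobList )                 # "list"

# Single brackets accept several names and return a sub-list
twoStudents <- scores[ c( "Amy", "Bob" ) ]
length( twoStudents )            # 2

# Double brackets with a vector index recursively: first "Amy", then "Bob"
# inside Amy's grades, which fails because that vector has no such element
errMsg <- tryCatch(
  scores[[ c( "Amy", "Bob" ) ]],
  error = function( e ) conditionMessage( e )
)
print( errMsg )
```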
Dollar operator.
Reuse scores from the previous exercises.
Investigate the (lack of) difference between scores$Bob and scores[[ "Bob" ]].
Look at what is printed and at the class of each result.
Then compare scores$Bo with scores[[ "Bo" ]].
Understand why NULL is returned.
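A short sketch of the comparison is given below. It relies on the fact that, for lists, the dollar operator matches names partially, while double brackets match them exactly by default.

```r
scores <- list(
  Amy = c( 1, 6, 7, 9, 10 ),
  Bob = c( 6, 7, 4, 3, 5, 2, 2, 1, 4 ),
  Dan = c( 9, 9, 10 )
)

# With the full name, both operators extract the same element
identical( scores$Bob, scores[[ "Bob" ]] )   # TRUE

# "Bo" is a unique prefix of "Bob": $ completes it, [[ ]] does not
partial <- scores$Bo          # Bob's grades
exact   <- scores[[ "Bo" ]]   # NULL, no element is named exactly "Bo"
print( partial )
print( exact )
```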
Multitopic exercises
- (ADV) Mean grades for each student. (Call a function for each
element. Collect calls’ results into list.)
Consider the scores list from the first exercise (also copied below).
Calculate the mean grade for each student.
Use lapply to apply the mean function to each element of scores.
Also, replace lapply with sapply and compare the results.
Try to explain what lapply/sapply do.
Note: the names of the list elements in scores are preserved in the result.
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
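The two calls can be compared with a sketch like this one; the variable names meansList and meansVec are chosen here for illustration only.

```r
scores <- list(
  Amy = c( 1, 6, 7, 9, 10 ),
  Bob = c( 6, 7, 4, 3, 5, 2, 2, 1, 4 ),
  Dan = c( 9, 9, 10 )
)

# lapply calls mean once per element and always returns a list
meansList <- lapply( scores, mean )
class( meansList )   # "list"

# sapply does the same but simplifies the result when possible,
# here to a named numeric vector
meansVec <- sapply( scores, mean )
class( meansVec )    # "numeric"
names( meansVec )    # "Amy" "Bob" "Dan"
```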
- (ADV) Simulate grades. (Define your own function and call it for each element.)
Consider the scores list from the previous exercise.
Let’s assume that the grades are not known yet and we need to simulate them.
A vector nms with several (e.g. 12, see below) unique names of students is provided.
Each student should have a random number of grades (between 5 and 14).
The grades should be sampled from the range 1:10.
Grades 1-4,9,10 are usually rare compared to 6-8, so the probabilities of grades should not be uniform (e.g. the ratios should be 1:1:1:1:2:10:20:20:2:1 for grades 1…10).
For each student, the grades should be sorted in ascending order.
The final list should have the same structure as scores (i.e. the names of the list elements should be the names of the students and the values should be the corresponding grades).
Hints:
- Use sample to generate a random number: how many grades a student should have.
- Use sample with the prob and replace arguments to draw grades with non-uniform probabilities.
- Put the above into a function genGrades that generates grades for a single student.
- Use lapply to apply the function to each element of nms. Note that the function does not use the nm argument (but it still needs to be present).
- Use setNames to assign names to the list elements (or, better, name the elements of nms before lapply).
nms <- c( "Amy", "Bob", "Carl", "Dany", "Ewa", "Frank", "Greg", "Holy", "Ian", "Jan", "Kees", "Leon" )
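The hints above could be put together roughly as follows. This is a sketch, not the reference solution: the result name simScores and the call to set.seed (added only to make the run repeatable) are choices made here.

```r
nms <- c( "Amy", "Bob", "Carl", "Dany", "Ewa", "Frank",
          "Greg", "Holy", "Ian", "Jan", "Kees", "Leon" )

# Generate sorted grades for a single student; nm is required by lapply
# but is not used inside the function
genGrades <- function( nm ) {
  n <- sample( 5:14, 1 )                       # how many grades
  grades <- sample(
    1:10, n, replace = TRUE,
    prob = c( 1, 1, 1, 1, 2, 10, 20, 20, 2, 1 )  # weights need not sum to 1
  )
  sort( grades )
}

set.seed( 1 )                                  # assumption: reproducible demo
simScores <- setNames( lapply( nms, genGrades ), nms )
str( simScores )
```

Naming nms first, with nms <- setNames( nms, nms ), would let lapply carry the names over without the outer setNames call.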
- (ADV) Plot scores given in a list. (Convert list to long tibble.
Plot it.)
Plotting functions usually require a table with data in a long format.
Convert the scores list from the first exercise to a long table with two columns, name and score (each grade should be a separate row).
Use ggplot to plot the grades from the long table.
Hints:
- Write a function which converts a single element of scores to a tibble with two columns, name and score.
- Use lapply to apply the function to each element of scores (you will get a list of tibbles).
- Use bind_rows to combine the results into a single table (you will get a single, merged tibble).
- Use ggplot to plot the table. The example below uses geom_dotplot to plot the grades. You may use geom_point instead.
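A possible implementation of these hints is sketched below. It is not the course's own example: the helper name toTibble, iterating over names( scores ) so that the helper can see both the name and the grades, and the geom_dotplot arguments are all assumptions made for this illustration. The dplyr and ggplot2 packages are assumed to be installed.

```r
library( dplyr )     # tibble(), bind_rows()
library( ggplot2 )

scores <- list(
  Amy = c( 1, 6, 7, 9, 10 ),
  Bob = c( 6, 7, 4, 3, 5, 2, 2, 1, 4 ),
  Dan = c( 9, 9, 10 )
)

# Convert one element of scores (looked up by name) to a long tibble
toTibble <- function( nm ) {
  tibble( name = nm, score = scores[[ nm ]] )
}

tibbles <- lapply( names( scores ), toTibble )   # list of tibbles
long <- bind_rows( tibbles )                     # one row per grade

p <- ggplot( long, aes( x = name, y = score ) ) +
  geom_dotplot( binaxis = "y", stackdir = "center", binwidth = 0.3 )
print( p )
```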
- (ADV) Split a table into a list of tables by a column factor; merge back.
Some functions might require the input to be provided as a list of tables.
Let’s assume that the pulse table should be split into a list of table parts based on the exercise column.
Load the pulse.csv data into the variable pulse.
Try l <- pulse %>% split( .$exercise ) and investigate the class, length and names of the result l.
Use the double square bracket operator to extract the part where exercise is low.
Finally, check that with bind_rows applied to l you can recreate the pulse table (but with a different order of rows).
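The same steps can be tried without the data file on a small stand-in table. In the sketch below, pulseDemo and all its values are hypothetical (only the column name exercise and the level low come from the text above); the real pulse table will have different columns and rows.

```r
library( dplyr )   # tibble(), bind_rows(), %>%

# Hypothetical stand-in for the pulse table
pulseDemo <- tibble(
  id = 1:6,
  exercise = c( "low", "high", "moderate", "low", "high", "low" )
)

l <- pulseDemo %>% split( .$exercise )   # one table per exercise level
class( l )                               # "list"
length( l )                              # 3
names( l )                               # "high" "low" "moderate"

lowPart <- l[[ "low" ]]                  # rows where exercise is low

merged <- bind_rows( l )                 # same rows, grouped by level
nrow( merged ) == nrow( pulseDemo )      # TRUE
```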
- (ADV) Split a table by a column and write each part to a different file.
Continue with the setup of the previous exercise.
Study, type and execute the following example.
Find the newly created files in your filesystem.
l <- pulse %>% split( .$exercise )
exercises <- names( l )                             # name in l of each table chunk
for( exercise in exercises ) {                      # exercise will be a name of a single chunk
  fileName <- paste0( "pulse_", exercise, ".csv" )  # name of the file for the chunk
  message( "Writing file '", fileName, "'..." )
  write_csv( l[[ exercise ]], file = fileName )
}
Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC