Primary exercises

  1. Create tibble
  1. Create a tibble exercise_group for a group of individuals with names {Sonja, Steven, Ines, Robert, Tim} with their heights {164, 188, 164, 180, 170}, weights {56.0, 87.0, 54.0, 80.0, 58.5} and frequency of exercise {high, high, low, moderate, low}.
exercise_group <- tibble(name=c("Sonja" , "Steven", "Ines", "Robert", "Tim" ),
            height=c(164, 188, 164, 180, 170),  
            weight=c(56.0, 87.0, 54.0, 80.0, 58.5),
            exercise=c("high", "high", "low", "moderate", "low")
     )  
exercise_group
# A tibble: 5 × 4
  name   height weight exercise
  <chr>   <dbl>  <dbl> <chr>   
1 Sonja     164   56   high    
2 Steven    188   87   high    
3 Ines      164   54   low     
4 Robert    180   80   moderate
5 Tim       170   58.5 low     
  1. Update the tibble exercise_group with Ella and Oscar, leaving their height, weight and exercise values missing (NA). Avoid copying and pasting the code from the previous step with the new names added; instead, reuse the columns already stored in exercise_group.
exercise_group <- tibble(name=c(exercise_group$name, "Ella", "Oscar"),
                         height=c(exercise_group$height,NA,NA),
                         weight=c(exercise_group$weight,NA,NA),
                         exercise=c(exercise_group$exercise,NA,NA)
                  )
exercise_group       
# A tibble: 7 × 4
  name   height weight exercise
  <chr>   <dbl>  <dbl> <chr>   
1 Sonja     164   56   high    
2 Steven    188   87   high    
3 Ines      164   54   low     
4 Robert    180   80   moderate
5 Tim       170   58.5 low     
6 Ella       NA   NA   <NA>    
7 Oscar      NA   NA   <NA>    
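An alternative to rebuilding every column is add_row() from the tibble package. The sketch below assumes the same five-person tibble as in the first step; columns that are not named in the call are filled with NA.

```r
library(tibble)

# Tibble from the first step (values copied from the exercise text)
exercise_group <- tibble(
  name     = c("Sonja", "Steven", "Ines", "Robert", "Tim"),
  height   = c(164, 188, 164, 180, 170),
  weight   = c(56.0, 87.0, 54.0, 80.0, 58.5),
  exercise = c("high", "high", "low", "moderate", "low")
)

# Append two rows; height, weight and exercise are not given, so they become NA
exercise_group <- add_row(exercise_group, name = c("Ella", "Oscar"))
exercise_group
```

This keeps the column types of the existing tibble, so height and weight remain numeric and exercise remains character.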
  1. Add the sex variable to exercise_group with values male and female.
exercise_group <- tibble(name=exercise_group$name,
                         height=exercise_group$height,
                         weight=exercise_group$weight,
                         exercise=exercise_group$exercise,
                         sex=c('female','male','female','male','male','female','male')
                  )
exercise_group
# A tibble: 7 × 5
  name   height weight exercise sex   
  <chr>   <dbl>  <dbl> <chr>    <chr> 
1 Sonja     164   56   high     female
2 Steven    188   87   high     male  
3 Ines      164   54   low      female
4 Robert    180   80   moderate male  
5 Tim       170   58.5 low      male  
6 Ella       NA   NA   <NA>     female
7 Oscar      NA   NA   <NA>     male  
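A shorter route, shown here only as a sketch, is to assign the new column directly with $. It assumes the seven-row tibble from the previous step; the new vector must have one value per row.

```r
library(tibble)

# Seven-row tibble as it stands after adding Ella and Oscar
exercise_group <- tibble(
  name     = c("Sonja", "Steven", "Ines", "Robert", "Tim", "Ella", "Oscar"),
  height   = c(164, 188, 164, 180, 170, NA, NA),
  weight   = c(56.0, 87.0, 54.0, 80.0, 58.5, NA, NA),
  exercise = c("high", "high", "low", "moderate", "low", NA, NA)
)

# Assigning to a column name that does not exist yet adds it as the last column
exercise_group$sex <- c("female", "male", "female", "male", "male", "female", "male")
exercise_group
```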
  1. Create a tibble that keeps track of smoking habits over the years for three people: Julio (age 21) started smoking at age 17 and stopped in 2020; Camille (age 20) started smoking in 2021; Travis (age 19) started at age 16.
# First list the information as below. Here NA (missing value) in 'stop' means the person 
# has not stopped, i.e. still smokes at present. Where the start is given as an age, the 
# start year is derived from the current age, taking 2022 as the current year.
# 
# name     age  start          stop
# Julio    21   2022-(21-17)   2020
# Camille  20   2021           NA
# Travis   19   2022-(19-16)   NA

tibble(name=c("Julio", "Camille","Travis"), 
       age=c(21,20,19), 
       start=c(2018,2021,2019), 
       stop=c(2020,NA,NA))
# A tibble: 3 × 4
  name      age start  stop
  <chr>   <dbl> <dbl> <dbl>
1 Julio      21  2018  2020
2 Camille    20  2021    NA
3 Travis     19  2019    NA
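The start years can also be computed inside the call rather than worked out by hand. This is a minimal sketch; reference_year is a name introduced here for illustration, set to 2022 to match the table in the comments above.

```r
library(tibble)

# Assumption: 2022 is the year in which the ages were recorded
reference_year <- 2022

smoking <- tibble(
  name  = c("Julio", "Camille", "Travis"),
  age   = c(21, 20, 19),
  # start year = reference year minus the number of years since starting
  start = c(reference_year - (21 - 17), 2021, reference_year - (19 - 16)),
  stop  = c(2020, NA, NA)  # NA: has not stopped
)
smoking
```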

Tibble subset

  1. Take the tibble exercise_group from the previous exercise and create a new tibble exercise_group_sub without the height and weight variables, using the single square bracket [ for selection.
exercise_group_sub <- exercise_group[c("name","exercise")]
exercise_group_sub
# A tibble: 7 × 2
  name   exercise
  <chr>  <chr>   
1 Sonja  high    
2 Steven high    
3 Ines   low     
4 Robert moderate
5 Tim    low     
6 Ella   <NA>    
7 Oscar  <NA>    
  1. Create a tibble called exercise_group_sub with the 1st and 3rd columns.
exercise_group_sub <- exercise_group[c(1,3)]
exercise_group_sub
# A tibble: 7 × 2
  name   weight
  <chr>   <dbl>
1 Sonja    56  
2 Steven   87  
3 Ines     54  
4 Robert   80  
5 Tim      58.5
6 Ella     NA  
7 Oscar    NA  
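Columns can also be dropped rather than kept, by passing negative indices to [. The sketch below assumes a shortened copy of exercise_group; note that, unlike selecting name and exercise by name, it keeps every column that is not excluded, including sex.

```r
library(tibble)

# Shortened copy of exercise_group, for illustration only
exercise_group <- tibble(
  name     = c("Sonja", "Steven", "Ines"),
  height   = c(164, 188, 164),
  weight   = c(56.0, 87.0, 54.0),
  exercise = c("high", "high", "low"),
  sex      = c("female", "male", "female")
)

# Negative indices exclude columns 2 (height) and 3 (weight)
exercise_group_sub <- exercise_group[-c(2, 3)]
names(exercise_group_sub)
```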

Extract variables as vectors

  1. Given the tibble favourite_colour, how old were the subjects by the end of 2021?
2021 - favourite_colour[["year"]] # alternatively:  2021 - favourite_colour$year
[1] 26 26 26 27 31 28 29
  1. What is the mean height in exercise_group? Use mean function (see ?mean).
mean(exercise_group[["height"]]) # NA, because the heights of Ella and Oscar are missing
[1] NA

Read tibbles from file

  1. Read pulse.csv data set into R and inspect its dimensions.
pulse <- read_csv(file = "pulse.csv")
# Two alternatives: (i) the nrow and ncol functions, (ii) the dim function.
nrow(pulse) # number of rows
[1] 110
ncol(pulse) # number of columns
[1] 13
dim(pulse)  # dimensions (rows, columns)
[1] 110  13
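Since pulse.csv itself is not reproduced here, the following sketch writes a tiny, made-up CSV to a temporary file and reads it back, so that the read_csv and dim steps can be tried without the course data. The column names and values are hypothetical; read_csv comes from the readr package.

```r
library(readr)

# Hypothetical stand-in for pulse.csv: three rows, three columns
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,height,pulse",
             "A,170,72",
             "B,182,NA",
             "C,165,80"),
           csv_path)

# show_col_types = FALSE suppresses the column specification message
toy <- read_csv(csv_path, show_col_types = FALSE)

dim(toy)   # rows, columns
```

The string NA in the file is read as a missing value, so the pulse column stays numeric.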
  1. Read survey.csv data set into R.
survey <- read_csv(file = "survey.csv")
  • Inspect the dimensions.
dim(survey)
[1] 233  13
  • Show the first 9 and the last 7 rows.
head(survey,9)
# A tibble: 9 × 13
  name    gender span1 span2 hand  fold    pulse clap    exercise smokes height m.i        age
  <chr>   <chr>  <dbl> <dbl> <chr> <chr>   <dbl> <chr>   <chr>    <chr>   <dbl> <chr>    <dbl>
1 Alyson  female  18.5  18   right right      92 left    some     never    173  metric    18.2
2 Todd    male    19.5  20.5 left  right     104 left    none     regul    178. imperial  17.6
3 Gerald  male    18    13.3 right left       87 neither none     occas     NA  <NA>      16.9
4 Robert  male    18.8  18.9 right right      NA neither none     never    160  metric    20.3
5 Dustin  male    20    20   right neither    35 right   some     never    165  metric    23.7
6 Abby    female  18    17.7 right left       64 right   some     never    173. imperial  21  
7 Andre   male    17.7  17.7 right left       83 right   freq     never    183. imperial  18.8
8 Michael female  17    17.3 right right      74 right   freq     never    157  metric    35.8
9 Edward  male    20    19.5 right right      72 right   some     never    175  metric    19  
tail(survey,7)
# A tibble: 7 × 13
  name     gender span1 span2 hand  fold  pulse clap  exercise smokes height m.i        age
  <chr>    <chr>  <dbl> <dbl> <chr> <chr> <dbl> <chr> <chr>    <chr>   <dbl> <chr>    <dbl>
1 Marcella female  18.8  18.5 right right    80 right some     never    169  metric    18.2
2 Jerry    male    18    16   right right    NA right some     never    180. imperial  20.8
3 Jeanne   female  18    18   right left     85 right some     never    165. imperial  17.7
4 Rosanna  female  18.5  18   right left     88 right some     never    160  metric    16.9
5 Tracey   female  17.5  16.5 right right    NA right some     never    170  metric    18.6
6 Keith    male    21    21.5 right right    90 right some     never    183  metric    17.2
7 Celina   female  17.6  17.3 right right    85 right freq     never    168. metric    17.8
  • Calculate the mean age.
mean(survey$age) 
[1] 20.35591
  • Calculate the mean height in survey data.
# A second argument, 'na.rm = TRUE', is needed here because the variable height contains 
# missing values (NA). By default the mean function returns NA if its first argument, in 
# this case the variable 'height', contains any NA. Setting 'na.rm = TRUE' changes this 
# behaviour: observations with a missing height are dropped and the mean is calculated 
# over the observations for which the height is available. 
# 
mean(survey$height, na.rm = TRUE) 
[1] 172.3459
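The effect of na.rm can be checked by hand on a small vector. The sketch below uses the seven heights from exercise_group, two of which are missing.

```r
# Heights from exercise_group; Ella and Oscar are missing
heights <- c(164, 188, 164, 180, 170, NA, NA)

mean_default <- mean(heights)                # NA: any missing value propagates
mean_no_na   <- mean(heights, na.rm = TRUE)  # (164 + 188 + 164 + 180 + 170) / 5 = 173.2

mean_default
mean_no_na
```

Note that the divisor is 5, the number of available heights, not 7.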

Extra exercises

  1. In survey data:
  1. What is the mean height of the last 30 observations?
survey_last_30 <- tail(survey,30)
mean(survey_last_30$height, na.rm = TRUE) # <=> mean(survey_last_30[["height"]], na.rm = TRUE)
[1] 170.1431
  1. The variable age is the last column in the survey data. Make a tibble where the variable age comes directly after name.
# Some solutions
#
# 1) List all column names explicitly, in the desired order
survey[c("name", "age", "gender", "span1" ,"span2","hand","fold", 
   "pulse", "clap", "exercise", "smokes", "height", "m.i")] 
# A tibble: 233 × 13
   name      age gender span1 span2 hand  fold    pulse clap    exercise smokes height m.i     
   <chr>   <dbl> <chr>  <dbl> <dbl> <chr> <chr>   <dbl> <chr>   <chr>    <chr>   <dbl> <chr>   
 1 Alyson   18.2 female  18.5  18   right right      92 left    some     never    173  metric  
 2 Todd     17.6 male    19.5  20.5 left  right     104 left    none     regul    178. imperial
 3 Gerald   16.9 male    18    13.3 right left       87 neither none     occas     NA  <NA>    
 4 Robert   20.3 male    18.8  18.9 right right      NA neither none     never    160  metric  
 5 Dustin   23.7 male    20    20   right neither    35 right   some     never    165  metric  
 6 Abby     21   female  18    17.7 right left       64 right   some     never    173. imperial
 7 Andre    18.8 male    17.7  17.7 right left       83 right   freq     never    183. imperial
 8 Michael  35.8 female  17    17.3 right right      74 right   freq     never    157  metric  
 9 Edward   19   male    20    19.5 right right      72 right   some     never    175  metric  
10 Carl     22.3 male    18.5  18.5 right right      90 right   some     never    167  metric  
# … with 223 more rows
# 2) Use column indices: first column, last column, then everything in between
survey[c(1,ncol(survey),2:(ncol(survey)-1))]
# A tibble: 233 × 13
   name      age gender span1 span2 hand  fold    pulse clap    exercise smokes height m.i     
   <chr>   <dbl> <chr>  <dbl> <dbl> <chr> <chr>   <dbl> <chr>   <chr>    <chr>   <dbl> <chr>   
 1 Alyson   18.2 female  18.5  18   right right      92 left    some     never    173  metric  
 2 Todd     17.6 male    19.5  20.5 left  right     104 left    none     regul    178. imperial
 3 Gerald   16.9 male    18    13.3 right left       87 neither none     occas     NA  <NA>    
 4 Robert   20.3 male    18.8  18.9 right right      NA neither none     never    160  metric  
 5 Dustin   23.7 male    20    20   right neither    35 right   some     never    165  metric  
 6 Abby     21   female  18    17.7 right left       64 right   some     never    173. imperial
 7 Andre    18.8 male    17.7  17.7 right left       83 right   freq     never    183. imperial
 8 Michael  35.8 female  17    17.3 right right      74 right   freq     never    157  metric  
 9 Edward   19   male    20    19.5 right right      72 right   some     never    175  metric  
10 Carl     22.3 male    18.5  18.5 right right      90 right   some     never    167  metric  
# … with 223 more rows
# 3) The select(...) function is a more concise solution which will be discussed 
#    in the next section. 
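A further base R variant, given here as a sketch on a made-up four-column tibble, avoids both typing every name and hard-coding positions: name the columns that should come first and let setdiff() supply the rest in their original order.

```r
library(tibble)

# Made-up miniature of the survey data, with age as the last column
toy <- tibble(
  name   = c("A", "B"),
  gender = c("female", "male"),
  height = c(170, 182),
  age    = c(18.2, 17.6)
)

front <- c("name", "age")   # columns to place first

# setdiff() returns the remaining column names, keeping their order
reordered <- toy[c(front, setdiff(names(toy), front))]
names(reordered)
```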
  1. Create the favourite_colour tibble from the lecture but now with colour variable as a factor. Print the counts for each level.
favourite_colour  <- tibble(name=c("Lucas","Lotte","Noa","Wim","Marc","Lucy","Pedro"), 
                           year=c(1995,1995,1995,1994,1990,1993,1992), 
                           colour=factor(c("Blue","Green","Yellow","Purple","Green","red","Blue")))
fct_count(favourite_colour[['colour']])
# A tibble: 5 × 2
  f          n
  <fct>  <int>
1 Blue       2
2 Green      2
3 Purple     1
4 red        1
5 Yellow     1
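The same counts can be obtained without fct_count, using table() from base R. This sketch reuses the values above; the order in which the levels are listed may depend on the locale.

```r
library(tibble)

favourite_colour <- tibble(
  name   = c("Lucas", "Lotte", "Noa", "Wim", "Marc", "Lucy", "Pedro"),
  year   = c(1995, 1995, 1995, 1994, 1990, 1993, 1992),
  colour = factor(c("Blue", "Green", "Yellow", "Purple", "Green", "red", "Blue"))
)

# table() counts how often each factor level occurs
colour_counts <- table(favourite_colour[["colour"]])
colour_counts

# The distinct levels of the factor
levels(favourite_colour[["colour"]])
```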

[<row>, <column>] : row and column selection based on range of indices.

The single square bracket [ can select a range of rows, a range of columns, or a combination of both. For example, take the exercise_group tibble from the primary exercises above; then:

exercise_group[c(2,3),]  # returns rows in the range 2 to 3
# A tibble: 2 × 5
  name   height weight exercise sex   
  <chr>   <dbl>  <dbl> <chr>    <chr> 
1 Steven    188     87 high     male  
2 Ines      164     54 low      female
exercise_group[,c(1,3)]  # returns columns 1 and 3
# A tibble: 7 × 2
  name   weight
  <chr>   <dbl>
1 Sonja    56  
2 Steven   87  
3 Ines     54  
4 Robert   80  
5 Tim      58.5
6 Ella     NA  
7 Oscar    NA  
exercise_group[c(2,3),c(1,3)]  # combination of the above
# A tibble: 2 × 2
  name   weight
  <chr>   <dbl>
1 Steven     87
2 Ines       54
  1. Reproduce the following tibbles from exercise_group:
exercise_group[c(1,4),]
# A tibble: 2 × 5
  name   height weight exercise sex   
  <chr>   <dbl>  <dbl> <chr>    <chr> 
1 Sonja     164     56 high     female
2 Robert    180     80 moderate male  
exercise_group[2:5, c(1,3,4)]
# A tibble: 4 × 3
  name   weight exercise
  <chr>   <dbl> <chr>   
1 Steven   87   high    
2 Ines     54   low     
3 Robert   80   moderate
4 Tim      58.5 low     
exercise_group[6:7,2:4]
# A tibble: 2 × 3
  height weight exercise
   <dbl>  <dbl> <chr>   
1     NA     NA <NA>    
2     NA     NA <NA>    
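One point worth checking when combining these selections: on a tibble, [ returns a tibble even when a single column is selected, whereas [[ extracts the column as a vector. The sketch below illustrates this on a shortened, made-up copy of exercise_group.

```r
library(tibble)

# Shortened copy of exercise_group, for illustration only
eg <- tibble(
  name   = c("Sonja", "Steven", "Ines"),
  weight = c(56.0, 87.0, 54.0)
)

one_column <- eg[, "weight"]    # still a tibble with one column
as_vector  <- eg[["weight"]]    # a plain numeric vector

is_tibble(one_column)
is.numeric(as_vector)
```

This is why functions such as mean are applied to eg[["weight"]] or eg$weight rather than to eg["weight"].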


Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC