Tibble (solutions)

Primary exercises

Create tibble

Create a tibble exercise_group for a group of individuals with names {Sonja, Steven, Ines, Robert, Tim} with their heights {164, 188, 164, 180, 170}, weights {56.0, 87.0, 54.0, 80.0, 58.5} and frequency of exercise {high, high, low, moderate, low}.

library(tidyverse)  # provides tibble(), read_csv() and fct_count()

exercise_group <- tibble(name = c("Sonja", "Steven", "Ines", "Robert", "Tim"),
                         height = c(164, 188, 164, 180, 170),
                         weight = c(56.0, 87.0, 54.0, 80.0, 58.5),
                         exercise = c("high", "high", "low", "moderate", "low")
                  )
exercise_group

# A tibble: 5 × 4
  name   height weight exercise
  <chr>   <dbl>  <dbl> <chr>   
1 Sonja     164   56   high    
2 Steven    188   87   high    
3 Ines      164   54   low     
4 Robert    180   80   moderate
5 Tim       170   58.5 low

Update the tibble exercise_group with Ella and Oscar, leaving their height, weight and exercise values missing (NA). Rather than copying the code from the previous exercise and adding the new names, reuse the columns already in exercise_group.

exercise_group <- tibble(name=c(exercise_group$name, "Ella", "Oscar"),
                         height=c(exercise_group$height,NA,NA),
                         weight=c(exercise_group$weight,NA,NA),
                         exercise=c(exercise_group$exercise,NA,NA)
                  )
exercise_group

# A tibble: 7 × 4
  name   height weight exercise
  <chr>   <dbl>  <dbl> <chr>   
1 Sonja     164   56   high    
2 Steven    188   87   high    
3 Ines      164   54   low     
4 Robert    180   80   moderate
5 Tim       170   58.5 low     
6 Ella       NA   NA   <NA>    
7 Oscar      NA   NA   <NA>
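
As an alternative to rebuilding every column by hand, the tibble package offers add_row(). The sketch below assumes the tibble package is installed and rebuilds the five-person tibble so that it runs on its own. It is one possible approach, not the solution intended by the exercise.

```r
library(tibble)

# Rebuild the five-person tibble from the first exercise
exercise_group <- tibble(
  name     = c("Sonja", "Steven", "Ines", "Robert", "Tim"),
  height   = c(164, 188, 164, 180, 170),
  weight   = c(56.0, 87.0, 54.0, 80.0, 58.5),
  exercise = c("high", "high", "low", "moderate", "low")
)

# add_row() appends rows; columns that are not supplied are filled with NA
exercise_group <- add_row(exercise_group, name = c("Ella", "Oscar"))
exercise_group
```

Because only name is supplied, height, weight and exercise are left missing for the two new rows, and the column types stay unchanged.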

Add the sex variable to exercise_group with values male and female.

exercise_group <- tibble(name=exercise_group$name,
                         height=exercise_group$height,
                         weight=exercise_group$weight,
                         exercise=exercise_group$exercise,
                         sex=c('female','male','female','male','male','female','male')
                  )
exercise_group

# A tibble: 7 × 5
  name   height weight exercise sex   
  <chr>   <dbl>  <dbl> <chr>    <chr> 
1 Sonja     164   56   high     female
2 Steven    188   87   high     male  
3 Ines      164   54   low      female
4 Robert    180   80   moderate male  
5 Tim       170   58.5 low      male  
6 Ella       NA   NA   <NA>     female
7 Oscar      NA   NA   <NA>     male
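
A new column can also be attached without listing the existing ones again. The sketch below shows two options, assignment with $ and tibble's add_column(). The shortened tibble exercise_small is a hypothetical stand-in so that the example is self-contained.

```r
library(tibble)

# A shortened stand-in for exercise_group (three rows, for illustration only)
exercise_small <- tibble(
  name   = c("Sonja", "Steven", "Ines"),
  height = c(164, 188, 164)
)

# Option 1: assign a new column with $
with_dollar <- exercise_small
with_dollar$sex <- c("female", "male", "female")

# Option 2: add_column() returns a new tibble with the extra column appended
with_add_column <- add_column(exercise_small, sex = c("female", "male", "female"))

with_dollar
with_add_column
```

In both cases the new column must have one value per row (or a single value that is recycled).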

Create a tibble that keeps track of the smoking habits of three people over the years: Julio, age 21, started smoking at 17 and stopped in 2020; Camille, age 20, started smoking in 2021; and Travis, age 19, started at age 16.

# First list the information as below. Start years given as an age are computed
# relative to 2022. NA (missing value) in stop is interpreted as not stopped,
# i.e. still smoking at the present date.
# 
# name     age  start          stop
# Julio    21   2022-(21-17)   2020
# Camille  20   2021           NA
# Travis   19   2022-(19-16)   NA

tibble(name=c("Julio", "Camille","Travis"), 
       age=c(21,20,19), 
       start=c(2018,2021,2019), 
       stop=c(2020,NA,NA))

# A tibble: 3 × 4
  name      age start  stop
  <chr>   <dbl> <dbl> <dbl>
1 Julio      21  2018  2020
2 Camille    20  2021    NA
3 Travis     19  2019    NA
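
The start years above were worked out by hand. As a sketch, the same arithmetic can be left to R. The names reference_year, start_age and derived are introduced here for illustration only, and 2022 is taken as the reference year because that is the year used in the comments above.

```r
library(tibble)

reference_year <- 2022  # assumed "current" year, as in the comments above

smoking <- tibble(
  name      = c("Julio", "Camille", "Travis"),
  age       = c(21, 20, 19),
  start_age = c(17, NA, 16),     # age at which smoking started, where given
  start     = c(NA, 2021, NA),   # start year, where given directly
  stop      = c(2020, NA, NA)    # NA: has not stopped
)

# Where only the starting age is known, derive the start year from it
derived <- reference_year - (smoking$age - smoking$start_age)
smoking$start <- ifelse(is.na(smoking$start), derived, smoking$start)

smoking[c("name", "age", "start", "stop")]
```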

Tibble subset

Take the tibble exercise_group from the previous exercise and create a new tibble exercise_group_sub without the height and weight variables, using the [ operator.

exercise_group_sub <- exercise_group[c("name","exercise")]
exercise_group_sub

# A tibble: 7 × 2
  name   exercise
  <chr>  <chr>   
1 Sonja  high    
2 Steven high    
3 Ines   low     
4 Robert moderate
5 Tim    low     
6 Ella   <NA>    
7 Oscar  <NA>
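
Columns can also be excluded rather than listed. The sketch below drops height and weight by negative position and by name. The two-row tibble exercise_demo is a hypothetical stand-in for exercise_group. Note that both variants keep every remaining column, so the sex column is retained as well.

```r
library(tibble)

# Two-row stand-in with the same five columns as exercise_group
exercise_demo <- tibble(
  name     = c("Sonja", "Steven"),
  height   = c(164, 188),
  weight   = c(56, 87),
  exercise = c("high", "high"),
  sex      = c("female", "male")
)

# Drop by position: negative indices remove columns 2 and 3
by_position <- exercise_demo[-c(2, 3)]

# Drop by name: keep the columns whose name is not in the exclusion list
by_name <- exercise_demo[!names(exercise_demo) %in% c("height", "weight")]

by_position
by_name
```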

Create a tibble called exercise_group_sub with the 1st and 3rd columns.

exercise_group_sub <- exercise_group[c(1,3)]
exercise_group_sub

# A tibble: 7 × 2
  name   weight
  <chr>   <dbl>
1 Sonja    56  
2 Steven   87  
3 Ines     54  
4 Robert   80  
5 Tim      58.5
6 Ella     NA  
7 Oscar    NA

Extract variables as vectors

Given the tibble favourite_colour, how old were the subjects by the end of 2021?

2021 - favourite_colour[["year"]] # alternatively:  2021 - favourite_colour$year

[1] 26 26 26 27 31 28 29

What is the mean height in exercise_group? Use the mean function (see ?mean).

mean(exercise_group[["height"]])

[1] NA
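
The result is NA because the heights of Ella and Oscar are missing. The short sketch below reproduces this with the height values from exercise_group stored in a plain vector, and shows the effect of the na.rm argument of mean.

```r
# Heights from exercise_group, including the two missing values
heights <- c(164, 188, 164, 180, 170, NA, NA)

anyNA(heights)                 # TRUE: at least one value is missing
mean(heights)                  # NA: a single NA makes the mean unknown
mean(heights, na.rm = TRUE)    # 173.2: mean of the five known heights
```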

Read tibbles from file

Read pulse.csv data set into R and inspect its dimensions.

pulse <- read_csv(file = "pulse.csv")

# Two alternatives: i) the nrow and ncol functions, ii) the dim function.
nrow(pulse) # number of rows

[1] 110

ncol(pulse) # number of columns

[1] 13

dim(pulse)  # dimensions (rows, columns)

[1] 110  13

Read survey.csv data set into R.

survey <- read_csv(file = "survey.csv")

Inspect the dimensions.

dim(survey)

[1] 233  13

Show the first 9 and the last 7 rows.

head(survey,9)

# A tibble: 9 × 13
  name    gender span1 span2 hand  fold    pulse clap    exercise smokes height m.i        age
  <chr>   <chr>  <dbl> <dbl> <chr> <chr>   <dbl> <chr>   <chr>    <chr>   <dbl> <chr>    <dbl>
1 Alyson  female  18.5  18   right right      92 left    some     never    173  metric    18.2
2 Todd    male    19.5  20.5 left  right     104 left    none     regul    178. imperial  17.6
3 Gerald  male    18    13.3 right left       87 neither none     occas     NA  <NA>      16.9
4 Robert  male    18.8  18.9 right right      NA neither none     never    160  metric    20.3
5 Dustin  male    20    20   right neither    35 right   some     never    165  metric    23.7
6 Abby    female  18    17.7 right left       64 right   some     never    173. imperial  21  
7 Andre   male    17.7  17.7 right left       83 right   freq     never    183. imperial  18.8
8 Michael female  17    17.3 right right      74 right   freq     never    157  metric    35.8
9 Edward  male    20    19.5 right right      72 right   some     never    175  metric    19

tail(survey,7)

# A tibble: 7 × 13
  name     gender span1 span2 hand  fold  pulse clap  exercise smokes height m.i        age
  <chr>    <chr>  <dbl> <dbl> <chr> <chr> <dbl> <chr> <chr>    <chr>   <dbl> <chr>    <dbl>
1 Marcella female  18.8  18.5 right right    80 right some     never    169  metric    18.2
2 Jerry    male    18    16   right right    NA right some     never    180. imperial  20.8
3 Jeanne   female  18    18   right left     85 right some     never    165. imperial  17.7
4 Rosanna  female  18.5  18   right left     88 right some     never    160  metric    16.9
5 Tracey   female  17.5  16.5 right right    NA right some     never    170  metric    18.6
6 Keith    male    21    21.5 right right    90 right some     never    183  metric    17.2
7 Celina   female  17.6  17.3 right right    85 right freq     never    168. metric    17.8

Calculate the mean age.

mean(survey$age)

[1] 20.35591

Calculate the mean height in survey data.

# Here we pass a second argument 'na.rm = TRUE' because the variable height contains 
# missing values (NA). By default the mean function returns NA if its first argument, 
# in this case the variable 'height', contains any NA. Setting 'na.rm = TRUE' changes 
# this behaviour: observations with a missing height are disregarded and the mean is 
# calculated over the observations for which the height is available. 
# 
mean(survey$height, na.rm = TRUE)

[1] 172.3459
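
Before relying on na.rm = TRUE it can be useful to check how many observations are being disregarded. The sketch below uses a short vector of hypothetical height values rather than the survey data.

```r
# Hypothetical height values, one of them missing
height <- c(173, 177.8, NA, 160, 165)

n_missing <- sum(is.na(height))    # number of observations dropped by na.rm = TRUE
n_used    <- sum(!is.na(height))   # number of observations the mean is based on

# Removing the NA values by hand gives the same mean as na.rm = TRUE
mean_by_hand <- mean(height[!is.na(height)])
mean_na_rm   <- mean(height, na.rm = TRUE)

c(missing = n_missing, used = n_used)
```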

Extra exercises

In survey data:

What is the mean height of the last 30 observations?

survey_last_30 <- tail(survey,30)
mean(survey_last_30$height, na.rm = TRUE) # <=> mean(survey_last_30[["height"]], na.rm = TRUE)

[1] 170.1431

The variable age is the last column in the survey data. Make a tibble where the variable age comes directly after name.

# Some solutions
#
# 1) list names : exhaustive 
survey[c("name", "age", "gender", "span1" ,"span2","hand","fold", 
   "pulse", "clap", "exercise", "smokes", "height", "m.i")]

# A tibble: 233 × 13
   name      age gender span1 span2 hand  fold    pulse clap    exercise smokes height m.i     
   <chr>   <dbl> <chr>  <dbl> <dbl> <chr> <chr>   <dbl> <chr>   <chr>    <chr>   <dbl> <chr>   
 1 Alyson   18.2 female  18.5  18   right right      92 left    some     never    173  metric  
 2 Todd     17.6 male    19.5  20.5 left  right     104 left    none     regul    178. imperial
 3 Gerald   16.9 male    18    13.3 right left       87 neither none     occas     NA  <NA>    
 4 Robert   20.3 male    18.8  18.9 right right      NA neither none     never    160  metric  
 5 Dustin   23.7 male    20    20   right neither    35 right   some     never    165  metric  
 6 Abby     21   female  18    17.7 right left       64 right   some     never    173. imperial
 7 Andre    18.8 male    17.7  17.7 right left       83 right   freq     never    183. imperial
 8 Michael  35.8 female  17    17.3 right right      74 right   freq     never    157  metric  
 9 Edward   19   male    20    19.5 right right      72 right   some     never    175  metric  
10 Carl     22.3 male    18.5  18.5 right right      90 right   some     never    167  metric  
# ℹ 223 more rows

# 2) indices  
survey[c(1,ncol(survey),2:(ncol(survey)-1))]

# A tibble: 233 × 13
   name      age gender span1 span2 hand  fold    pulse clap    exercise smokes height m.i     
   <chr>   <dbl> <chr>  <dbl> <dbl> <chr> <chr>   <dbl> <chr>   <chr>    <chr>   <dbl> <chr>   
 1 Alyson   18.2 female  18.5  18   right right      92 left    some     never    173  metric  
 2 Todd     17.6 male    19.5  20.5 left  right     104 left    none     regul    178. imperial
 3 Gerald   16.9 male    18    13.3 right left       87 neither none     occas     NA  <NA>    
 4 Robert   20.3 male    18.8  18.9 right right      NA neither none     never    160  metric  
 5 Dustin   23.7 male    20    20   right neither    35 right   some     never    165  metric  
 6 Abby     21   female  18    17.7 right left       64 right   some     never    173. imperial
 7 Andre    18.8 male    17.7  17.7 right left       83 right   freq     never    183. imperial
 8 Michael  35.8 female  17    17.3 right right      74 right   freq     never    157  metric  
 9 Edward   19   male    20    19.5 right right      72 right   some     never    175  metric  
10 Carl     22.3 male    18.5  18.5 right right      90 right   some     never    167  metric  
# ℹ 223 more rows

# 3) The select(...) function is a more concise solution which will be discussed 
#    in the next section.
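
A further base R variant, sketched below, names only the columns that should come first and lets setdiff() collect the rest in their original order. The tibble survey_demo is a small stand-in with a few of the survey columns, and front is a name introduced for illustration.

```r
library(tibble)

# Small stand-in for the survey data (first two rows, four of the columns)
survey_demo <- tibble(
  name   = c("Alyson", "Todd"),
  gender = c("female", "male"),
  pulse  = c(92, 104),
  age    = c(18.2, 17.6)
)

# Columns to move to the front; all remaining columns keep their order
front <- c("name", "age")
reordered <- survey_demo[c(front, setdiff(names(survey_demo), front))]
reordered
```

This avoids both typing all column names and working out index positions.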

Create the favourite_colour tibble from the lecture, but now with the colour variable as a factor. Print the counts for each level.

favourite_colour  <- tibble(name=c("Lucas","Lotte","Noa","Wim","Marc","Lucy","Pedro"), 
                           year=c(1995,1995,1995,1994,1990,1993,1992), 
                           colour=factor(c("Blue","Green","Yellow","Purple","Green","red","Blue")))
fct_count(favourite_colour[['colour']])

# A tibble: 5 × 2
  f          n
  <fct>  <int>
1 Blue       2
2 Green      2
3 Purple     1
4 red        1
5 Yellow     1
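
The same counts can be obtained without forcats. The sketch below uses base R's table() on the colour values from above, stored in a plain factor. The order in which the levels are listed may depend on the locale.

```r
# Colour values from favourite_colour as a factor
colour <- factor(c("Blue", "Green", "Yellow", "Purple", "Green", "red", "Blue"))

nlevels(colour)          # number of distinct levels
counts <- table(colour)  # counts per level, as a named table
counts
```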

`[<row>, <column>]` : row and column selection based on range of indices.

With the single square bracket [ you can select a range of rows, a range of columns, or a combination of both. For example, take the exercise_group tibble from the primary exercises above:

exercise_group[c(2,3),]  # returns rows in the range 2 to 3

# A tibble: 2 × 5
  name   height weight exercise sex   
  <chr>   <dbl>  <dbl> <chr>    <chr> 
1 Steven    188     87 high     male  
2 Ines      164     54 low      female

exercise_group[,c(1,3)]  # returns columns 1 and 3

# A tibble: 7 × 2
  name   weight
  <chr>   <dbl>
1 Sonja    56  
2 Steven   87  
3 Ines     54  
4 Robert   80  
5 Tim      58.5
6 Ella     NA  
7 Oscar    NA

exercise_group[c(2,3),c(1,3)]  # combination of the above

# A tibble: 2 × 2
  name   weight
  <chr>   <dbl>
1 Steven     87
2 Ines       54
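
One detail worth noting: on a tibble, [ returns a tibble even when a single column is selected, whereas a plain data frame simplifies a single column to a vector by default. The sketch below illustrates this with a small stand-in tibble named demo (a name introduced here for illustration).

```r
library(tibble)

demo <- tibble(name = c("Steven", "Ines"), weight = c(87, 54))

one_col <- demo[, 2]    # single column via [ : still a tibble
as_vec  <- demo[[2]]    # [[ extracts the column as a numeric vector

df     <- as.data.frame(demo)
df_col <- df[, 2]       # plain data frame: simplified to a vector (drop = TRUE)

is_tibble(one_col)
is.numeric(as_vec)
is.data.frame(df_col)
```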

Reproduce the following tibbles from exercise_group:

exercise_group[c(1,4),]

# A tibble: 2 × 5
  name   height weight exercise sex   
  <chr>   <dbl>  <dbl> <chr>    <chr> 
1 Sonja     164     56 high     female
2 Robert    180     80 moderate male

exercise_group[2:5, c(1,3,4)]

# A tibble: 4 × 3
  name   weight exercise
  <chr>   <dbl> <chr>   
1 Steven   87   high    
2 Ines     54   low     
3 Robert   80   moderate
4 Tim      58.5 low

exercise_group[6:7,2:4]

# A tibble: 2 × 3
  height weight exercise
   <dbl>  <dbl> <chr>   
1     NA     NA <NA>    
2     NA     NA <NA>
