A tibble is a table: a two-dimensional data structure with rows (observations) and columns (variables).
I’ll use the terms observations and rows interchangeably, depending on the context. The same goes for the terms variables and columns. As you may recall, the datasets pulse and survey were in fact tibbles.
Each variable in a tibble has a fixed type, such as character or numeric. Let’s start by creating a tibble manually.
To create a tibble you need to make sure that the package tidyverse is installed and loaded. See installation for more details.
Enter the following to load the tidyverse package:
library(tidyverse)
A tibble is created with the function tibble, which takes a sequence of name = value pairs where:
name is the variable (column) name,
value is a vector holding the values of that variable, one per observation.
Take for example the variables name, year and colour to represent a person’s name, birth year and favourite colour:
favourite_colour <- tibble(name=c("Lucas","Lotte","Noa","Wim","Marc","Lucy","Pedro"),
year=c(1995,1995,1995,1994,1990,1993,1992),
colour=c("Blue","Green","Yellow","Purple","Green","red","Blue"))
When creating a tibble, the column vectors must all be of the same length. The only exception is a single value, which is repeated for every row.
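The length rule can be tried out with a small sketch. The variable names `recycled` and `result` and the error text returned by the handler are made up for this illustration; only the tibble package is assumed to be installed.

```r
library(tibble)

# A single value is recycled so that it matches the other column.
recycled <- tibble(name = c("Lucas", "Lotte", "Noa"), year = 1995)
nrow(recycled)

# Columns of different lengths (3 and 2 here) are rejected with an error.
# tryCatch turns that error into a plain string so the script keeps running.
result <- tryCatch(
  tibble(name = c("Lucas", "Lotte", "Noa"), year = c(1995, 1994)),
  error = function(e) "error: incompatible column sizes"
)
result
```

Catching the error with tryCatch is only done here to keep the example runnable from top to bottom; in the console you would simply see the error message.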
The variable favourite_colour now holds the data. Enter its name in the R Console for inspection:
favourite_colour
# A tibble: 7 × 3
name year colour
<chr> <dbl> <chr>
1 Lucas 1995 Blue
2 Lotte 1995 Green
3 Noa 1995 Yellow
4 Wim 1994 Purple
5 Marc 1990 Green
6 Lucy 1993 red
7 Pedro 1992 Blue
What additional pieces of information do you see besides the content we provided?
Answer:
‘# A tibble: 7 × 3’, which says that this is a tibble with dimensions 7 × 3 (7 observations and 3 variables),
the atomic type of each variable, in this case character and double (numeric),
the row numbers.
Type the following to find out the dimensions of the tibble:
ncol(favourite_colour) # number of variables (columns)
[1] 3
nrow(favourite_colour) # number of observations (rows)
[1] 7
dim(favourite_colour) # dimensions : 7 rows and 3 columns
[1] 7 3
Show the top and bottom rows of the tibble:
head(favourite_colour, 2) # first 2 observations (rows)
# A tibble: 2 × 3
name year colour
<chr> <dbl> <chr>
1 Lucas 1995 Blue
2 Lotte 1995 Green
tail(favourite_colour, 3) # last 3 observations (rows)
# A tibble: 3 × 3
name year colour
<chr> <dbl> <chr>
1 Marc 1990 Green
2 Lucy 1993 red
3 Pedro 1992 Blue
With the second argument of the head and tail functions you can control the number of rows shown.
By default, i.e. when the second argument is omitted as in head(favourite_colour) or tail(favourite_colour), both functions show 6 rows.
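A brief sketch of how the row count behaves, using the same example data. Counting the rows with nrow is just a convenient way to check the result and is not part of the original text.

```r
library(tibble)

favourite_colour <- tibble(
  name   = c("Lucas", "Lotte", "Noa", "Wim", "Marc", "Lucy", "Pedro"),
  year   = c(1995, 1995, 1995, 1994, 1990, 1993, 1992),
  colour = c("Blue", "Green", "Yellow", "Purple", "Green", "red", "Blue")
)

# Without a second argument, head returns at most 6 rows.
nrow(head(favourite_colour))

# Asking for more rows than the tibble has simply returns all of them.
nrow(head(favourite_colour, 10))
```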
glimpse
With the glimpse function you can quickly inspect the first values of every variable in a tibble, together with some additional meta information such as the number of rows and variables and their types:
favourite_colour %>% glimpse()
Rows: 7
Columns: 3
$ name <chr> "Lucas", "Lotte", "Noa", "Wim", "Marc", "Lucy", "Pedro"
$ year <dbl> 1995, 1995, 1995, 1994, 1990, 1993, 1992
$ colour <chr> "Blue", "Green", "Yellow", "Purple", "Green", "red", "Blue"
Often you need to select only certain variables. This can be done using single square brackets [ ]:
favourite_colour["colour"]
# A tibble: 7 × 1
colour
<chr>
1 Blue
2 Green
3 Yellow
4 Purple
5 Green
6 red
7 Blue
or a combination of variables:
favourite_colour[c("name","year")]
# A tibble: 7 × 2
name year
<chr> <dbl>
1 Lucas 1995
2 Lotte 1995
3 Noa 1995
4 Wim 1994
5 Marc 1990
6 Lucy 1993
7 Pedro 1992
Subsetting a tibble with single square brackets always returns a tibble.
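The difference with a base R data.frame becomes visible with the row-and-column form of the brackets, `[, "colour"]`, which is not used elsewhere in this section. In the following sketch the name `df` is an assumption made for the comparison.

```r
library(tibble)

favourite_colour <- tibble(
  name   = c("Lucas", "Lotte", "Noa", "Wim", "Marc", "Lucy", "Pedro"),
  year   = c(1995, 1995, 1995, 1994, 1990, 1993, 1992),
  colour = c("Blue", "Green", "Yellow", "Purple", "Green", "red", "Blue")
)

# The same data as a plain base R data.frame, for comparison.
df <- as.data.frame(favourite_colour)

# A data.frame silently drops a single selected column to a vector ...
class(df[, "colour"])

# ... whereas a tibble stays a tibble.
class(favourite_colour[, "colour"])
```

Because the result type does not depend on how many columns are selected, code written for tibbles tends to behave more predictably.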
Variables can also be selected by index, as we saw with vectors:
favourite_colour[2:3]
# A tibble: 7 × 2
year colour
<dbl> <chr>
1 1995 Blue
2 1995 Green
3 1995 Yellow
4 1994 Purple
5 1990 Green
6 1993 red
7 1992 Blue
favourite_colour[c(1,3)]
# A tibble: 7 × 2
name colour
<chr> <chr>
1 Lucas Blue
2 Lotte Green
3 Noa Yellow
4 Wim Purple
5 Marc Green
6 Lucy red
7 Pedro Blue
To deselect variables, use negative indices:
favourite_colour[-2]
# A tibble: 7 × 2
name colour
<chr> <chr>
1 Lucas Blue
2 Lotte Green
3 Noa Yellow
4 Wim Purple
5 Marc Green
6 Lucy red
7 Pedro Blue
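Negative indices only work with positions, not with names. One possible way to drop a variable by name in base R is sketched below; the helper variable `keep` is introduced only for this example.

```r
library(tibble)

favourite_colour <- tibble(
  name   = c("Lucas", "Lotte", "Noa", "Wim", "Marc", "Lucy", "Pedro"),
  year   = c(1995, 1995, 1995, 1994, 1990, 1993, 1992),
  colour = c("Blue", "Green", "Yellow", "Purple", "Green", "red", "Blue")
)

# All column names except "year".
keep <- setdiff(names(favourite_colour), "year")

# Selecting the remaining names gives the same columns as favourite_colour[-2].
favourite_colour[keep]
```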
If you want to work with a variable as an individual vector, you can extract it with either double square brackets [[ ]] or the $ operator:
favourite_colour[["year"]]
[1] 1995 1995 1995 1994 1990 1993 1992
favourite_colour$year
[1] 1995 1995 1995 1994 1990 1993 1992
In some contexts (later in the course) it is convenient to use the function pull, which does the same as [[ and $:
pull(favourite_colour, year)
[1] 1995 1995 1995 1994 1990 1993 1992
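One such context is a pipeline, where pull sits naturally between other steps. The sketch below assumes the dplyr package (part of tidyverse) is loaded; taking the mean of the birth years is merely an illustrative follow-up step.

```r
library(tibble)
library(dplyr)  # provides pull and the %>% pipe

favourite_colour <- tibble(
  name   = c("Lucas", "Lotte", "Noa", "Wim", "Marc", "Lucy", "Pedro"),
  year   = c(1995, 1995, 1995, 1994, 1990, 1993, 1992),
  colour = c("Blue", "Green", "Yellow", "Purple", "Green", "red", "Blue")
)

# Extract the year column as a vector and pass it on to mean.
favourite_colour %>% pull(year) %>% mean()

# pull also accepts a position; negative positions count from the right,
# so -1 is the last column (colour).
favourite_colour %>% pull(-1)
```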
Tibbles can be written to data files and read back again. Many data formats exist, but for simplicity we will use the comma-separated values (csv) format in this course. The functions for this purpose are write_csv and read_csv (see the data import cheat sheet).
Let us now save our first tibble to a file in csv format:
write_csv(x = favourite_colour, file = "favourite_colour.csv")
The favourite_colour tibble is written to the text file favourite_colour.csv. You may inspect the file with any editor and it should look something like:
name,year,colour
Lucas,1995,Blue
Lotte,1995,Green
Noa,1995,Yellow
Wim,1994,Purple
Marc,1990,Green
Lucy,1993,red
Pedro,1992,Blue
This way we can permanently store our results in files for later use.
We can now read the csv file back into an R variable, e.g. favourite_colour_csv:
favourite_colour_csv <- read_csv(file = "favourite_colour.csv")
Rows: 7 Columns: 3
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): name, colour
dbl (1): year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
favourite_colour_csv
# A tibble: 7 × 3
name year colour
<chr> <dbl> <chr>
1 Lucas 1995 Blue
2 Lotte 1995 Green
3 Noa 1995 Yellow
4 Wim 1994 Purple
5 Marc 1990 Green
6 Lucy 1993 red
7 Pedro 1992 Blue
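To convince yourself that nothing was lost on the way to the file and back, you could compare the two tibbles column by column. This is a sketch under a few assumptions: a temporary file (`path`) is used instead of favourite_colour.csv so that your working directory is left untouched, and the names `same_columns` and `path` are chosen for this example only.

```r
library(tibble)
library(readr)  # provides write_csv and read_csv

favourite_colour <- tibble(
  name   = c("Lucas", "Lotte", "Noa", "Wim", "Marc", "Lucy", "Pedro"),
  year   = c(1995, 1995, 1995, 1994, 1990, 1993, 1992),
  colour = c("Blue", "Green", "Yellow", "Purple", "Green", "red", "Blue")
)

# Write to a temporary file and read it back again.
path <- tempfile(fileext = ".csv")
write_csv(x = favourite_colour, file = path)
favourite_colour_csv <- read_csv(file = path, show_col_types = FALSE)  # no type summary message

# Compare each original column with the column that was read back.
same_columns <- mapply(
  function(original, restored) isTRUE(all.equal(original, restored)),
  favourite_colour,
  favourite_colour_csv
)
same_columns
```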
The message from read_csv gives a summary of the variables and their inferred types. It also suggests using the argument show_col_types = FALSE if you’d prefer a silent read:
favourite_colour_csv <- read_csv(file = "favourite_colour.csv", show_col_types = FALSE)
favourite_colour_csv
# A tibble: 7 × 3
name year colour
<chr> <dbl> <chr>
1 Lucas 1995 Blue
2 Lotte 1995 Green
3 Noa 1995 Yellow
4 Wim 1994 Purple
5 Marc 1990 Green
6 Lucy 1993 red
7 Pedro 1992 Blue
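The other option mentioned in the message, specifying the column types yourself, also silences it. As a sketch, the example below reads year as an integer instead of the inferred double; the choice of integer, the temporary file `path` and the name `typed` are assumptions made for illustration.

```r
library(tibble)
library(readr)

favourite_colour <- tibble(
  name   = c("Lucas", "Lotte", "Noa", "Wim", "Marc", "Lucy", "Pedro"),
  year   = c(1995, 1995, 1995, 1994, 1990, 1993, 1992),
  colour = c("Blue", "Green", "Yellow", "Purple", "Green", "red", "Blue")
)

# A temporary file stands in for favourite_colour.csv.
path <- tempfile(fileext = ".csv")
write_csv(x = favourite_colour, file = path)

# State the type of every column explicitly instead of letting read_csv guess.
typed <- read_csv(
  file = path,
  col_types = cols(
    name   = col_character(),
    year   = col_integer(),
    colour = col_character()
  )
)
class(typed$year)
```

Explicit column types make a script more robust, because the result no longer depends on what read_csv infers from the file contents.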
Now inspect the class of the tibble we just created:
class(favourite_colour)
[1] "tbl_df" "tbl" "data.frame"
A tibble is at its core a ‘data.frame’, a base R data structure.
The classes ‘tbl_df’ and ‘tbl’ add convenient behaviours specific to tibbles.
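Because of this shared core, converting between the two is straightforward. The following sketch uses a shortened version of the example data; the names `plain` and `back` are made up for this illustration.

```r
library(tibble)

favourite_colour <- tibble(
  name = c("Lucas", "Lotte", "Noa"),
  year = c(1995, 1995, 1995)
)

# A tibble passes both checks: it is a tibble and a data.frame.
is_tibble(favourite_colour)
is.data.frame(favourite_colour)

# Convert to a plain data.frame: the tibble-specific classes are dropped.
plain <- as.data.frame(favourite_colour)
class(plain)

# Convert back to a tibble.
back <- as_tibble(plain)
class(back)
```

This means functions written for data.frames generally accept tibbles as well.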
Copyright © 2024 Biomedical Data Sciences (BDS) | LUMC