Change layout of values in a table
Source: data import cheat sheet : tidyr
Two main functions to manipulate the layout of the table,
pivot_longer
transforms the table from wide to long format
and pivot_wider
, which does the opposite, converts the
table from long to wide format.
pivot_longer
: wide to long formatThis function allows collapsing ‘similar’ variables into one variable
while guaranteeing the data set’s consistency. For example take the
variables pulse1
and pulse2
in the following
subset of the pulse data set:
pulses <- pulse %>%
select(name,pulse1,pulse2) %>%
head(3)
pulses
# A tibble: 3 × 3
name pulse1 pulse2
<chr> <dbl> <dbl>
1 Bonnie 86 88
2 Melanie 82 150
3 Consuelo 96 176
We can transform the table as such that all pulse values are under a
single variable, say pulse
:
dfLong <- pulses %>% pivot_longer(c(pulse1, pulse2), names_to = "pulse", values_to = "level")
dfLong
# A tibble: 6 × 3
name pulse level
<chr> <chr> <dbl>
1 Bonnie pulse1 86
2 Bonnie pulse2 88
3 Melanie pulse1 82
4 Melanie pulse2 150
5 Consuelo pulse1 96
6 Consuelo pulse2 176
This is called the long version of the original (wide) table and contains the same information.
Alternatively you achieve the same result using !
(negation operator):
dfLong <- pulses %>% pivot_longer(!name, names_to = "pulse", values_to = "level")
dfLong
# A tibble: 6 × 3
name pulse level
<chr> <chr> <dbl>
1 Bonnie pulse1 86
2 Bonnie pulse2 88
3 Melanie pulse1 82
4 Melanie pulse2 150
5 Consuelo pulse1 96
6 Consuelo pulse2 176
AnswerCan you find ohter variables in the pulse data set which can be transformed to long format?
Let’s reshape the pulse dataset with variables drug
=
{smokes,alcohol} and use
= {yes,no}.
pulse %>% pivot_longer(c(smokes,alcohol), names_to = "drug", values_to = "use")
# A tibble: 220 × 13
id name height weight age gender exercise ran pulse1 pulse2 year drug use
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
1 1993_A Bonnie 173 57 18 female moderate sat 86 88 1993 smokes no
2 1993_A Bonnie 173 57 18 female moderate sat 86 88 1993 alcohol yes
3 1993_B Melanie 179 58 19 female moderate ran 82 150 1993 smokes no
4 1993_B Melanie 179 58 19 female moderate ran 82 150 1993 alcohol yes
5 1993_C Consuelo 167 62 18 female high ran 96 176 1993 smokes no
6 1993_C Consuelo 167 62 18 female high ran 96 176 1993 alcohol yes
7 1993_D Travis 195 84 18 male high sat 71 73 1993 smokes no
8 1993_D Travis 195 84 18 male high sat 71 73 1993 alcohol yes
9 1993_E Lauri 173 64 18 female low sat 90 88 1993 smokes no
10 1993_E Lauri 173 64 18 female low sat 90 88 1993 alcohol yes
# ℹ 210 more rows
pivot_wider
: long to wide formatdfWide <- dfLong %>% pivot_wider(names_from = "pulse", values_from = "level")
dfWide
# A tibble: 3 × 3
name pulse1 pulse2
<chr> <dbl> <dbl>
1 Bonnie 86 88
2 Melanie 82 150
3 Consuelo 96 176
Below, pulses tibble has been transformed into pulses2. What can you say about this transformation?
pulses2 <- pulses %>%
pivot_longer(!name, names_to = "pulse", values_to = "level") %>%
pivot_wider(names_from = "pulse", values_from = "level")
Answer
Often you will need to reshape your data for ggplot visualisations.
For example we would like to compare pulse1
and
pulse2
variables with a boxplot
. You may plot
pulse1 and pulse2 in separate plots one after the other:
require(gridExtra) # Make sure gridExtra is installed with install.packages("gridExtra").
bp1 <- ggplot(pulse %>% drop_na()) + # remove missing
aes(y=pulse1) +
geom_boxplot()
bp2 <- ggplot(pulse %>% drop_na()) + # remove missing
aes(y=pulse2) +
geom_boxplot()
grid.arrange(bp1, bp2, ncol=2)
As you can see the y-scale of each plot is set dynamically and can be
misleading. To resolve this, we would need a single plot with aesthetic
mapping of x
being the categorical variable
pulse
and y
the pulse values as was shown
above.
ggplot(pulse %>%
drop_na() %>% # remove missing
pivot_longer( c(pulse1,pulse2), names_to = "pulse", values_to = "level")
) +
aes(x=pulse, y=level) +
geom_boxplot()
Copyright © 2024 Biomedical Data Sciences (BDS) | LUMC