30 min approx
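(Use the pivot_*, separate, and unite functions from the tidyr package in the Tidyverse to reshape data into a tidy format.)
There are three interrelated rules that make a dataset tidy:
Each variable is a column; each column is a variable.
Each observation is a row; each row is an observation.
Each value is a cell; each cell is a single value.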
Example: the tidyr::billboard dataset.
tidyr::pivot_longer [Optional]
Important
tidyr::pivot_longer
: convert your data to a “longer” format
cols
: select which variables should be pivoted
names_to
: define the column hosting the names of the cols columns
values_to
: define the column hosting the values of the cols columns
values_drop_na
: decide whether rows with missing information in values should be removed
Warning
Pivoting longer can produce many, possibly uninformative, missing values!
cols accepts the following selections:
var1:var10
: variables lying between var1 on the left and var10 on the right.
starts_with("a")
: names that start with “a”.
ends_with("z")
: names that end with “z”.
contains("b")
: names that contain “b”.
matches("x.y")
: names that match the regular expression x.y.
num_range("x", 1:4)
: names following the pattern x1, x2, …, x4.
all_of(vars)/any_of(vars)
: names stored in the character vector vars. all_of(vars) will error if the variables aren’t present; any_of(vars) will match just the variables that exist.
everything()
: all variables.
last_col()
: furthest column on the right.
where(is.numeric)
: all variables where is.numeric() returns TRUE.
Tip
!selection
: only variables that don’t match selection.
selection1 & selection2
: only variables included in both selection1 and selection2.
selection1 | selection2
: all variables that match either selection1 or selection2
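As a minimal sketch of the arguments and selection helpers above, the following pivots the weekly rank columns of tidyr::billboard into a longer table. The new column names "week" and "rank" are choices made for this illustration, not names fixed by tidyr.

```r
library(tidyr)

# billboard has one row per song and one column per week (wk1, wk2, ...)
billboard_long <- billboard |>
  pivot_longer(
    cols = starts_with("wk"), # columns to pivot, picked with a selection helper
    names_to = "week",        # new column hosting the old column names
    values_to = "rank",       # new column hosting the old cell values
    values_drop_na = TRUE     # drop the weeks in which a song was not ranked
  )

billboard_long
```

Setting values_drop_na = FALSE (the default) keeps one row per song and week, including all the missing ranks.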
Tip
When each column name encodes multiple variables, you can pivot the columns while maintaining the underlying structure, and then separate the variables in a second step using tidyr::separate.
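The two-step approach in the tip above can be sketched as follows. The visits table and its column names are hypothetical, invented only for this example. Note that recent tidyr versions mark separate as superseded in favour of separate_wider_delim; separate is used here because it is the function named in the tip.

```r
library(tidyr)

# Hypothetical data: each column name encodes a measure and a year
visits <- tibble::tibble(
  id = 1:2,
  weight_2020 = c(70, 82),
  weight_2021 = c(72, 80),
  height_2020 = c(1.75, 1.80),
  height_2021 = c(1.75, 1.81)
)

visits_long <- visits |>
  # Step 1: pivot, keeping the combined name in a single column
  pivot_longer(
    cols = !id,
    names_to = "measure_year",
    values_to = "value"
  ) |>
  # Step 2: split the combined name into its two variables
  separate(measure_year, into = c("measure", "year"), sep = "_")

visits_long
```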
tidyr::pivot_wider [Optional]
Your turn
Connect to our pad (https://bit.ly/ubep-rws-pad-ed3) Ex 15-16
Connect to the Day-3 project in RStudio cloud (https://bit.ly/ubep-rws-rstudio)
pivot_longer? pivot_wider?
names_from, names_to, values_from, values_to
Then, open the scripts 09-pivot_longer.R and 10-pivot_wider.R, and follow the instructions.
25:00
Important
To transform a table to a longer one, you need to put some of its columns' names_to a new column, and their corresponding values_to another one! Possibly allowing values_drop_na.
To transform a table to a wider one, you need to take new column names_from an existing column, and their corresponding values_from the associated one! Possibly with the created missing values_fill-ed.
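The wider direction can be sketched in the same way. The scores table below is hypothetical, and filling with 0 is only meant to show the mechanics of values_fill; whether 0 is a sensible replacement depends on the data.

```r
library(tidyr)

# Hypothetical long table: patient "b" has no measurement at visit "v2"
scores <- tibble::tibble(
  patient = c("a", "a", "b"),
  visit   = c("v1", "v2", "v1"),
  score   = c(10, 12, 9)
)

scores_wide <- scores |>
  pivot_wider(
    names_from = visit,   # existing column providing the new column names
    values_from = score,  # existing column providing the cell values
    values_fill = 0       # value used for the cells created as missing
  )

scores_wide
```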
YOU: Connect to our pad (https://bit.ly/ubep-rws-pad-ed3) and write there questions & doubts (and if I am too slow or too fast)
ME: Connect to the Day-3 project in RStudio cloud (https://bit.ly/ubep-rws-rstudio): script 11-pivoting.R
dplyr - intro
Common structure:
Tip
All verbs in the Tidyverse are designed to do mainly one thing, and to do it well! So, to solve complex problems we will often combine multiple verbs using the pipe (|>), with which we are already familiar!
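A minimal sketch of combining two verbs with the pipe, using the mtcars data that ships with R (the dataset choice is an assumption for illustration only):

```r
library(dplyr)

# Each verb does one thing; the pipe passes the result to the next verb
four_cyl <- mtcars |>
  filter(cyl == 4) |>  # keep only the rows for four-cylinder cars
  select(mpg, hp)      # then keep only the two columns of interest

four_cyl
```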
dplyr::filter [side-by-side]
Important
dplyr::filter allows you to keep rows based on the values of the columns.
We can use any kind of condition inside dplyr::filter.
We can also combine multiple conditions of arbitrary complexity at once.
Tip
It can be difficult to remember the precedence of logical operators. Using parentheses to group each condition is a safe way to avoid mistakes!
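The sketch below illustrates a few kinds of conditions and why the parentheses matter, again assuming the built-in mtcars data. In R, & is evaluated before |, so the last two calls are not equivalent.

```r
library(dplyr)

# A single comparison
high_mpg <- mtcars |> filter(mpg > 30)

# Set membership
small_engines <- mtcars |> filter(cyl %in% c(4, 6))

# Parentheses make the intended grouping explicit
with_parens <- mtcars |> filter((cyl == 4 | cyl == 6) & mpg > 25)

# Without parentheses this reads as: cyl == 4 | (cyl == 6 & mpg > 25)
without_parens <- mtcars |> filter(cyl == 4 | cyl == 6 & mpg > 25)

nrow(with_parens)
nrow(without_parens)
```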
dplyr::select [side-by-side]
For analyses, you do not need to remove columns from your dataset, but it can be extremely useful to see more clearly only the data you need from time to time.
You can select the columns to keep using the dplyr::select() verb, providing a selection built from negations (!), predicates wrapped in where, and the following helpers:
var1:var10
: variables lying between var1 on the left and var10 on the right.
starts_with("a")
: names that start with “a”.
ends_with("z")
: names that end with “z”.
contains("b")
: names that contain “b”.
matches("x.y")
: names that match the regular expression x.y.
num_range("x", 1:4)
: names following the pattern x1, x2, …, x4.
all_of(vars)/any_of(vars)
: names stored in the character vector vars. all_of(vars) will error if the variables aren’t present; any_of(vars) will match just the variables that exist.
everything()
: all variables.
last_col()
: furthest column on the right.
where(is.numeric)
: all variables where is.numeric() returns TRUE.
Tip
!selection
: only variables that don’t match selection.
selection1 & selection2
: only variables included in both selection1 and selection2.
selection1 | selection2
: all variables that match either selection1 or selection2
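Some of the helpers and operators above can be tried on the built-in iris data (an assumption for illustration; any data frame would do):

```r
library(dplyr)

# Range of adjacent columns, from left to right
sepal_to_petal <- iris |> select(Sepal.Length:Petal.Length)

# Helpers based on the column names
sepal_cols <- iris |> select(starts_with("Sepal"))
width_cols <- iris |> select(ends_with("Width"))

# Predicate on the column contents
numeric_cols <- iris |> select(where(is.numeric))

# Negation: everything that is not numeric
non_numeric_cols <- iris |> select(!where(is.numeric))

# Intersection of two selections
petal_width <- iris |> select(starts_with("Petal") & ends_with("Width"))

names(petal_width)
```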
Your turn
Connect to our pad (https://bit.ly/ubep-rws-pad-ed3)
Connect to the Day-3 project in RStudio cloud (https://bit.ly/ubep-rws-rstudio)
Answer in the pad, under the sections 3.2. Ex17 and 3.2. Ex18.
Then, open the scripts 11-filter.R and 12-select.R, and follow the instructions.
20:00
Important
The result of dplyr::filter and dplyr::select is always a data frame.
The original data frame is modified by neither dplyr::filter nor dplyr::select, never!
Important
all_of(vec) is for strict selection. If any of the variables in the character vector vec is missing, an error is thrown.
any_of(vec) doesn’t check for missing variables. It is especially useful with negative selections, when you would like to make sure a variable is removed.
YOU: Connect to our pad (https://bit.ly/ubep-rws-pad-ed3) and write there questions & doubts (and if I am too slow or too fast)
ME: Connect to the Day-3 project in RStudio cloud (https://bit.ly/ubep-rws-rstudio): script 12-filter-and-select.R
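The difference between all_of and any_of described above can be sketched as follows. The vector wanted and the name "not_a_column" are hypothetical, and mtcars is used only as a convenient built-in dataset.

```r
library(dplyr)

# Hypothetical character vector; "not_a_column" does not exist in mtcars
wanted <- c("mpg", "hp", "not_a_column")

# any_of() keeps the names that exist and ignores the others
lenient <- mtcars |> select(any_of(wanted))

# all_of() is strict: the missing name raises an error, captured here
strict <- tryCatch(
  mtcars |> select(all_of(wanted)),
  error = function(e) "error"
)

# Negative selection: make sure these columns are gone, if present
dropped <- mtcars |> select(!any_of(wanted))

names(lenient)
strict
ncol(mtcars) # the original data frame is left untouched
```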
To create the current lesson, we explored, used, and adapted content from the following resources:
The slides are made using Posit’s Quarto open-source scientific and technical publishing system powered in R by Yihui Xie’s knitr.
This work by Corrado Lanera, Ileana Baldi, and Dario Gregori is licensed under CC BY 4.0
UBEP’s R training for supervisors