Agenda
Intro
The course will alternate between face-to-face presentations and exercises (15-20 minutes each) on the covered topics. After each exercise, the instructor will provide the “official” answer to the challenge. This not only offers a comparative learning experience but also sets the stage for the next topic: each new chapter begins in a collaborative environment with the official solution to the previous challenge already implemented, ensuring a smooth transition and a cohesive learning path.
At the end of each day, participants will have the option to tackle a longer hands-on exercise covering the day’s topics; the solution will be discussed and presented in the first half-hour of the next class.
At the end of the course, we will suggest a comprehensive exercise to be done independently at home. We will also send a questionnaire to participants to assess the retention of the concepts presented in the course.
The whole course will be conducted using the R programming language (R Core Team 2024), and it will introduce the main Tidyverse (Wickham et al. 2019) R-package ecosystem and its philosophy of uniform, composable, functional, and human-first design of utilities for data science. In particular, the following packages will be introduced and used during the course:
- {tidyverse} (Wickham 2023c) to activate the ecosystem in the R session.
- RStudio projects, and the {renv} (Ushey and Wickham 2023) and {here} (Müller 2020) packages for project management, reproducibility, and portability. Alongside them, the native pipe will be introduced, with only a mention of {magrittr} [@R-magrittr], to chain instructions in a way that is easy to write, read, and understand.
- {ggplot2} (Wickham, Chang, et al. 2023) for graphics production.
- {rio} (Becker et al. 2023) for data reading/import and writing/export.
- {janitor} (Firke 2023) and {tidyr} (Wickham, Vaughan, and Girlich 2023) for data cleaning at the import level.
- {dplyr} (Wickham, François, et al. 2023), {forcats} (Wickham 2023a), {lubridate} (Spinu, Grolemund, and Wickham 2023), and (optionally, at the end of the course) {stringr} (Wickham 2023b), and {glue} (Hester and Bryan 2022) for post-import data set manipulation and type-specific data management.
- {gtsummary} (Sjoberg et al. 2023) to create summary tables for data and models and to include table data in a report’s narrative text sections directly, i.e., without copy-pasting them by hand.1
- R Markdown (Allaire et al. 2023), {knitr} (Xie 2023), and Quarto (Allaire 2023) to create reproducible, dynamic documents such as reports, articles, slides, and much more.
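As a small taste of the native pipe mentioned in the list above, the sketch below compares nested calls with a piped version of the same steps. It is only an illustration, not course material: it assumes R 4.1 or later (where `|>` is available), uses only base R functions, and relies on the built-in `mtcars` data set as stand-in data.

```r
# Base R only; the native pipe `|>` requires R >= 4.1.
# `mtcars` ships with R and is used here purely as illustrative data.

# Nested calls must be read from the inside out:
nested <- round(mean(subset(mtcars, cyl == 4)$mpg), 1)

# The same steps with the native pipe read from top to bottom:
piped <- mtcars |>
  subset(cyl == 4) |>      # keep only the 4-cylinder cars
  getElement("mpg") |>     # extract the miles-per-gallon column
  mean() |>                # average it
  round(1)                 # round to one decimal place

# Both versions compute the same value
identical(nested, piped)
```

The piped version performs exactly the same computation; the only difference is that each step appears on its own line, in the order in which it is executed.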
Agenda
[60’] Intros:
Course’s objectives and philosophy
Ts/TAs presentation
Course organization, teaching materials, and personalized assistance
R/RStudio
Posit RStudio Cloud IDE presentation and setup
EXERCISE [15’]
[20’] Basic R and RStudio:
R basics
Packages
Tidyverse
EXERCISE [15’]
SOLUTION [5’]
[30’] Infrastructures2
R projects
{here}
Files organization (dev/ folder)
EXERCISE [20’]
SOLUTION [5’]
[20’] Import and cleaning
EXERCISE [15’]
SOLUTION [5’]
[40’] Local environments
{renv}
EXERCISE [25’]
SOLUTION [5’]
[30’] R Data Structures
Base data structures
Subsetting and Extractions
EXERCISE [25’]
SOLUTION [5’]
[30’] Pipe
- Pipe
EXERCISE [20’]
SOLUTION [5’]
[30’] Cleaning (optional, if time allows)
- Headers, variables’ names, and missing data
EXERCISE [25’]
SOLUTION [5’]
[30’] Transform shape of dataset 5 6
Tidy format
select (all_of…)
filter
EXERCISE [15’]
SOLUTION [5’]
[30’] Transform – manage dataset7
mutate (across) data contents row-by-row
mutate and summarize data by groups
EXERCISE [25’]
SOLUTION [5’]
[50’] Modeling (summary statistics tables)8
Summary tables
- descriptive statistics
EXERCISE [25’]
SOLUTION [5’]
- cross-tables
- saving tables
EXERCISE [20’]
SOLUTION [5’]
[85’] Visualization 9
Intro to {ggplot2}
Tidy data, and the layered grammar of graphics.
Base template (data, aesthetics, and geometries)
EXERCISE [20’]
SOLUTION [5’]
[10’] Visualization 10
Intro to {ggplot2}
- Scales, Facets, Labels, and Themes
- Saving plots
[30’] Transform – manage main types11
factors
dates/datetimes
strings (optional, if time allows)