Agenda

Authors

Corrado Lanera
Ileana Baldi
Dario Gregori

Intro

The course will alternate between face-to-face presentations and exercises (15-20 minutes each) on the covered topics. After each exercise, the instructor will provide the “official” answer to the challenge. This not only offers a comparative learning experience but also sets the stage for the next topic: each new chapter begins in a collaborative environment with the official solution to the previous challenge already implemented, ensuring a smooth transition and a cohesive learning path.

At the end of each day, participants will have the option to tackle a longer hands-on exercise covering the day’s topics; the solution will be presented and discussed in the first half-hour of the next class.

At the end of the course, we will suggest a comprehensive exercise to be done independently at home. We will also send a questionnaire to participants to assess the retention of the concepts presented in the course.

The whole course will be conducted using the R programming language (R Core Team 2024), and it will introduce the main Tidyverse (Wickham et al. 2019) ecosystem of R packages and its philosophy of uniform, composable, functional, and human-first tools for data science. In particular, the following packages will be introduced and used during the course:

Agenda

240’ each day

[60’] Intros:

  • Course’s objectives and philosophy

  • Ts/TAs presentation

  • Course organization, teaching materials, and personalized assistance

  • R/RStudio

  • Posit RStudio Cloud IDE presentation and setup

EXERCISE [15’]

[10’] BREAK

[20’] Basic R and RStudio:

  • R basics

  • Packages

  • Tidyverse

EXERCISE [15’]

SOLUTION [5’]

[30’] Infrastructures

  • R projects

  • {here}

  • Files organization (dev/ folder)

EXERCISE [20’]

SOLUTION [5’]

[10’] BREAK

[20’] Import and cleaning

EXERCISE [15’]

SOLUTION [5’]

[10’] Recap & Assignments

[10’] Recap & Solutions

[40’] Local environments

  • {renv}

EXERCISE [25’]

SOLUTION [5’]

[10’] BREAK

[30’] R Data Structures

  • Base data structures

  • Subsetting and Extractions

EXERCISE [25’]

SOLUTION [5’]

[30’] Pipe

  • Pipe

EXERCISE [20’]

SOLUTION [5’]

[10’] BREAK

[30’] (Cleaning (optional if time allows))

  • Headers, variables’ names, and missing data

EXERCISE [25’]

SOLUTION [5’]

[10’] Recap & Assignments

[10’] Recap & Solutions

[30’] Transform shape of dataset

  • Tidy format

  • select (all_of…)

  • filter

EXERCISE [15’]

SOLUTION [5’]

[10’] BREAK

[30’] Transform – manage dataset

  • mutate (across) data contents row-by-row

  • mutate and summarize data by groups

[10’] BREAK

EXERCISE [25’]

SOLUTION [5’]

[50’] Modeling (summary statistics tables)

  • Summary tables

    • descriptive statistics

EXERCISE [25’]

SOLUTION [5’]

    • cross-tables

    • saving tables

EXERCISE [20’]

SOLUTION [5’]

[10’] Recap & Assignments

[10’] Recap & Solutions

[85’] Visualization

  • Intro to {ggplot2}

    • Tidy data, and the layered grammar of graphics.

    • Base template (data, aesthetics, and geometries)

EXERCISE [20’]

SOLUTION [5’]

[10’] BREAK

[10’] Visualization

  • Intro to {ggplot2}

    • Scales, Facets, Labels, and Themes
    • Saving plots

[10’] BREAK

[30’] Transform – manage main types

  • factors

  • dates/datetimes

  • (strings (optional if time allows))

[10’] BREAK

[10’] RECAP & final assignments

[25’] Finale with overall recap, next-month assignment, support access instructions, and final survey


Bibliography

Allaire, JJ. 2023. Quarto: R Interface to Quarto Markdown Publishing System. https://github.com/quarto-dev/quarto-r.
Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2023. Rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Becker, Jason, Chung-hong Chan, David Schoch, and Thomas J. Leeper. 2023. Rio: A Swiss-Army Knife for Data I/O. https://github.com/gesistsa/rio.
Firke, Sam. 2023. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://github.com/sfirke/janitor.
Gohel, David, and Panagiotis Skintzos. 2023. Flextable: Functions for Tabular Reporting. https://ardata-fr.github.io/flextable-book/.
Hester, Jim, and Jennifer Bryan. 2022. Glue: Interpreted String Literals. https://github.com/tidyverse/glue.
Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://here.r-lib.org/.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Sjoberg, Daniel D., Joseph Larmarange, Michael Curry, Jessica Lavery, Karissa Whiting, and Emily C. Zabor. 2023. Gtsummary: Presentation-Ready Data Summary and Analytic Result Tables. https://github.com/ddsjoberg/gtsummary.
Spinu, Vitalie, Garrett Grolemund, and Hadley Wickham. 2023. Lubridate: Make Dealing with Dates a Little Easier. https://lubridate.tidyverse.org.
Ushey, Kevin, and Hadley Wickham. 2023. Renv: Project Environments. https://rstudio.github.io/renv/.
Wickham, Hadley. 2023a. Forcats: Tools for Working with Categorical Variables (Factors). https://forcats.tidyverse.org/.
———. 2023b. Stringr: Simple, Consistent Wrappers for Common String Operations. https://stringr.tidyverse.org.
———. 2023c. Tidyverse: Easily Install and Load the Tidyverse. https://tidyverse.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2023. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://ggplot2.tidyverse.org.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. Tidyr: Tidy Messy Data. https://tidyr.tidyverse.org.
Xie, Yihui. 2023. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.