Day Four:
Finale

50 min approx

Global recap - 10’

Base R

#: comments
name <- object : assignment
fun(arg = value) : function call
?fun : help
?"+" : help on operator
pkg::fun : function from package
?pkg::fun : help on function from package
\(x) x + 1 : anonymous function definition
(\(x) x + 1)(2) : anonymous function call
f(x, y) == (x |> f(y)) : pipe
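
As a minimal sketch of the syntax listed above (the names `x` and `add_one` are made up for illustration; the `\(x)` shorthand and the native pipe `|>` assume R 4.1 or later):

```r
# Assignment: bind a name to an object
x <- c(1, 2, 3)

# Anonymous function stored under a (hypothetical) name
add_one <- \(x) x + 1

# Anonymous function defined and called in one go
direct <- (\(x) x + 1)(2)

# The pipe passes the left-hand side as the first argument,
# so these two expressions are equivalent
piped  <- x |> add_one() |> sum()
nested <- sum(add_one(x))

identical(piped, nested)
```

Reading the piped version left to right ("take x, and then add one, and then sum") is usually easier than reading the nested call inside out.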

math
logic
base types: logical < integer < double < character (coercion order; integer and double are both numeric)
vectors: same types only
vector other types: factors, dates, times
lists : different types
list other types: data frames, tibbles
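
The type ordering can be checked directly. The sketch below uses only base R; the object names `mixed` and `df` are illustrative:

```r
# Combining types in c() coerces everything to the most general type
typeof(c(TRUE, 1L))     # "integer"
typeof(c(1L, 2.5))      # "double"
typeof(c(2.5, "a"))     # "character"

# A list keeps each element's own type
mixed <- list(TRUE, 1L, 2.5, "a")
types <- sapply(mixed, typeof)
types

# A data frame is a list of equal-length vectors (one per column)
df <- data.frame(id = 1:2, name = c("a", "b"))
is.list(df)
```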

Tidyverse

library(tidyverse): loads many packages at once, including:
- library(tidyr): reshape to tidy (pivot_longer, pivot_wider, separate)
- library(dplyr): data wrangling (filter, select, mutate, group_by, summarise)
- library(forcats) : factor manipulation (fct_*)
- library(lubridate) : dates manipulation (ymd, mdy, dmy, ymd_hms)
- library(stringr): string manipulation (str_*)
- library(ggplot2): data visualization
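
A small sketch of three of the helper packages above, using made-up values (the names `visit`, `has_pp`, and `grade` are hypothetical):

```r
library(lubridate)
library(stringr)
library(forcats)

# lubridate: the function name states the order of the date components
visit <- dmy("31/12/2023")
year(visit)

# stringr: str_* functions take the string first, then the pattern
has_pp <- str_detect(c("apple", "kiwi"), "pp")

# forcats: fct_* functions reorder or recode factor levels
grade <- factor(c("low", "high", "mid", "low"))
ordered_levels <- levels(fct_relevel(grade, "low", "mid", "high"))
ordered_levels
```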

Tidy principles:

design for humans
pipes: “and then… and then…”
tidy data
functions are verbs
coherent grammar

Project organization

activate RStudio project
here::here(): project root
here::here("data-raw"): raw data
here::here("data"): preprocessed data
here::here("output"): output
here::here("R/functions.R"): script with custom functions
here::here("analyses"): analyses
load custom functions with: source(here("R/functions.R"))
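
A sketch of how these paths are typically built, assuming the folder layout listed above; the file names `my_data.csv` and `my_table.html` are hypothetical:

```r
library(here)

# Paths are built from the project root, wherever the calling script lives
raw_path <- here("data-raw", "my_data.csv")    # hypothetical file name
out_path <- here("output", "my_table.html")    # hypothetical file name

# Load custom functions only if the script exists in this project
functions_path <- here("R", "functions.R")
if (file.exists(functions_path)) source(functions_path)

raw_path
```

Because every path starts from the project root, the same script should run unchanged on another machine or from a different working directory inside the project.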

Import

(base): read.csv
(tidyverse): read_csv
(haven): read_sas, read_spss, read_stata
rio::import : import from many formats
rio::export : export to many formats
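
A minimal round trip with rio, written to a temporary file so that nothing in the project is touched (the object names `tmp` and `cars` are illustrative):

```r
library(rio)

# The file extension decides the format for both export and import
tmp <- tempfile(fileext = ".csv")
export(head(mtcars), tmp)

cars <- import(tmp)
dim(cars)
```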

Cleaning

janitor::clean_names: clean column names
janitor::remove_empty: remove empty rows and columns
(tidyr::fill): fill missing values in selected columns using the next or previous entry
(tidyr::drop_na): drop rows with missing values in selected columns
(unheadr::mash_colnames): combine header rows (including grouping headers with sliding_headers = TRUE)

library(rio)
library(here)
library(unheadr)
library(janitor)
library(tidyverse)
options(rio.import.class = "tibble")

here(
  "data-raw",
  "Copenhagen_raw.xlsx"
) |> 
  import(
    header = FALSE,
    na = c("", "??")
  ) |> 
  mash_colnames(
    keep_names = FALSE,
    n_name_rows = 4,
    sliding_headers = TRUE
  ) |> 
  clean_names() |> 
  remove_empty(c("rows", "cols")) |> 
  fill(demo_sex)
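
The pipeline above ends with `fill()`. To see `tidyr::fill` and `tidyr::drop_na` in isolation, here is a sketch on made-up data (the `visits` data frame and its columns are hypothetical):

```r
library(tidyr)

# Toy data: the group label appears only on the first row of each block
visits <- data.frame(
  sex = c("F", NA, "M", NA),
  age = c(34, 41, NA, 58)
)

# Carry the last seen value downward (the default direction)
filled <- visits |> fill(sex)

# Drop only the rows where age is missing
complete <- filled |> drop_na(age)

filled$sex
nrow(complete)
```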

Data wrangling

dplyr::filter: filter rows
dplyr::select: select columns
dplyr::mutate: add/change columns (does not affect number of rows)
dplyr::summarise: summarise groups (returns a tibble/data.frame with one row per group)
.by = <tidy-select>: group rows by one or more column unique values; used both in mutate and summarise.
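
A sketch of the verbs above on the built-in `mtcars` data, assuming dplyr 1.1.0 or later for the `.by` argument (the names `small`, `centered`, and `per_group` are illustrative):

```r
library(dplyr)

small <- mtcars |>
  filter(hp > 100) |>   # keep rows
  select(cyl, mpg)      # keep columns

# mutate with .by: group-wise computation, same number of rows
centered <- small |>
  mutate(mpg_diff = mpg - mean(mpg), .by = cyl)

# summarise with .by: one row per group
per_group <- small |>
  summarise(mean_mpg = mean(mpg), .by = cyl)

nrow(centered) == nrow(small)
nrow(per_group) == n_distinct(small$cyl)
```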

Caution

Summary functions (e.g., min, max):

Takes: vectors.
Returns: a single value.

Vectorized functions (e.g., pmin, pmax):

Takes: vectors.
Returns: vectors (the same length as the input).
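
The difference is easiest to see side by side. A base R sketch with made-up vectors `a` and `b`:

```r
a <- c(3, 8, 5)
b <- c(4, 2, 9)

# Summary function: one value across all inputs
min(a, b)     # 2

# Vectorized function: element-wise minimum, same length as the inputs
pmin(a, b)    # 3 2 5
```

Inside `mutate`, using `min(a, b)` where `pmin(a, b)` was intended silently recycles a single value down the whole column, which is why the distinction deserves caution.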

Summary tables

gtsummary::tbl_summary: summary table
gtsummary::tbl_cross: cross table
(gtsummary::tbl_uvregression): regression table
(gtsummary::tbl_merge): merge tables (horizontally)
(gtsummary::tbl_stack): stack tables (vertically)
gt::gtsave(as_gt(<tbl>), "my_tbl.<ext>"): save table as <ext> file (e.g., html, png, pdf, docx, …)

trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    label = list(
      age ~ "Age (years)",
      grade ~ "Grade",
      response ~ "Response"
    ),
    type = list(
      response ~ "categorical"
    ),
    percent = "row",
    digits = list(
      age ~ 2
    ),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      response ~ "{n} ({p}%)"
    )
  ) |> 
  add_n() |>
  add_overall() |> 
  add_p() |> 
  bold_p(t = 0.6) |>
  bold_levels() |>
  bold_labels() |> 
  italicize_levels() |> 
  italicize_labels()

Data visualization

p <- <DATA> |> 
  ggplot(
    aes(<GLOBAL_MAPPINGS>)
  ) + 
    <GEOM_FUNCTION>(
      aes(<LOCAL_MAPPINGS>),
      position = <LOCAL_POSITION>,
      <AESTHETIC> = <LOCAL_CONSTANT>
    ) +
    <SCALE_FUNCTION> +
    <FACET_FUNCTION> +
    labs(
      ## aesthetics
      <AES_NAME> = "<TEXT>",
      
      ## meta-data
      <METADATA_NAME> = "<TEXT>"
    ) +
    <THEME>()
    
p
ggsave("my_plot.png") # last printed plot
ggsave("my_plot.jpeg", p) # specific plot
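
One possible way to fill in the template, using the built-in `mtcars` data. The choice of geom, scale, facet, and theme here is only an example, and the plot is saved to a temporary folder rather than to the project:

```r
library(ggplot2)

p <- mtcars |>
  ggplot(
    aes(x = wt, y = mpg)              # global mappings
  ) +
  geom_point(
    aes(colour = factor(cyl)),        # local mapping
    position = "jitter",              # local position
    size = 2                          # constant aesthetic
  ) +
  scale_colour_viridis_d() +
  facet_wrap(vars(am)) +
  labs(
    ## aesthetics
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",

    ## meta-data
    title = "Fuel efficiency by weight"
  ) +
  theme_minimal()

out_file <- file.path(tempdir(), "my_plot.png")
ggsave(out_file, p, width = 6, height = 4)
```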

Next month(s) assessment - 10’

One month after all three editions of the course have ended, we will upload a new complete assessment to the website so that you can check your long-term retention of the concepts and skills learned during the course.

Survey - 10’

R local installation support - 10’

Thank you!

Acknowledgment

The slides are made using Posit’s Quarto open-source scientific and technical publishing system, powered in R by Yihui Xie’s knitr.

License

This work by Corrado Lanera, Ileana Baldi, and Dario Gregori is licensed under CC BY 4.0
