Day Four:
Finale

50 min approx

Global recap - 10’

Base R

  • #: comments
  • name <- object : assignment
  • fun(arg = value) : function call
  • ?fun : help
  • ?"+" : help on operator
  • pkg::fun : function from package
  • ?pkg::fun : help on function from package
  • \(x) x + 1 : anonymous function definition
  • (\(x) x + 1)(2) : anonymous function call
  • f(x, y) == (x |> f(y)) : pipe
  • math
  • logic
  • base types: logical < integer < numeric < character
  • vectors: same types only
  • other vector types: factors, dates, times
  • lists : different types
  • other list types: data frames, tibbles
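
As a quick refresher, the points above can be sketched in a few lines of base R. This is only an illustrative sketch (R >= 4.1 is assumed for `\(x)` and `|>`); the object names are made up for the example.

```r
# Assignment and function call
x <- c(1, 2, 3)
total <- sum(x)

# Anonymous function: stored under a name, then defined and called in place
add_one <- \(x) x + 1
add_one(2)
(\(x) x + 1)(2)

# Native pipe: x |> f(y) is the same as f(x, y)
identical(round(3.14159, 2), 3.14159 |> round(2))

# Mixing types in a vector coerces along logical < integer < numeric < character
mixed_num <- c(TRUE, 2L, 3.5)       # becomes numeric
mixed_chr <- c(TRUE, 2L, 3.5, "a")  # becomes character
class(mixed_num)
class(mixed_chr)

# Lists can hold different types; a data frame is a list of equal-length columns
df <- data.frame(id = 1:2, name = c("a", "b"))
is.list(df)
```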

Tidyverse

Tidyverse loop

Tidy principles:

  • design for humans
  • pipes: “and then… and then…”
  • tidy data
  • functions are verbs
  • coherent grammar

Project organization

  • activate RStudio project
  • here::here(): project root
  • here::here("data-raw"): raw data
  • here::here("data"): preprocessed data
  • here::here("output"): output
  • here::here("R/functions.R"): script with custom functions
  • here::here("analyses"): analyses
  • load custom functions with: source(here("R/functions.R"))

Import

  • (base): read.csv
  • (tidyverse): read_csv
  • (haven): read_sas, read_spss, read_stata
  • rio::import : import from many formats
  • rio::export : export to many formats
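
A minimal round-trip sketch with rio, assuming the package is installed. The `patients` data frame and the temporary file path are hypothetical, used only for illustration.

```r
library(rio)

# Hypothetical toy data, for illustration only
patients <- data.frame(id = 1:3, sex = c("F", "M", "F"))

# Write to a temporary file; rio infers the format from the ".csv" extension
csv_path <- file.path(tempdir(), "patients.csv")
export(patients, csv_path)

# Read it back; the same call works for other formats rio supports
reloaded <- import(csv_path)
reloaded
```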

Cleaning

  • janitor::clean_names: clean column names
  • janitor::remove_empty: remove empty rows and columns
  • (tidyr::fill): fill missing values in selected columns using the next or previous entry
  • (tidyr::drop_na): drop rows with missing values in selected columns
  • (unheadr::mash_colnames): combine header rows (including grouping headers with sliding_headers = TRUE)
library(rio)
library(here)
library(unheadr)
library(janitor)
library(tidyverse)
options(rio.import.class = "tibble")

here(
  "data-raw",
  "Copenhagen_raw.xlsx"
) |> 
  import(
    header = FALSE,
    na = c("", "??")
  ) |> 
  mash_colnames(
    keep_names = FALSE,
    n_name_rows = 4,
    sliding_headers = TRUE
  ) |> 
  clean_names() |> 
  remove_empty(c("rows", "cols")) |> 
  fill(demo_sex)

Data wrangling

  • dplyr::filter: filter rows
  • dplyr::select: select columns
  • dplyr::mutate: add/change columns (does not affect number of rows)
  • dplyr::summarise: summarize groups (returns a tibble/data.frame with one row per group)
  • .by = <tidy-select>: group rows by the unique values of one or more columns; available in both mutate and summarise.
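
The verbs above can be sketched on a small example. The `visits` tibble is hypothetical toy data, and `.by` assumes dplyr >= 1.1.0.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
visits <- tibble(
  id  = c(1, 2, 3, 4),
  sex = c("F", "M", "F", "M"),
  age = c(30, 40, 50, 60)
)

# filter keeps rows, select keeps columns
adults_over_30 <- visits |>
  filter(age > 30) |>
  select(id, sex, age)

# mutate with .by: same number of rows, group mean repeated within each group
with_group_mean <- visits |>
  mutate(mean_age = mean(age), .by = sex)

# summarise with .by: one row per group
by_sex <- visits |>
  summarise(mean_age = mean(age), n = n(), .by = sex)
```

Note that `mutate` returns all four rows, while `summarise` collapses them to one row per value of `sex`.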

Caution

Summary functions (e.g., min, max):

  • Takes: vectors.
  • Returns: a single value.

Vectorized functions (e.g., pmin, pmax):

  • Takes: vectors.
  • Returns: vectors (the same length as the input).
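
A short base R sketch of the difference, using two made-up vectors:

```r
a <- c(1, 5, 3)
b <- c(4, 2, 6)

# Summary functions collapse all inputs into a single value
min(a, b)   # 1
max(a, b)   # 6

# Vectorized functions work element by element and keep the input length
pmin(a, b)  # 1 2 3
pmax(a, b)  # 4 5 6
```

This matters inside mutate: a summary function there yields one value that is recycled across all rows, whereas a vectorized function yields one value per row.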

Summary tables

library(gtsummary) # provides tbl_summary() and the example trial data

trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    label = list(
      age ~ "Age (years)",
      grade ~ "Grade",
      response ~ "Response"
    ),
    type = list(
      response ~ "categorical"
    ),
    percent = "row",
    digits = list(
      age ~ 2
    ),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      response ~ "{n} ({p}%)"
    )
  ) |> 
  add_n() |>
  add_overall() |> 
  add_p() |> 
  bold_p(t = 0.6) |>
  bold_levels() |>
  bold_labels() |> 
  italicize_levels() |> 
  italicize_labels()

Data visualization

p <- <DATA> |> 
  ggplot(
    aes(<GLOBAL_MAPPINGS>)
  ) + 
    <GEOM_FUNCTION>(
      aes(<LOCAL_MAPPINGS>),
      position = <LOCAL_POSITION>,
      <AESTHETIC> = <LOCAL_CONSTANT>
    ) +
    <SCALE_FUNCTION> +
    <FACET_FUNCTION> +
    labs(
      ## aesthetics
      <AES_NAME> = "<TEXT>",
      
      ## meta-data
      <METADATA_NAME> = "<TEXT>"
    ) +
    <THEME>()
    
p
ggsave("my_plot.png") # last printed plot
ggsave("my_plot.jpeg", p) # specific plot
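
One possible way to fill in the template, sketched with the built-in mtcars data. The choice of variables, geom, scale, facet, and theme is arbitrary and only meant to show where each piece goes.

```r
library(ggplot2)

p <- mtcars |>
  ggplot(
    aes(x = wt, y = mpg)                      # global mappings
  ) +
    geom_point(
      aes(colour = factor(cyl)),              # local mapping
      position = "jitter",                    # local position
      size = 2                                # constant aesthetic
    ) +
    scale_colour_brewer(palette = "Dark2") +  # scale function
    facet_wrap(vars(am)) +                    # facet function
    labs(
      ## aesthetics
      x = "Weight (1000 lbs)",
      y = "Miles per gallon",
      colour = "Cylinders",

      ## meta-data
      title = "Fuel efficiency by weight"
    ) +
    theme_minimal()                           # theme

p
```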

Next month(s) assessment - 10’

One month after all three editions of the course have ended, we will upload a new, complete assessment to the website so that you can check your long-term retention of the concepts and skills learned during the course.

Survey - 10’

R local installation support - 10’

Thank you!

Acknowledgment

The slides are made using Posit’s Quarto open-source scientific and technical publishing system, powered in R by Yihui Xie’s knitr.

License

This work by Corrado Lanera, Ileana Baldi, and Dario Gregori is licensed under CC BY 4.0