Day Four:
Summary tables

~50 min

Overview

Questions

What are summary tables, and how are they used in data analysis?
How does gtsummary facilitate the presentation of descriptive statistics?
How can one create cross-tables using gtsummary?
How can one create tables for univariable models using gtsummary?
How does one merge (horizontally) and stack (vertically) tables using gtsummary?

Lesson Objectives

To be able to

Generate summary tables using gtsummary (using the tbl_summary() function)
Customize summary tables using gtsummary (using the basic extended functions, such as add_*, and bold_*)
Create cross-tables (tables showing relationships between variables) using gtsummary (using the tbl_cross() function)
Merge and stack summary tables using gtsummary in R (using the tbl_merge() and tbl_stack() functions)
Create and interpret tables summarizing univariable model results
Merge (h) and stack (v) multiple tables to compare or combine data

Summary tables

What are summary tables?

A way to present descriptive statistics and results of statistical models in a tabular format.

Just as ggplot2 does for graphs, gtsummary is a package that facilitates the creation of summary tables in R. It is based on the gt package, which is a “grammar of tables” and follows the same philosophy as ggplot2.

Descriptive statistics - summarize data

`tbl_summary()` - base function

The tbl_summary() function is used to summarize data in a table format.
It calculates descriptive statistics for continuous, categorical, and dichotomous variables in R, and presents the results in a beautiful, customizable summary table ready for publication (or presentation).
It adopts the tidyverse syntax to summarize data:
- it uses the pipe operator |> to chain functions together
- it uses the same selectors to select variables as the dplyr::select() function, e.g., starts_with(), ends_with(), contains(), matches(), everything(), etc.
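As a small illustration of the selector syntax, here is a sketch that picks columns by name pattern instead of listing them all. It assumes the `trial` example data shipped with gtsummary; the object name `tbl_sel` is only for illustration.

```r
library(gtsummary)

# Select columns with tidyselect helpers instead of listing them one by one.
# `trial` is the example data set shipped with gtsummary.
tbl_sel <- trial |>
  tbl_summary(
    include = c(age, starts_with("t"))  # age, plus every column starting with "t"
  )

# The variables that ended up in the table
unique(tbl_sel$table_body$variable)
```

In `trial`, the columns starting with "t" are trt and ttdeath, so the table should summarize those two plus age.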

`tbl_summary()` - Main arguments

data - the data frame to summarize (which can be piped in)
include - the variables to include in the table, i.e., the “variable” column of the table
by - the variable to stratify the table by, i.e., the single variable whose levels define the summary columns of the table. Summary statistics will be calculated separately for each level of the by variable (e.g. by = trt).

Tip

if include is not specified, all variables in the data frame will be included in the table
if by is not specified, the table will not be stratified, and summary statistics will be calculated over all rows of the data frame

`tbl_summary()` - base [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary()
tbl

Tip

use no arguments to summarize all variables in the data frame

Characteristic	N = 200¹
Chemotherapy Treatment
Drug A	98 (49%)
Drug B	102 (51%)
Age	47 (38, 57)
Unknown	11
Marker Level (ng/mL)	0.64 (0.22, 1.39)
Unknown	10
T Stage
T1	53 (27%)
T2	54 (27%)
T3	43 (22%)
T4	50 (25%)
Grade
I	68 (34%)
II	68 (34%)
III	64 (32%)
Tumor Response	61 (32%)
Unknown	7
Patient Died	112 (56%)
Months to Death/Censor	22.4 (16.0, 24.0)
¹ n (%); Median (IQR)

`tbl_summary()` - strata [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt
  )
tbl

Tip

use the by argument to stratify the table by a variable

Characteristic	Drug A, N = 98¹	Drug B, N = 102¹
Age	46 (37, 59)	48 (39, 56)
Unknown	7	4
Marker Level (ng/mL)	0.84 (0.24, 1.57)	0.52 (0.19, 1.20)
Unknown	6	4
T Stage
T1	28 (29%)	25 (25%)
T2	25 (26%)	29 (28%)
T3	22 (22%)	21 (21%)
T4	23 (23%)	27 (26%)
Grade
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
Tumor Response	28 (29%)	33 (34%)
Unknown	3	4
Patient Died	52 (53%)	60 (59%)
Months to Death/Censor	23.5 (17.4, 24.0)	21.2 (14.6, 24.0)
¹ Median (IQR); n (%)

`tbl_summary()` - variables I [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  select(trt, age, grade, response) |>
  tbl_summary(
    by = trt
  )
tbl

Tip

to select the variables of interest, we can pipe in a select() function, or…

Characteristic	Drug A, N = 98¹	Drug B, N = 102¹
Age	46 (37, 59)	48 (39, 56)
Unknown	7	4
Grade
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
Tumor Response	28 (29%)	33 (34%)
Unknown	3	4
¹ Median (IQR); n (%)

`tbl_summary()` - variables II [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response)
  )
tbl

Tip

use the include argument to select variables of interest

Important

Detects variable types of input data and calculates proper descriptive statistics
Variables coded as 0/1, TRUE/FALSE, and Yes/No are presented dichotomously
Recognizes NA values as “missing” and lists them as unknown
Label attributes automatically printed
Variable levels indented and footnotes added

Characteristic	Drug A, N = 98¹	Drug B, N = 102¹
Age	46 (37, 59)	48 (39, 56)
Unknown	7	4
Grade
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
Tumor Response	28 (29%)	33 (34%)
Unknown	3	4
¹ Median (IQR); n (%)

`tbl_summary()` - types [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    type = list(
      response ~ "categorical"
    )
  )
tbl

Tip

use the type argument to specify the variable types, e.g., to change the default behavior for dichotomous variables to be treated as standard categorical (i.e., showing each level on a separate row)

Important

Syntax is variable ~ "type"

Characteristic	Drug A, N = 98¹	Drug B, N = 102¹
Age	46 (37, 59)	48 (39, 56)
Unknown	7	4
Grade
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
Tumor Response
0	67 (71%)	65 (66%)
1	28 (29%)	33 (34%)
Unknown	3	4
¹ Median (IQR); n (%)

`tbl_summary()` - percent [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    type = list(
      response ~ "categorical"
    ),
    percent = "row"
  )
tbl

Tip

use the percent argument to choose the denominator of the percentages:
- percent = "column" (the default) divides each count by its column total (here, the treatment arm)
- percent = "row" divides each count by its row total (here, the variable level across both arms)
- percent = "cell" divides each count by the total across all cells of the variable
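To make the three denominators concrete, here is a rough check in base R (not gtsummary's own code) that reproduces the Grade percentages by hand. It assumes the `trial` data from gtsummary; the object names are illustrative.

```r
library(gtsummary)  # only needed for the `trial` example data

# Counts of grade (rows) by treatment (columns)
counts <- table(trial$grade, trial$trt)

# percent = "column": each cell divided by its column (treatment arm) total
col_pct <- round(100 * prop.table(counts, margin = 2))
# percent = "row": each cell divided by its row (grade level) total
row_pct <- round(100 * prop.table(counts, margin = 1))
# percent = "cell": each cell divided by the grand total
cell_pct <- round(100 * prop.table(counts))

col_pct["I", "Drug A"]  # 35 / 98
row_pct["I", "Drug A"]  # 35 / 68
```

For Grade I under Drug A, the column percentage is 36% (as in the earlier stratified tables) and the row percentage is 51% (as in the table on this slide).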

Characteristic	Drug A, N = 98¹	Drug B, N = 102¹
Age	46 (37, 59)	48 (39, 56)
Unknown	7	4
Grade
I	35 (51%)	33 (49%)
II	32 (47%)	36 (53%)
III	31 (48%)	33 (52%)
Tumor Response
0	67 (51%)	65 (49%)
1	28 (46%)	33 (54%)
Unknown	3	4
¹ Median (IQR); n (%)

`tbl_summary()` - labels [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    type = list(
      response ~ "categorical"
    ),
    label = list(
      age ~ "Age (years)",
      grade ~ "Grade",
      response ~ "Response"
    ),
    percent = "row"
  )
tbl

Tip

use the label argument to change the variable labels

Characteristic	Drug A, N = 98¹	Drug B, N = 102¹
Age (years)	46 (37, 59)	48 (39, 56)
Unknown	7	4
Grade
I	35 (51%)	33 (49%)
II	32 (47%)	36 (53%)
III	31 (48%)	33 (52%)
Response
0	67 (51%)	65 (49%)
1	28 (46%)	33 (54%)
Unknown	3	4
¹ Median (IQR); n (%)

`tbl_summary()` - digits [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    type = list(
      response ~ "categorical"
    ),
    label = list(
      age ~ "Age (years)",
      grade ~ "Grade",
      response ~ "Response"
    ),
    percent = "row",
    digits = list(
      age ~ 2
    )
  )
tbl

Tip

use the digits argument to change the number of digits shown for continuous variables

Important

Syntax is variable ~ digits

Characteristic	Drug A, N = 98¹	Drug B, N = 102¹
Age (years)	46.00 (37.00, 59.00)	48.00 (39.00, 56.00)
Unknown	7	4
Grade
I	35 (51%)	33 (49%)
II	32 (47%)	36 (53%)
III	31 (48%)	33 (52%)
Response
0	67 (51%)	65 (49%)
1	28 (46%)	33 (54%)
Unknown	3	4
¹ Median (IQR); n (%)

`tbl_summary()` - statistics [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    type = list(
      response ~ "categorical"
    ),
    label = list(
      age ~ "Age (years)",
      grade ~ "Grade",
      response ~ "Response"
    ),
    percent = "row",
    digits = list(
      age ~ 2
    ),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      response ~ "{n} ({p}%)"
    )
  )
tbl

Tip

use the statistic argument to change the summary statistics

Important

Syntax is list(variable ~ "statistic")

Characteristic	Drug A, N = 98¹	Drug B, N = 102¹
Age (years)	47.01 (14.71)	47.45 (14.01)
Unknown	7	4
Grade
I	35 (51%)	33 (49%)
II	32 (47%)	36 (53%)
III	31 (48%)	33 (52%)
Response
0	67 (51%)	65 (49%)
1	28 (46%)	33 (54%)
Unknown	3	4
¹ Mean (SD); n (%)

Statistics overview

For categorical variables the following statistics are available to display.

{n}: frequency
{N}: denominator, or cohort size
{p}: formatted percentage

For continuous variables the following statistics are available to display.

{median}: median
{mean}: mean
{sd}: standard deviation
{var}: variance
{min}: minimum
{max}: maximum
{sum}: sum
{p##}: any integer percentile, where ## is an integer from 0 to 100
{foo}: any function of the form foo(x) is accepted where x is a numeric vector

For both categorical and continuous variables, statistics on the number of missing and non-missing observations and their proportions are available to display.

{N_obs}: total number of observations
{N_miss}: number of missing observations
{N_nonmiss}: number of non-missing observations
{p_miss}: percentage of observations missing
{p_nonmiss}: percentage of observations not missing
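The placeholders above can be combined freely inside the statistic string. A minimal sketch, assuming the `trial` data and an illustrative choice of statistics (quartiles and missing count for continuous variables, count over denominator for categorical ones):

```r
library(gtsummary)

tbl_stats <- trial |>
  tbl_summary(
    include = c(age, grade),
    statistic = list(
      # median with 25th and 75th percentiles, plus the number of missing values
      all_continuous() ~ "{median} ({p25}, {p75}), {N_miss} missing",
      # count, denominator, and percentage
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    missing = "no"  # drop the separate "Unknown" row, since N_miss is shown inline
  )
tbl_stats
```

Everything outside the curly braces is printed literally, so separators and units can be written directly in the string.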

`tbl_summary()` - adders [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    label = list(
      age ~ "Age (years)",
      grade ~ "Grade",
      response ~ "Response"
    ),
    type = list(
      response ~ "categorical"
    ),
    percent = "row",
    digits = list(
      age ~ 2
    ),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      response ~ "{n} ({p}%)"
    )
  ) |> 
  add_n() |>
  add_overall() |> 
  add_p()
tbl

Tip

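use the add_* functions to add columns to the table: add_n() adds the number of non-missing observations, add_overall() adds an unstratified summary column, and add_p() adds p-values comparing the groups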

Characteristic	N	Overall, N = 200¹	Drug A, N = 98¹	Drug B, N = 102¹	p-value²
Age (years)	189	47.24 (14.31)	47.01 (14.71)	47.45 (14.01)	0.7
Unknown		11	7	4
Grade	200				0.9
I		68 (100%)	35 (51%)	33 (49%)
II		68 (100%)	32 (47%)	36 (53%)
III		64 (100%)	31 (48%)	33 (52%)
Response	193				0.5
0		132 (100%)	67 (51%)	65 (49%)
1		61 (100%)	28 (46%)	33 (54%)
Unknown		7	3	4
¹ Mean (SD); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

`tbl_summary()` - stylers [side-by-side]

library(tidyverse)
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    label = list(
      age ~ "Age (years)",
      grade ~ "Grade",
      response ~ "Response"
    ),
    type = list(
      response ~ "categorical"
    ),
    percent = "row",
    digits = list(
      age ~ 2
    ),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      response ~ "{n} ({p}%)"
    )
  ) |> 
  add_n() |>
  add_overall() |> 
  add_p() |> 
  bold_p(t = 0.6) |>
  bold_levels() |>
  bold_labels() |> 
  italicize_levels() |> 
  italicize_labels()
tbl

Tip

use the bold_* and italicize_* functions to customize the table

Important

bold_p bolds p-values less than a specified threshold

Characteristic	N	Overall, N = 200¹	Drug A, N = 98¹	Drug B, N = 102¹	p-value²
Age (years)	189	47.24 (14.31)	47.01 (14.71)	47.45 (14.01)	0.7
Unknown		11	7	4
Grade	200				0.9
I		68 (100%)	35 (51%)	33 (49%)
II		68 (100%)	32 (47%)	36 (53%)
III		64 (100%)	31 (48%)	33 (52%)
Response	193				0.5
0		132 (100%)	67 (51%)	65 (49%)
1		61 (100%)	28 (46%)	33 (54%)
Unknown		7	3	4
¹ Mean (SD); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

Your turn (main: C; bk1: A; bk2: B)

Your turn

Connect to our pad (https://bit.ly/ubep-rws-pad-ed3)
Connect to the Day-3 project in RStudio cloud (https://bit.ly/ubep-rws-rstudio)

Under section 3.3 (Ex20, Ex21), answer the questions by putting an x next to the correct answer.
Open the script 14-tbl_summary.R, and follow the instructions step by step.

25:00

Important

include selects the variables to include in the table
by stratifies the table by a variable
you can use tidyverse selectors to select variables to include in the table, i.e., as the include argument of tbl_summary

My turn

YOU: Connect to our pad (https://bit.ly/ubep-rws-pad-ed3) and write your questions and doubts there (and tell me if I am going too slow or too fast)

ME: Connect to the Day-3 project in RStudio cloud (https://bit.ly/ubep-rws-rstudio): script 17-gtsummary.R

Cross-tables - summarize relationships

`tbl_cross()`

The tbl_cross() function is used to summarize relationships between variables in a table format.

library(tidyverse)
library(gtsummary)

trial |> 
  tbl_cross(
    row = trt,
    col = grade
  )

	Grade			Total
	I	II	III	Total
Chemotherapy Treatment
Drug A	35	32	31	98
Drug B	33	36	33	102
Total	68	68	64	200

`tbl_cross()`

The tbl_cross() function is used to summarize relationships between variables in a table format.

library(tidyverse)
library(gtsummary)

trial |> 
  tbl_cross(
    row = trt,
    col = grade,
    percent = "row"
  ) |> 
  add_p()

	Grade			Total	p-value¹
	I	II	III	Total	p-value¹
Chemotherapy Treatment					0.9
Drug A	35 (36%)	32 (33%)	31 (32%)	98 (100%)
Drug B	33 (32%)	36 (35%)	33 (32%)	102 (100%)
Total	68 (34%)	68 (34%)	64 (32%)	200 (100%)
¹ Pearson’s Chi-squared test

Tip

tbl_cross shares many arguments with tbl_summary (e.g., label, statistic, digits, percent), and works with adders and stylers such as add_p() and bold_labels().

`gtsave` [side-by-side]

The gt::gtsave() function saves a table to a file; a gtsummary table must first be converted to a gt table with as_gt().

library(tidyverse)  
library(gtsummary)

tbl <- trial |> 
  tbl_summary(
    by = trt,
    include = c(trt, age, grade, response),
    type = list(response ~ "categorical")
  )

as_gt(tbl) |> 
  gt::gtsave("trial.png")

as_gt(tbl) |> 
  gt::gtsave("trial.pdf")

as_gt(tbl) |> 
  gt::gtsave("trial.docx")

# ...and so on

Tip

gtsave can save tables as HTML, PNG, PDF, DOCX, RTF, or TEX files; just set the file extension in the filename argument accordingly.
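A minimal sketch of saving a table without cluttering the project folder, assuming the `trial` data and writing to a temporary HTML file. The names `tbl_small` and `out_file` are illustrative. As far as I know, PNG and PDF output additionally relies on the webshot2 package and a Chrome-based browser, so HTML is the safest format to try first.

```r
library(gtsummary)

tbl_small <- trial |>
  tbl_summary(by = trt, include = c(age, grade))

# Write to a temporary file so nothing is left behind in the project folder
out_file <- tempfile(fileext = ".html")

tbl_small |>
  as_gt() |>                       # convert the gtsummary table to a gt table
  gt::gtsave(filename = out_file)  # the extension decides the output format

file.exists(out_file)
```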

Your turn (main: A; bk1: B; bk2: C)

Your turn

Connect to our pad (https://bit.ly/ubep-rws-pad-ed3)
Connect to the Day-3 project in RStudio cloud (https://bit.ly/ubep-rws-rstudio)

Under section 3.3 (Ex22), answer the questions by putting an x next to the correct answer.
Open the script 15-tbl_cross.R, and follow the instructions step by step.

20:00

Important

tbl_cross accepts a data frame and the names of two of its variables as input.
gt::gtsave(as_gt(<tbl>), here("path/to/file.<ext>")) saves the table as an HTML, PNG, PDF, DOCX, RTF, or TEX file (depending on the extension).

My turn

YOU: Connect to our pad (https://bit.ly/ubep-rws-pad-ed3) and write your questions and doubts there (and tell me if I am going too slow or too fast)

ME: Connect to the Day-3 project in RStudio cloud (https://bit.ly/ubep-rws-rstudio): script 17-gtsummary.R

Break

10:00

Tables for univariable models - several regressions at once [optional]

`tbl_uvregression()` - base [optional]

The tbl_uvregression() function is used to summarize the results of multiple (one for each covariate included) univariable models at once.

library(tidyverse)
library(gtsummary)

trial |> 
  tbl_uvregression(
    include = c(age, grade),  # covariates only; the outcome goes in y
    y = response,
    method = glm,
    method.args = list(family = binomial)
  )

Characteristic	N	log(OR)¹	95% CI¹	p-value
Age	183	0.02	0.00, 0.04	0.10
Grade	193
I		—	—
II		-0.06	-0.81, 0.69	0.9
III		0.09	-0.65, 0.83	0.8
¹ OR = Odds Ratio, CI = Confidence Interval

The model function and its arguments are specified directly within the tbl_uvregression() call.
Variable types are automatically detected and reference rows are added for categorical variables.
Model estimates and confidence intervals are rounded and formatted.
Variable levels are indented and footnotes added.
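To see what "one model per covariate" means, here is a rough base R sketch of the same idea. It is not how gtsummary is implemented; it only assumes the `trial` data and fits each logistic regression by hand. The names `covariates` and `fits` are illustrative.

```r
library(gtsummary)  # only needed for the `trial` example data

covariates <- c("age", "grade")

# Fit one logistic regression per covariate, each with `response` as the outcome
fits <- lapply(covariates, function(v) {
  glm(reformulate(v, response = "response"), data = trial, family = binomial)
})
names(fits) <- covariates

# Coefficient on the log-odds scale, comparable to the log(OR) column
round(coef(fits$age)[["age"]], 2)

# Exponentiate the coefficients to move to the odds ratio scale
round(exp(coef(fits$grade)), 2)
```

The age coefficient rounds to 0.02, matching the log(OR) shown in the table; tbl_uvregression() adds the formatting, reference rows, and footnotes on top of fits like these.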

`tbl_uvregression()` - autodetect [optional]

The tbl_uvregression() function is used to summarize the results of multiple (one for each covariate included) univariable models at once.

library(tidyverse)
library(gtsummary)

trial |> 
  tbl_uvregression(
    include = c(age, grade),  # covariates only; the outcome goes in y
    y = response,
    method = glm,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )

Characteristic	N	OR¹	95% CI¹	p-value
Age	183	1.02	1.00, 1.04	0.10
Grade	193
I		—	—
II		0.95	0.45, 2.00	0.9
III		1.10	0.52, 2.29	0.8
¹ OR = Odds Ratio, CI = Confidence Interval

Common model types are detected, and an appropriate header is added with a footnote.
Here the model was recognized as a logistic regression with exponentiated coefficients, so the header displays “OR” for odds ratio.

`tbl_uvregression()` - customize

The tbl_uvregression() function is used to summarize the results of multiple (one for each covariate included) univariable models at once.

library(tidyverse)
library(gtsummary)

trial |> 
  tbl_uvregression(
    include = c(age, grade),  # covariates only; the outcome goes in y
    y = response,
    method = glm,
    method.args = list(family = binomial),
    exponentiate = TRUE
  ) |> 
  add_nevent() |>
  add_global_p() |>
  # adjusts global p-values for multiple testing
  add_q() |> 
  bold_p() |>
  # bold q-values under the threshold of 0.10
  bold_p(t = 0.10, q = TRUE) |> 
  bold_labels()

Characteristic	N	Event N	OR¹	95% CI¹	p-value	q-value²
Age	183	58	1.02	1.00, 1.04	0.091	0.2
Grade	193	61			>0.9	>0.9
I			—	—
II			0.95	0.45, 2.00
III			1.10	0.52, 2.29
¹ OR = Odds Ratio, CI = Confidence Interval
² False discovery rate correction for multiple testing

tbl_uvregression is a wrapper for tbl_regression, and as a result, accepts nearly identical function arguments.
Unlike tbl_uvregression, tbl_regression takes a single fitted model object as input, instead of a data frame and a model specification.
For more details on tbl_regression, see the gtsummary documentation or run ?tbl_regression in R.

Merging (h) and stacking (v) tables [optional]

`tbl_merge()`

The tbl_merge() function is used to merge multiple tables horizontally to compare or combine data.

library(tidyverse)
library(gtsummary)
library(survival)

tbl_resp <- trial |> 
  tbl_uvregression(
    include = c(trt, grade, age),
    y = response, 
    method = glm,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )

tbl_surv <- trial |> 
  tbl_uvregression(
    include = c(trt, grade, age),
    y = Surv(ttdeath, death),
    method = coxph,
    exponentiate = TRUE
  )

tbl_merge(
  list(tbl_resp, tbl_surv),
  tab_spanner = c("**Tumor Response**", "**Time to Death**")
)

Characteristic	Tumor Response				Time to Death
Characteristic	N	OR¹	95% CI¹	p-value	N	HR¹	95% CI¹	p-value
Chemotherapy Treatment	193				200
Drug A		—	—			—	—
Drug B		1.21	0.66, 2.24	0.5		1.25	0.86, 1.81	0.2
Age	183	1.02	1.00, 1.04	0.10	189	1.01	0.99, 1.02	0.3
Grade	193				200
I		—	—			—	—
II		0.95	0.45, 2.00	0.9		1.28	0.80, 2.05	0.3
III		1.10	0.52, 2.29	0.8		1.69	1.07, 2.66	0.024
¹ OR = Odds Ratio, CI = Confidence Interval, HR = Hazard Ratio

Important

covariates are merged automatically
footnotes are merged automatically
column headers are merged automatically

`tbl_stack()` [optional]

The tbl_stack() function is used to stack multiple tables vertically to compare or combine data.

library(tidyverse)
library(gtsummary)

resp_uv <- glm(
    response ~ trt,
    family = binomial,
    data = trial
  ) |> 
  tbl_regression(
    exponentiate = TRUE,
    label = list(trt ~ "Treatment (unadjusted)")
  )

resp_adj <- glm(
    response ~ trt + age + grade + stage,
    family = binomial,
    data = trial
  ) |> 
  tbl_regression(
    include = "trt",
    exponentiate = TRUE,
    label = list(trt ~ "Treatment (adjusted)")
  )

tbl_stack(list(resp_uv, resp_adj))

Characteristic	OR¹	95% CI¹	p-value
Treatment (unadjusted)
Drug A	—	—
Drug B	1.21	0.66, 2.24	0.5
Treatment (adjusted)
Drug A	—	—
Drug B	1.15	0.61, 2.18	0.7
¹ OR = Odds Ratio, CI = Confidence Interval

Tip

you can pass models directly to tbl_regression
you can include only specific variables in the table (but the model can be fit with more variables)
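As an alternative to relabelling each variable, tbl_stack() also has a group_header argument that titles each stacked block. A minimal sketch, assuming the `trial` data and a smaller adjustment set than the slide above; the object names are illustrative.

```r
library(gtsummary)

# Unadjusted model: treatment only
uv <- glm(response ~ trt, family = binomial, data = trial) |>
  tbl_regression(exponentiate = TRUE)

# Adjusted model: fit with extra covariates, but show only the treatment row
adj <- glm(response ~ trt + age + grade, family = binomial, data = trial) |>
  tbl_regression(include = trt, exponentiate = TRUE)

# One header per stacked table, in the same order as the list
stacked <- tbl_stack(
  list(uv, adj),
  group_header = c("Unadjusted", "Adjusted")
)
stacked
```

This keeps the variable label identical in both blocks and moves the unadjusted/adjusted distinction into the group headers.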

Your turn (main: B; bk1: C; bk2: A)

Your turn (tbl_cross and gtsave only)

Connect to our pad (https://bit.ly/ubep-rws-pad-ed3)
Connect to the Day-3 project in RStudio cloud (https://bit.ly/ubep-rws-rstudio)

Under section 3.3 (Ex23, Ex24), answer the questions by putting an x next to the correct answer.
Then, open the script 16-tbl_uvregression.R and follow the instructions step by step.
Then, open the script 17-merge.R and follow the instructions step by step.

20:00

Important

To format the digits of the percentages, use the digits argument. For example, if the statistic being calculated is "{n} ({p}%)" and you want the percentage rounded to 2 decimal places, use digits = list(all_categorical() ~ c(0, 2)): one value per statistic, in order.
In tbl_uvregression you pass a function and its arguments to the method and method.args arguments, respectively. That model will be fit for each variable included in the table.
In the tbl_regression function, you pass a single fitted model object as the first argument (x). That model will be summarized in the table.
tbl_merge merges tables horizontally
tbl_stack stacks tables vertically
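The digits rule for percentages can be sketched as follows, assuming the `trial` data; `tbl_pct` is an illustrative name. The vector passed to digits lines up with the statistics in the string, so the first value applies to {n} and the second to {p}.

```r
library(gtsummary)

tbl_pct <- trial |>
  tbl_summary(
    include = grade,
    statistic = list(all_categorical() ~ "{n} ({p}%)"),
    digits = list(all_categorical() ~ c(0, 2))  # 0 decimals for n, 2 for p
  )
tbl_pct
```

With 68 of 200 patients in Grade I, the cell should read 68 (34.00%) instead of 68 (34%).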

My turn

YOU: Connect to our pad (https://bit.ly/ubep-rws-pad-ed3) and write your questions and doubts there (and tell me if I am going too slow or too fast)

ME: Connect to the Course-script project in RStudio cloud (https://bit.ly/ubep-rws-rstudio): script 17-gtsummary.R

Homework

Posit’s RStudio Cloud Workspace

Instructions

Go to: https://bit.ly/ubep-rws-rstudio

Your turn

Project: day-3
Instructions:
Go to: https://bit.ly/ubep-rws-website
The text is the Day-3 assessment under the tab “Summative Assessments”.
(on RStudio Cloud) homework/day_three-summative.html
Script to complete: homework/solution.R

Acknowledgment

To create the current lesson, we explored, used, and adapted content from the following resources:

The slides are made using Posit’s Quarto open-source scientific and technical publishing system, powered in R by Yihui Xie’s knitr.

Additional resources

Daniel D. Sjoberg’s gtsummary FAQ + Gallery.
Daniel D. Sjoberg’s Presentation-Ready Summary Tables with gtsummary YouTube workshop.

License

This work by Corrado Lanera, Ileana Baldi, and Dario Gregori is licensed under CC BY 4.0
