  • What is a ggplot, and what are its main components?
  • How should data be provided to a ggplot?
  • How can we create ggplots with the ggplot2 R package?
  • What are aesthetics, geometries, facets, plot themes and labs?

  • Create basic plots with ggplot2.
  • Use the facet_* functions to stratify plots according to data.
  • Modify the main style component of a plot (i.e., sub-/titles, labels, legends)
  • Save a plot as a stand-alone image file.


The R Layered Grammar of Graphics

Preamble: Pipes in composing plots

In the next section we will learn how to create plots with ggplot2.

We will build plots progressively, adding one layer at a time.

For composing ggplot2 plots only, there is a dedicated "pipe": the plus sign +, a reminder that we are adding elements to the plot.


Functions in ggplot2 are named with nouns rather than verbs, precisely because we (sequentially) add them to the plot we are creating!


First of all, let's set up our environment for this lesson and load some data.

The Data

On November 14th 2006 the director of a high school in Greater Copenhagen, Denmark, contacted the regional public health authorities to inform them about an outbreak of diarrhoea and vomiting among participants in a school dinner party held on the 11th of November 2006. Almost all students and teachers of the school (750 people) attended the party.


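library(tidyverse)  # dplyr (mutate, across), forcats (fct), ggplot2
library(here)       # here(): project-relative paths
library(rio)        # import(): file reader (assumed to be rio's)

linelist <- here("data-raw/Copenhagen_clean.xlsx") |> 
  import() |> 
  mutate(across(where(is.character), fct))  # character columns to factors

head(linelist) # for slides, first 6 obs only.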

Definitions: tidy data

  • A variable is a quantity, quality, or property that you can measure.

  • A value is the state of a variable when you measure it. The value of a variable may change from measurement to measurement.

  • An observation is a set of measurements made under similar conditions (you usually make all of the measurements in an observation at the same time and on the same object). An observation will contain several values, each associated with a different variable. We’ll sometimes refer to an observation as a data point.

Tabular data is a set of values, each associated with a variable and an observation.

In the next lessons, we will focus more on this, including how to convert non-tidy datasets into tidy ones!


Tabular data is tidy if:

  • Each value is placed in its own “cell”.
  • Each variable in its own column.
  • Each observation in its own row.
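
As a minimal sketch of the three rules above, the following compares a non-tidy table with its tidy counterpart. The data frame and its column names (class, day_1, day_2, cases) are hypothetical, and tidyr's pivot_longer is used only for illustration.

```r
library(tidyr)

# Hypothetical, non-tidy table: the "day" variable is spread over two columns.
wide <- data.frame(
  class = c("A", "B"),
  day_1 = c(3, 5),
  day_2 = c(7, 2)
)

# Tidy version: one row per (class, day) observation,
# one column per variable (class, day, cases).
tidy <- pivot_longer(
  wide,
  cols = c(day_1, day_2),
  names_to = "day",
  values_to = "cases"
)

tidy
```

In the tidy version, every value sits in its own cell, and both day and cases can be mapped directly to parts of a plot.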

Why a Layered Grammar for Graphics

With the ggplot2 system we do not need to learn a separate command for every kind of plot; instead, we learn a single system, a grammar, that lets us produce almost any kind of graph.

ggplot2 allows us to build graphs by:

  1. taking the information in our data

  2. mapping each variable to the aesthetics of our choice (e.g., x, y, colours)

  3. using the geometrical representation we need (e.g., points, lines, bars)

  4. possibly after transforming the data with some statistics

  5. according to possibly different coordinate systems (e.g., polar)

  6. optionally stratifying the plot by some information in the data itself

  7. and customizing its theme according to our stylistic needs and metadata (e.g., title, labels, …)


By learning the grammar that controls these 7 components, we can build almost any kind of graph with almost any kind of customization.


We will rarely need to use all of these components. In this course, we will cover the basics of 1-3 (required to _have_ a plot), 6, and 7, while we will only mention 4 and 5.

The practical aim of the lesson

ggplot plots components (side-by-side)

1. Data

Each part of the plot is built from a variable in our data, so that we can build the plot from the data we have and, conversely, control any part of the plot through our data.

linelist |>  # start from data, and then...
  ggplot()  # create a plot


All ggplot2 plots start from tabular data, by calling ggplot on them.


Calling ggplot on data provides a blank canvas on which to start building the plot.
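
A minimal sketch of the blank canvas, assuming a small hypothetical data frame (toy) instead of the Copenhagen linelist: the object returned by ggplot holds the data but has no layers yet.

```r
library(ggplot2)

# Hypothetical toy data: one row per person (an observation).
toy <- data.frame(
  age = c(15, 16, 17, 16, 18),
  sex = c("female", "male", "female", "male", "female")
)

# Calling ggplot() on the data creates a plot object with no layers.
canvas <- ggplot(toy)

length(canvas$layers)  # number of layers added so far
```

Printing canvas would show an empty panel; everything else is added with +.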

2. Aesthetics

Let’s say we want to investigate the distribution of the onset time. We should map the onset_datetime variable to the x axis!

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  )


The aes function maps variables to aesthetics of our plot.

Main aesthetics (overview)


You use aesthetics to visualize the data.

  • x, y: position along the x and y axes.

  • alpha: the transparency of the geometries.

  • colour: the color of the geometries according to the data.

  • fill: the interior color of the geometries.

  • group: to which group a geometry belongs.

  • linetype: the type of line used (solid, dotted, etc.).

  • shape: the shape of the points.

  • size: the size of the points or lines.
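
As an illustration of the list above, here is a sketch that maps several aesthetics at once. The toy data frame and its columns (age, height, sex) are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(
  age = c(15, 16, 17, 16, 18, 17),
  height = c(160, 172, 165, 178, 170, 168),
  sex = c("female", "male", "female", "male", "female", "male")
)

# Map four aesthetics to three variables: position (x, y),
# colour and shape (both driven by sex).
p <- ggplot(toy, aes(x = age, y = height, colour = sex, shape = sex))

names(p$mapping)  # the aesthetics currently mapped to data
```

No geometry has been added yet, so the mappings are stored but nothing is drawn.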

3. Geometries

Once we have the canvas and the mappings, we can add a geometrical layer. In this case, we want to add bars for onset_datetime (i.e., x).

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  ) + 
  geom_bar() # drawing bars


The help page of each geom_* lists the aesthetics it requires.


All geometry functions are called geom_*, with * indicating the type of geometry:

?geom_point, ?geom_line, ?geom_bar, ?geom_boxplot, ?geom_histogram, …

Main geom_*etries (overview)


You use geom_*etries to shape the data.

  • geom_point: scatter plot

  • geom_line: lines connecting points

  • geom_smooth: smoothed trend line fitted to the data

  • geom_boxplot: box plot of a continuous variable, optionally split by a categorical one

  • geom_bar: bar chart for a categorical x axis

  • geom_histogram: histogram for a continuous x axis

  • geom_violin: mirrored kernel density of the data distribution

  • geom_path: lines connecting points in the order in which they appear in the data
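
The following sketch shows how the same data and mappings can be drawn with different geometries. The toy data (dose, response) are hypothetical, and the linear fit in geom_smooth is just one possible choice of smoother.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(
  dose = c(1, 2, 3, 4, 5, 6),
  response = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

base <- ggplot(toy, aes(x = dose, y = response))

p_points <- base + geom_point()  # scatter plot
p_line <- base + geom_line()     # points connected by a line

# Two geometries on the same canvas: points plus a fitted straight line.
p_both <- base +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

length(p_both$layers)  # one layer per geom added
```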

Aesthetic mapping vs aesthetic parameters?

Suppose we would like to have the bars filled in blue.

Why does this produce red bars, with a legend that reports “fill” as its header and “blue” as its level?

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime,
      fill = "blue"
    )
  ) + 
  geom_bar()

Aesthetic mapping vs aesthetic parameters!

Suppose we would like to have the bars filled in blue.

Why does this produce blue bars, with no legend?

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  ) + 
  geom_bar(fill = "blue")


To get a blue bar chart, put the parameter within the geom_*etry call and outside the aes call: parameters placed there set an aesthetic to a fixed value, like colour = "red" or size = 3, instead of mapping data to it!
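
To make the difference concrete, this sketch builds both versions on a hypothetical toy data frame and inspects the fill colours that ggplot2 actually computes, using layer_data.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(sex = c("female", "male", "female", "female"))

# Mapping: "blue" is treated as a data value with a single level,
# so a fill scale picks a default palette colour for it and adds a legend.
mapped <- ggplot(toy, aes(x = sex, fill = "blue")) +
  geom_bar()

# Setting: fill is a fixed parameter of the layer, no scale is involved.
fixed <- ggplot(toy, aes(x = sex)) +
  geom_bar(fill = "blue")

unique(layer_data(mapped)$fill)  # a palette colour, not "blue"
unique(layer_data(fixed)$fill)   # the fixed value we asked for
```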

Multiple geom_*etries

We can also add multiple geom_*etries, one on top of the other. In that case, it can be useful to set layer-specific aesthetics and to customize the position of the geoms.

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  ) + 
  geom_bar(fill = "blue") +
  geom_bar(  # a second bar layer
    aes(fill = sex),
    position = "dodge"
  )


We can also set aesthetics within a single geom_* without affecting the others.


We may also want to set the position of the geom_* we are creating.

Multiple geom_*etries

We can also add multiple geom_*etries, one on top of the other. In that case, it can be useful to set layer-specific aesthetics and to customize the position of the geoms.

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  ) + 
  geom_bar(  # this bar layer is now drawn first
    aes(fill = sex),
    position = "dodge"
  ) +
  geom_bar(fill = "blue")


geom_*s are drawn in the order in which they are added, with later layers on top of earlier ones, so the operation is NOT commutative!

Main positions (overview)


You use positions to place the geom_*s.

  • "stack": (default) multiple bars occupying the same x position are stacked atop one another.

  • "dodge": bars occupying the same x position are placed side by side.

  • "fill": shows relative proportions at each x by stacking the bars and then standardizing each bar to have the same height.

  • "jitter": adds a small amount of random noise to each position, which can make overlapping points easier to read.
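
The effect of the bar positions above can be checked numerically. This is a sketch on a hypothetical toy data frame (class, sex); it reads the computed bar heights with layer_data instead of looking at the picture.

```r
library(ggplot2)

# Hypothetical toy data: 3 people in class A, 5 in class B.
toy <- data.frame(
  class = c("A", "A", "A", "B", "B", "B", "B", "B"),
  sex = c("female", "male", "male", "female", "female", "female", "male", "male")
)

base <- ggplot(toy, aes(x = class, fill = sex))

stacked <- base + geom_bar(position = "stack")  # counts piled up
dodged <- base + geom_bar(position = "dodge")   # counts side by side
filled <- base + geom_bar(position = "fill")    # proportions, same height

# Top of the tallest bar under each position.
max(layer_data(stacked)$ymax)  # total of the largest class
max(layer_data(dodged)$ymax)   # largest single sex-by-class count
max(layer_data(filled)$ymax)   # every bar is rescaled to the same height
```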

Base template

At this point, we have a minimal set of instructions that defines a base template for our plots.

<DATA> |> 
  ggplot(aes(<GLOBAL_MAPPINGS>)) + 
    <GEOM_FUNCTION>()

Base template (+ optionals)

At this point, we have a minimal set of instructions that defines a base template for our plots.

<DATA> |> 
  ggplot(aes(<GLOBAL_MAPPINGS>)) + 
    <GEOM_FUNCTION>(
      aes(<LOCAL_MAPPINGS>), # optional
      position = <LOCAL_POSITION>, # optional
      <AESTHETIC> = <LOCAL_CONSTANT> # optional
    )


Local aesthetic mappings override the global ones!
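
A small sketch of a local mapping taking precedence over a global one, on a hypothetical toy data frame (class, sex, group): fill is mapped to sex globally, but the bar layer maps it to group.

```r
library(ggplot2)

# Hypothetical toy data: sex has 2 levels, group has 3.
toy <- data.frame(
  class = c("A", "A", "B", "B", "C", "C"),
  sex = c("female", "male", "female", "male", "female", "male"),
  group = c("g1", "g2", "g3", "g1", "g2", "g3")
)

p <- ggplot(toy, aes(x = class, fill = sex)) +  # global: fill by sex
  geom_bar(aes(fill = group))                   # local: fill by group

# The layer uses as many fill colours as there are groups, not sexes.
length(unique(layer_data(p)$fill))
```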

Scales (overview)

Scales control the mapping from data to aesthetics. They are required to have a plot, but they are usually set automatically. We can customize them to have better control over the plot.

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  ) + 
  geom_bar(fill = "blue") +
  geom_bar(
    aes(fill = sex),
    position = "dodge"
  ) + 
  scale_x_datetime(
    date_breaks = "12 hours", 
    labels = scales::label_date_short()
  ) +
  scale_y_continuous(
    breaks = scales::breaks_pretty()
  )


Scale names are composed as scale_<aes>_<type>, where <aes> is the aesthetic, and <type> is the type of scale.

See ?scale_y_continuous and ?scale_x_datetime.


The scales package provides a set of functions to customize scales. We used label_date_short and breaks_pretty to have better control over the labels and breaks.
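
A self-contained sketch of the same scale customization, assuming a hypothetical toy data frame with made-up onset times and using a histogram with 6-hour bins in place of the bar layers.

```r
library(ggplot2)

# Hypothetical toy data: made-up onset date-times.
toy <- data.frame(
  onset = as.POSIXct(
    c("2006-11-11 20:00:00", "2006-11-12 08:00:00", "2006-11-12 09:00:00",
      "2006-11-12 21:00:00", "2006-11-13 07:00:00"),
    tz = "UTC"
  )
)

p <- ggplot(toy, aes(x = onset)) +
  geom_histogram(binwidth = 6 * 3600) +  # bin width in seconds (6 hours)
  scale_x_datetime(                      # scale_<aes>_<type>: x, datetime
    date_breaks = "12 hours",
    labels = scales::label_date_short()
  ) +
  scale_y_continuous(                    # scale_<aes>_<type>: y, continuous
    breaks = scales::breaks_pretty()
  )

p
```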

Facets (overview)

We can then stratify our plot by the levels of one or two discrete variables in our data set, creating a distinct plot for each level (or combination of levels), each displayed in its own facet.

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  ) + 
  geom_bar(fill = "blue") +
  geom_bar(
    aes(fill = sex),
    position = "dodge"
  ) + 
  scale_x_datetime(
    date_breaks = "12 hours", 
    labels = scales::label_date_short()
  ) +
  scale_y_continuous(
    breaks = scales::breaks_pretty()
  ) +
  facet_grid(
    group ~ class
  )


facet_grid forms a matrix of panels defined by row and column faceting variables.


facet_grid is most useful when you have two discrete variables, and all combinations of the variables exist in the data. If you have only one variable with many levels, try ?facet_wrap.
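
For the single-variable case mentioned above, here is a sketch of facet_wrap on a hypothetical toy data frame with six classes, wrapped into three columns.

```r
library(ggplot2)

# Hypothetical toy data: six classes, one female and one male in each.
toy <- data.frame(
  sex = rep(c("female", "male"), times = 6),
  class = rep(c("1A", "1B", "2A", "2B", "3A", "3B"), each = 2)
)

p <- ggplot(toy, aes(x = sex)) +
  geom_bar() +
  facet_wrap(vars(class), ncol = 3)  # one panel per class, 3 per row

length(unique(layer_data(p)$PANEL))  # number of panels drawn
```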

Facets (overview)

We can then stratify our plot by the levels of one or two discrete variables in our data set, creating a distinct plot for each level (or combination of levels), each displayed in its own facet.

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  ) + 
  geom_bar(fill = "blue") +
  geom_bar(
    aes(fill = sex),
    position = "dodge"
  ) + 
  scale_x_datetime(
    date_breaks = "12 hours", 
    labels = scales::label_date_short()
  ) +
  scale_y_continuous(
    breaks = scales::breaks_pretty()
  ) +
  facet_grid(
    group ~ class,
    scales = "free_y",
    labeller = "label_both"
  )


  • scales: whether scales are shared across all facets ("fixed", the default) or allowed to vary: "free_x" frees the x axis, "free_y" frees the y axis, and "free" frees both.

  • labeller: the default labeller (i.e., "label_value") labels the rows and columns with the values of the faceting variables; "label_both" displays both the variable name and the value.

Customize metadata: primary labels

Now we can start to make it nicer, adding and improving some text and labels, such as the title, the axis and legend labels, and a caption.

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  ) + 
  geom_bar(fill = "blue") +
  geom_bar(
    aes(fill = sex),
    position = "dodge"
  ) + 
  scale_x_datetime(
    date_breaks = "12 hours", 
    labels = scales::label_date_short()
  ) +
  scale_y_continuous(
    breaks = scales::breaks_pretty()
  ) +
  facet_grid(
    group ~ class,
    scales = "free_y"
  ) + 
  labs(
    ## titles of the aesthetics used
    x = "Onset date",
    y = "Count (N person)",
    fill = "Sex",
    ## plot metadata
    title = "Distribution of cases across days.",
    subtitle = "Stratified by group and class.",
    caption = "Data from ECDC EPIET Outbreak Investigation ("
  )
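
A compact sketch of labs on a hypothetical toy data frame; all label texts are made up for the example.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  sex = c("female", "male", "female", "female", "male")
)

p <- ggplot(toy, aes(x = class, fill = sex)) +
  geom_bar() +
  labs(
    x = "Class",                      # axis titles
    y = "Count (N person)",
    fill = "Sex",                     # legend title
    title = "Cases by class",         # plot metadata
    caption = "Hypothetical toy data"
  )

p$labels$title  # labels are stored in the plot object
```

Any aesthetic used in the plot can be renamed this way, by passing its name as an argument to labs.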

Theme (overview)

Finally, there are many other options we can consider to fine-tune the appearance of our plot.

linelist |>  # start from data, and then
  ggplot(  # create a plot
    aes(  # with aesthetics:
      x = onset_datetime
    )
  ) + 
  geom_bar(fill = "blue") +
  geom_bar(
    aes(fill = sex),
    position = "dodge"
  ) + 
  scale_x_datetime(
    date_breaks = "12 hours", 
    labels = scales::label_date_short()
  ) +
  scale_y_continuous(
    breaks = scales::breaks_pretty()
  ) +
  facet_grid(
    group ~ class,
    scales = "free_y",
    labeller = "label_both"
  ) + 
  labs(
    ## titles of the aesthetics used
    x = "Onset date",
    y = "Count (N person)",
    fill = "Sex",
    ## plot metadata
    title = "Distribution of cases across days.",
    subtitle = "Stratified by group and class.",
    caption = "Data from ECDC EPIET Outbreak Investigation ("
  ) +
  theme_bw() +
  theme(
    legend.position = "top"
  )
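
As a sketch of how a complete theme and individual theme settings combine, the example below uses a hypothetical toy data frame, theme_minimal as an arbitrary alternative to theme_bw, and two theme arguments chosen only for illustration.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  sex = c("female", "male", "female", "female", "male")
)

p <- ggplot(toy, aes(x = class, fill = sex)) +
  geom_bar() +
  labs(title = "Cases by class") +
  theme_minimal() +                           # a complete, predefined theme
  theme(                                      # then override single elements
    legend.position = "bottom",
    plot.title = element_text(face = "bold")
  )

p$theme$legend.position  # the override is stored in the plot's theme
```

The order matters here as well: a complete theme added after theme(...) would reset the individual settings.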

Themes: showcase (overview)

There are a great many custom theme parameters; here we show a visual reference for a number of them.

Theme Elements Reference Sheet by Isabella Benabaye

A more complete template

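We can finally have a larger set of instructions to define a more exhaustive template for our plots.

<DATA> |> 
  ggplot(
    aes(<GLOBAL_MAPPINGS>)
  ) + 
    <GEOM_FUNCTION>(
      aes(<LOCAL_MAPPINGS>),
      position = <LOCAL_POSITION>,
      <AESTHETIC> = <LOCAL_CONSTANT>
    ) +
    <SCALE_FUNCTION>() +
    <FACET_FUNCTION>() +
    labs(
      ## aesthetics
      <AES_NAME> = "<TEXT>",
      ## meta-data
      <METADATA_NAME> = "<TEXT>"
    ) +
    <THEME_FUNCTION>()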

Saving plots

To save a ggplot to disk, you can call the function ggsave. Many output formats are supported.

epicurve <- linelist |> 
  ggplot(aes(...)) + 
  geom_bar(...) + 
  scale_x_datetime(...) +
  scale_y_continuous(...) +
  facet_grid(...) + 
  labs(...) + 
  theme_bw()

ggsave("epicurve.pdf", plot = epicurve)
ggsave("epicurve.png", plot = epicurve)
ggsave("epicurve.jpeg", plot = epicurve)
ggsave("epicurve.tiff", plot = epicurve)
ggsave("epicurve.bmp", plot = epicurve)
ggsave("epicurve.svg", plot = epicurve)
ggsave("epicurve.eps", plot = epicurve)
ggsave("", plot = epicurve)
ggsave("epicurve.tex", plot = epicurve)


  • The plot argument of ggsave is optional: if it is not specified, the last plot displayed is saved!

  • The ggsave function guesses the type of graphics device from the extension of the filename.
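
A runnable sketch of saving a plot, assuming a hypothetical toy data frame and a temporary file so that nothing is written to the project folder; the width and height values are arbitrary.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  sex = c("female", "male", "female", "female", "male")
)

p <- ggplot(toy, aes(x = class, fill = sex)) +
  geom_bar()

# The ".pdf" extension tells ggsave which graphics device to use.
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 6, height = 4, units = "in")

file.exists(out_file)  # check that the file was written
```

Passing plot = p explicitly avoids depending on which plot was displayed last.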


