Introduction

The functions described here are not expected to be useful in everyday plotting: when using the grammar of graphics, one can simply change the order in which layers are added to a ggplot, or remove unused variables from the data before passing it as an argument to the ggplot() constructor.

However, if one uses high-level methods like autoplot() or other functions that automatically produce a full plot using ‘ggplot2’ internally, one may need to add, move or delete layers so as to profit from such canned methods while retaining enough flexibility.

Some time ago I needed to manipulate the layers of a ggplot, and found a matching question on Stack Overflow. I used the answers found there as the starting point for writing the functions described in the first part of this vignette.

In a ggplot object, layers reside in a list, and their positions in the list determine the plotting order when generating the graphical output. The grammar of graphics treats the list of layers as a stack using only push operations. In other words, the most recently added layer always resides at the end of the list, and over-plots all layers previously added. The functions described in this vignette allow overriding the normal syntax at the cost of breaking the expectations of the grammar. As stated above, these functions are meant to be used only in exceptional cases. This notwithstanding, they are rather easy to use, and the user interface is consistent across all of them. Moreover, they are designed to return objects that are identical to objects created using the normal syntax rules of the grammar of graphics. The table below lists the names and purposes of these functions.

Function           Use
delete_layers()    delete one or more layers
append_layers()    append layers at a specific position
move_layers()      move layers to an absolute position
shift_layers()     move layers to a relative position
which_layers()     obtain the index positions of layers
extract_layers()   extract matched or indexed layers
num_layers()       obtain number of layers
top_layer()        obtain position of top layer
bottom_layer()     obtain position of bottom layer

Although their definitions do not rely on code internal to ‘ggplot2’, they rely on the internal structure of objects belonging to class gg and ggplot. Consequently, long-term backwards and forward compatibility cannot be guaranteed, or even expected.

Preliminaries

## For news about 'gginnards', please, see https://www.r4photobiology.info/
## For on-line documentation see https://docs.r4photobiology.info/gginnards/

We generate some artificial data.

We change the default theme to an uncluttered one.

We generate a plot to be used later to demonstrate the use of the functions.
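
The code for these three steps is not shown in this rendering. A minimal sketch that is consistent with the summary printed in the next section (a 120 x 4 data frame with columns group, panel, y and unused, a point layer, a stat_summary() layer and wrap faceting) could look as follows. The seed, the data values, the choice of theme_bw() and the object names my.data, old_theme and p are assumptions made for illustration, not necessarily the code used to build the vignette.

```r
library(ggplot2)
library(gginnards)

# Artificial data: only the column names and the 120 x 4 shape are taken
# from the printed summary; the values themselves are assumed.
set.seed(4321)
my.data <- data.frame(
  group = factor(rep(letters[1:4], each = 30)),
  panel = factor(rep(LETTERS[1:2], each = 60)),
  y = rnorm(120),
  unused = "garbage"
)

# Switch to a less cluttered default theme (theme_bw() is an assumption),
# keeping the previous theme so that it can be restored later.
old_theme <- theme_set(theme_bw())

# A plot with two layers and two panels, reused in later examples.
p <- ggplot(my.data, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "red") +
  facet_wrap(~panel, scales = "free_x")
```

Calling theme_set(old_theme) at the end of a session would restore the previous default theme.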

Exploring how ggplots are stored

To display textual information about a gg object we use method summary(), while methods print() and plot() will display the actual plot.

## data: group, panel, y, unused [120x4]
## mapping:  x = ~group, y = ~y
## faceting: <ggproto object: Class FacetWrap, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetWrap, Facet, gg>
## -----------------------------------
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity 
## 
## geom_pointrange: na.rm = FALSE
## stat_summary: fun.data = function (x, mult = 1) 
## {
##     x <- stats::na.omit(x)
##     se <- mult * sqrt(stats::var(x)/length(x))
##     mean <- mean(x)
##     data.frame(y = mean, ymin = mean - se, ymax = mean + se)
## }, fun.y = NULL, fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), na.rm = FALSE
## position_identity

Layers in a ggplot object are stored in a list as nameless members. This means that they have to be accessed using numerical indexes, and that we need to use some indirect way of finding the indexes corresponding to the layers of interest.

## NULL

The output of summary() is compact.

##      Length Class         Mode       
## [1,] 11     LayerInstance environment
## [2,] 11     LayerInstance environment

The default print() method for a list of layers displays only a small part of the information in a layer.

## [[1]]
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity 
## 
## [[2]]
## geom_pointrange: na.rm = FALSE
## stat_summary: fun.data = function (x, mult = 1) 
## {
##     x <- stats::na.omit(x)
##     se <- mult * sqrt(stats::var(x)/length(x))
##     mean <- mean(x)
##     data.frame(y = mean, ymin = mean - se, ymax = mean + se)
## }, fun.y = NULL, fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), na.rm = FALSE
## position_identity

To see all the fields, we need to use str(), which we use here for a single layer.

## Classes 'LayerInstance', 'Layer', 'ggproto', 'gg' <ggproto object: Class LayerInstance, Layer, gg>
##     aes_params: list
##     compute_aesthetics: function
##     compute_geom_1: function
##     compute_geom_2: function
##     compute_position: function
##     compute_statistic: function
##     data: waiver
##     draw_geom: function
##     finish_statistics: function
##     geom: <ggproto object: Class GeomPoint, Geom, gg>
##         aesthetics: function
##         default_aes: uneval
##         draw_group: function
##         draw_key: function
##         draw_layer: function
##         draw_panel: function
##         extra_params: na.rm
##         handle_na: function
##         non_missing_aes: size shape colour
##         optional_aes: 
##         parameters: function
##         required_aes: x y
##         setup_data: function
##         use_defaults: function
##         super:  <ggproto object: Class Geom, gg>
##     geom_params: list
##     inherit.aes: TRUE
##     layer_data: function
##     map_statistic: function
##     mapping: NULL
##     position: <ggproto object: Class PositionIdentity, Position, gg>
##         compute_layer: function
##         compute_panel: function
##         required_aes: 
##         setup_data: function
##         setup_params: function
##         super:  <ggproto object: Class Position, gg>
##     print: function
##     show.legend: NA
##     stat: <ggproto object: Class StatIdentity, Stat, gg>
##         aesthetics: function
##         compute_group: function
##         compute_layer: function
##         compute_panel: function
##         default_aes: uneval
##         extra_params: na.rm
##         finish_layer: function
##         non_missing_aes: 
##         parameters: function
##         required_aes: 
##         retransform: TRUE
##         setup_data: function
##         setup_params: function
##         super:  <ggproto object: Class Stat, gg>
##     stat_params: list
##     super:  <ggproto object: Class Layer, gg>
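
The listings in this section can be obtained with calls along the following lines. This is a sketch under the assumption that p is a plot like the one described in the preliminaries; it is rebuilt here with assumed data so that the snippet runs on its own. The exact text printed, including whether the members of the list of layers have names, may depend on the installed version of ‘ggplot2’.

```r
library(ggplot2)

# Assumed stand-in for the data and plot from the preliminaries.
my.data <- data.frame(group = factor(rep(letters[1:4], each = 30)),
                      panel = factor(rep(LETTERS[1:2], each = 60)),
                      y = rnorm(120), unused = "garbage")
p <- ggplot(my.data, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "red") +
  facet_wrap(~panel)

summary(p)           # textual description of the whole plot object
names(p$layers)      # names, if any, of the members of the list of layers
summary(p$layers)    # one compact row per layer
p$layers             # default print() of the list of layers
str(p$layers[[1]])   # all fields of the first layer
```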

Manipulation of plot layers

We start by using which_layers() as it simply produces a vector of indexes into the list of layers. The third statement is useless here, but demonstrates how layers are selected in all the functions described in this document.

which_layers(p, "GeomPoint")
## [1] 1
which_layers(p, "StatSummary")
## [1] 2
which_layers(p, idx = 1L)
## [1] 1

We can also easily extract matching layers with extract_layers(). Here one layer is returned, and displayed using the default print() method. Method str() can also be used as shown above.

extract_layers(p, "GeomPoint")
## [[1]]
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity

With delete_layers() we can remove layers from a plot, selecting them by matching a class, as shown here, or by a positional index, as shown above for which_layers().

delete_layers(p, "GeomPoint")

delete_layers(p, idx = 1L)

delete_layers(p, "StatSummary")
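
As the plots themselves are not reproduced here, one way to check the effect of a deletion is to count layers before and after. The sketch below assumes a plot p like the one from the preliminaries (rebuilt with assumed data) and stores the result under the hypothetical name p.no.points; the original p is left unchanged because the functions return a modified copy.

```r
library(ggplot2)
library(gginnards)

# Assumed stand-in for the data and plot from the preliminaries.
my.data <- data.frame(group = factor(rep(letters[1:4], each = 30)),
                      panel = factor(rep(LETTERS[1:2], each = 60)),
                      y = rnorm(120), unused = "garbage")
p <- ggplot(my.data, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "red") +
  facet_wrap(~panel)

# Delete the layer that uses GeomPoint and keep the result.
p.no.points <- delete_layers(p, "GeomPoint")

num_layers(p)                            # layers in the original plot
num_layers(p.no.points)                  # layers after the deletion
which_layers(p.no.points, "GeomPoint")   # no matching layer remains
```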

With move_layers() we can alter the stacking order of layers. The layers to move are selected in the same way as in the examples above, while position gives where to move the layers to. Two character strings, "top" and "bottom", are accepted as the position argument, as well as integers. In the latter case, the selected layers are inserted after the supplied position, which is counted with reference to the list of layers not being moved.

move_layers(p, "GeomPoint", position = "top")

The equivalent operation can be expressed using a relative position. A positive value for shift is interpreted as an upward displacement and a negative one as a downward displacement.

shift_layers(p, "GeomPoint", shift = +1)
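
That the two calls above are equivalent for a two-layer plot can be checked by comparing the position of the point layer in each result. This is a sketch assuming a plot p like the one from the preliminaries (rebuilt with assumed data); p.moved and p.shifted are hypothetical names.

```r
library(ggplot2)
library(gginnards)

# Assumed stand-in for the data and plot from the preliminaries.
my.data <- data.frame(group = factor(rep(letters[1:4], each = 30)),
                      panel = factor(rep(LETTERS[1:2], each = 60)),
                      y = rnorm(120), unused = "garbage")
p <- ggplot(my.data, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "red") +
  facet_wrap(~panel)

p.moved <- move_layers(p, "GeomPoint", position = "top")
p.shifted <- shift_layers(p, "GeomPoint", shift = +1)

which_layers(p, "GeomPoint")           # position before moving
which_layers(p.moved, "GeomPoint")     # position after the absolute move
which_layers(p.shifted, "GeomPoint")   # position after the relative shift
```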

Here we show how to add a layer behind all other layers.

append_layers(p, geom_line(colour = "orange", size = 1), position = "bottom")

It is also possible to append the new layer immediately above an arbitrary existing layer using a numeric index which, as shown here, can also be obtained by matching to a class name. In this example we insert a new layer in-between two layers already present in the plot. As with the + operator of the grammar of graphics, parameter object also accepts a list of layers as argument.

append_layers(p, object = geom_line(colour = "orange", size = 1), 
              position = which_layers(p, "GeomPoint"))
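
Passing a list of layers as object could be sketched as follows, assuming a plot p like the one from the preliminaries (rebuilt with assumed data). The two layers added and the name p.under are chosen only for illustration.

```r
library(ggplot2)
library(gginnards)

# Assumed stand-in for the data and plot from the preliminaries.
my.data <- data.frame(group = factor(rep(letters[1:4], each = 30)),
                      panel = factor(rep(LETTERS[1:2], each = 60)),
                      y = rnorm(120), unused = "garbage")
p <- ggplot(my.data, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "red") +
  facet_wrap(~panel)

# Two new layers supplied as a list and placed behind the existing ones.
p.under <- append_layers(p,
                         object = list(geom_hline(yintercept = 0,
                                                  colour = "orange"),
                                       geom_boxplot(fill = "grey90")),
                         position = "bottom")

num_layers(p.under)                  # total number of layers
which_layers(p.under, "GeomPoint")   # new position of the point layer
```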

Annotations add layers, so they can be manipulated in the same way as other layers.
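
The plot p1 used in the next statement is not defined in the text shown here. A minimal sketch, assuming that it is the plot p from the preliminaries (rebuilt with assumed data) plus a text annotation, is given below; the label text and its coordinates are made up for illustration.

```r
library(ggplot2)
library(gginnards)

# Assumed stand-in for the data and plot from the preliminaries.
my.data <- data.frame(group = factor(rep(letters[1:4], each = 30)),
                      panel = factor(rep(LETTERS[1:2], each = 60)),
                      y = rnorm(120), unused = "garbage")
p <- ggplot(my.data, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "red") +
  facet_wrap(~panel)

# annotate() adds a regular layer, here one that uses GeomText.
p1 <- p + annotate("text", label = "text label", x = 1.1, y = 0, hjust = 0)

which_layers(p1, "GeomText")   # index of the annotation layer
```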

delete_layers(p1, "GeomText")

Other manipulations

We very briefly mention in this section how to manipulate other elements of ggplot objects. The output from the examples in this section is not included in this vignette.

For replacing other elements in ggplot objects we can in general use methods defined in ‘ggplot2’.

Replacing scales, coordinates, whole themes and data

Other elements, such as scales, are normally added to and replaced in a plot with operator +. To replace the default data stored in a ggplot object, operator %+% is used.
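
Both operators can be sketched as follows, assuming a plot p like the one from the preliminaries (rebuilt with assumed data). The names new.data, p.scaled and p.new, and the axis label, are hypothetical. Recent versions of ‘ggplot2’ may report %+% as deprecated in favour of +.

```r
library(ggplot2)

# Assumed stand-in for the data and plot from the preliminaries.
my.data <- data.frame(group = factor(rep(letters[1:4], each = 30)),
                      panel = factor(rep(LETTERS[1:2], each = 60)),
                      y = rnorm(120), unused = "garbage")
p <- ggplot(my.data, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "red") +
  facet_wrap(~panel)

# Adding a scale replaces the default y scale of the plot.
p.scaled <- p + scale_y_continuous(name = "Response")

# Replacing the default data, here with a subset of the original rows.
new.data <- subset(my.data, panel == "A")
p.new <- p %+% new.data

nrow(p$data)       # rows stored in the original plot
nrow(p.new$data)   # rows stored after replacing the data
```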

Editing theme elements

Method summary() is available for themes.

However, to see the actual values stored, we need to use str().

Themes can be modified using theme(). See the ‘ggplot2’ documentation for details.
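
The three operations on themes mentioned above can be sketched as follows. The theme element inspected and the one modified are arbitrary choices made for illustration, p is a reduced stand-in plot built from assumed data, and the printed output may differ among versions of ‘ggplot2’.

```r
library(ggplot2)

# Compact overview of the elements of a complete theme.
summary(theme_grey())

# Values stored in one element of the theme.
str(theme_grey()$text)

# Modifying one element of the theme of an existing plot.
my.data <- data.frame(group = factor(rep(letters[1:4], each = 30)),
                      y = rnorm(120))
p <- ggplot(my.data, aes(group, y)) + geom_point()
p.legend <- p + theme(legend.position = "bottom")

p.legend$theme$legend.position   # setting stored in the plot object
```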

Removing unused data

The argument passed through data to ggplot() or a layer is stored in whole in the ggplot object, including the data columns not mapped to any aesthetic. In most cases this does not matter, but in the case of huge datasets the use of RAM can add up, and occasionally printing of each plot can slow down. The whole data set is stored because layers can be added at any time, and consequently only the user can know which variables can safely be removed.

One obvious way of not storing unused data in ggplot objects is for the user to select the required variables and pass only these to the ggplot() constructor or layers. A less efficient alternative, but possibly easier to use for some users, is for users to drop the unused variables when they consider that a plot is ready. We show here how to do this, with a function that started as a self-imposed exercise.

To simplify the embedded data objects we need to find which variables are mapped to aesthetics and which are not. Here is a naive attempt at handling the possibility of mappings to expressions involving computations and multiple variables per mapping, and facets. This is naive in that it ignores mapping within layers and variables used for faceting.

We need also to find which variables are present in the data.

Next we identify which variables in data are not used, and delete them.
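
The three steps just described could be sketched as follows. This is not the code used for the vignette: it relies on all.vars() and rlang::get_expr() to pull variable names out of the plot-level mappings, reads the variables of a wrap facet from p$facet$params$facets, and ignores mappings within layers. It assumes a plot like p from the preliminaries (rebuilt with assumed data) and a version of ‘ggplot2’ that stores mappings as quosures; mapped.vars, facet.vars, data.vars, unused.vars and p.small are hypothetical names.

```r
library(ggplot2)

# Assumed stand-in for the data and plot from the preliminaries.
my.data <- data.frame(group = factor(rep(letters[1:4], each = 30)),
                      panel = factor(rep(LETTERS[1:2], each = 60)),
                      y = rnorm(120), unused = "garbage")
p <- ggplot(my.data, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "red") +
  facet_wrap(~panel)

# Variables used in the plot-level mappings, also within expressions.
mapped.vars <- unique(unlist(
  lapply(p$mapping, function(q) all.vars(rlang::get_expr(q)))
))

# Variables used for wrap faceting (facet_grid() stores them elsewhere).
facet.vars <- names(p$facet$params$facets)

# Variables present in the default data of the plot.
data.vars <- names(p$data)

# Variables that are neither mapped nor used for faceting.
unused.vars <- setdiff(data.vars, c(mapped.vars, facet.vars))

# Copy of the plot whose data lacks the unused variables.
p.small <- p
p.small$data <- p$data[, setdiff(data.vars, unused.vars), drop = FALSE]

object.size(p)
object.size(p.small)
names(p.small$data)
```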

For a data set this small, removing a single column saves very little space.

## 5088 bytes
## 11384 bytes
## 10352 bytes
## [1] "group"  "panel"  "y"      "unused"
## [1] "group"  "panel"  "y"      "unused"
## [1] "group" "panel" "y"

The plot has not changed.

We can assemble all the code into a function for convenience, and expand the code to also recognize mappings within layers and variables used in faceting. Such a function, only cursorily tested, is included in the package as drop_vars(). Given its design, the most likely failure mode is keeping too many variables rather than removing too many.
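
A usage sketch for drop_vars(), called with default arguments on a plot like p from the preliminaries (rebuilt with assumed data), is shown below; p.small is a hypothetical name.

```r
library(ggplot2)
library(gginnards)

# Assumed stand-in for the data and plot from the preliminaries.
my.data <- data.frame(group = factor(rep(letters[1:4], each = 30)),
                      panel = factor(rep(LETTERS[1:2], each = 60)),
                      y = rnorm(120), unused = "garbage")
p <- ggplot(my.data, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "red") +
  facet_wrap(~panel)

# Returns a copy of the plot with unused variables removed from its data.
p.small <- drop_vars(p)

names(p$data)         # variables stored in the original plot
names(p.small$data)   # variables kept after dropping unused ones
```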

It is not yet clear to me when R makes a copy of the data embedded in a ggplot object and when it does not. R’s policy is to copy data objects lazily, that is, only when they are modified. Does the ‘ggplot2’ code modify the argument passed to its data parameter, triggering a real copy operation, or not?

In any case, when saving ggplot objects to disk, not carrying along unused data can be beneficial. Of course, removing unused data means that they will not be available at a later time if we want to add more layers to the same saved ggplot object.