User Guide: 2 Manipulating ggplots
‘gginnards’ 0.2.0.1
Pedro J. Aphalo
2024-11-14
Introduction
The functions described here are not expected to be useful in everyday plotting since, when using the grammar of graphics, one can simply change the order in which layers are added to a ggplot, or remove unused variables from the data before passing it as argument to the ggplot() constructor.
However, if one uses high-level methods like autoplot() or other functions that automatically produce a full plot using ‘ggplot2’ internally, one may need to add, move or delete layers so as to profit from such canned methods while retaining enough flexibility.
Some time ago I needed to manipulate the layers of a ggplot, and found a matching question on Stack Overflow. I used the answers found there as the starting point for writing the functions described in the first part of this vignette.
In a ggplot object, layers reside in a list, and their positions in the list determine the plotting order when generating the graphical output. The grammar of graphics treats the list of layers as a stack using only push operations. In other words, the most recently added layer always resides at the end of the list, and during rendering it over-plots all layers added previously. The functions described in this vignette allow overriding the normal syntax at the cost of breaking the expectations of the grammar. As stated above, these functions are meant to be used only in exceptional cases. This notwithstanding, they are rather easy to use and the user interface is consistent across all of them. Moreover, they are designed to return objects that are identical to objects created using the normal syntax rules of the grammar of graphics. The table below lists the names and purpose of these functions.
Function | Use |
---|---|
delete_layers() | delete one or more layers |
append_layers() | append layers at a specific position |
move_layers() | move layers to an absolute position |
shift_layers() | move layers to a relative position |
which_layers() | obtain the index positions of layers |
extract_layers() | extract matched or indexed layers |
num_layers() | obtain number of layers |
top_layer() | obtain position of top layer |
bottom_layer() | obtain position of bottom layer |
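As a quick orientation, the sketch below applies the three query functions from the table to a small two-layer plot. The data frame `df` is made up for illustration, and we assume, as the table states, that `top_layer()` and `bottom_layer()` return index positions into the list of layers.

```r
library(ggplot2)
library(gginnards)

# Hypothetical toy data, used only for this illustration
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

# Two layers: points first, then a line rendered on top of them
p.small <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_line()

num_layers(p.small)   # how many layers the plot contains
top_layer(p.small)    # index of the layer rendered last (on top)
bottom_layer(p.small) # index of the layer rendered first (at the bottom)
```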
Although their definitions do not rely on code internal to ‘ggplot2’, they rely on the internal structure of objects belonging to classes gg and ggplot. Consequently, long-term backward and forward compatibility cannot be guaranteed, or even expected.
Preliminaries
library(ggplot2)
library(gginnards)
library(tibble)
library(magrittr)
library(stringr)
eval_pryr <- requireNamespace("pryr", quietly = TRUE)
We generate some artificial data and create a data frame with them.
set.seed(4321)
# generate artificial data
my.data <- data.frame(
group = factor(rep(letters[1:4], each = 30)),
panel = factor(rep(LETTERS[1:2], each = 60)),
y = rnorm(120), # one value per row, avoiding recycling of 40 values
unused = "garbage"
)
We add attributes to the data frame with the fake data.
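The code for this step is not included in this text. A minimal sketch using base R's `attr()`, which adds the two attributes whose values are displayed in the section on attributes of the embedded data object, could look as follows; how the author actually set them is an assumption. The data are re-created so that the block runs on its own.

```r
set.seed(4321)
# Re-create artificial data like those above so that this block is self-contained
my.data <- data.frame(
  group = factor(rep(letters[1:4], each = 30)),
  panel = factor(rep(LETTERS[1:2], each = 60)),
  y = rnorm(120),
  unused = "garbage"
)

# Attach two user-defined attributes: one character, one numeric
attr(my.data, "my.atr.char") <- "my.atr.value"
attr(my.data, "my.atr.num") <- 12345678

# Show only the user-defined attributes
attributes(my.data)[c("my.atr.char", "my.atr.num")]
```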
We change the default theme to an uncluttered one.
We generate a plot to be used later to demonstrate the use of the functions. We use expand_limits() to make the effect of later manipulations easier to notice.
p <- ggplot(my.data, aes(group, y)) +
geom_point() +
stat_summary(fun.data = mean_se, colour = "cornflowerblue", size = 1) +
facet_wrap(~panel, scales = "free_x", labeller = label_both) +
expand_limits(y = c(-2, 2))
p
Exploring how ggplots are stored
To display a textual summary of a gg object we use method summary() from package ‘ggplot2’, while methods print() and plot() display the actual plot.
summary(p)
## data: group, panel, y, unused [120x4]
## mapping: x = ~group, y = ~y
## faceting: ~panel
## -----------------------------------
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
##
## geom_pointrange: na.rm = FALSE, orientation = NA
## stat_summary: fun.data = function (x, mult = 1)
## {
## x <- stats::na.omit(x)
## se <- mult * sqrt(stats::var(x)/length(x))
## mean <- mean(x)
## data_frame0(y = mean, ymin = mean - se, ymax = mean + se, .size = 1)
## }, fun = NULL, fun.max = NULL, fun.min = NULL, fun.args = list(), na.rm = FALSE, orientation = NA
## position_identity
##
## mapping: y = ~y
## geom_blank: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
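Layers in a ggplot object are stored in a list. In the version of ‘ggplot2’ used to build this guide the members of this list are named, as shown below, but the names are generated automatically from the constructor function of each layer rather than chosen by the user. Consequently, layers are most reliably accessed using numerical indexes, and we need some indirect way of finding the indexes corresponding to the layers of interest.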
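One such indirect way, sketched here with base R and ‘ggplot2’ only, is to look up the class of the geometry stored in each layer. The toy data and the name `geom_classes` are ours, introduced for illustration; the same ggproto class names are accepted by the ‘gginnards’ functions described below.

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(group = rep(c("a", "b"), each = 5), y = c(1:5, 3:7))

# Same three kinds of layers as in plot p
p.demo <- ggplot(df, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "cornflowerblue") +
  expand_limits(y = c(-2, 10))

# The first class of each layer's geom object is its ggproto class name
geom_classes <- vapply(p.demo$layers,
                       function(layer) class(layer$geom)[1],
                       character(1))
geom_classes

# Numeric index of the layer(s) whose geometry is GeomPoint
which(geom_classes == "GeomPoint")
```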
names(p$layers)
## [1] "geom_point" "stat_summary" "geom_blank"
The output of summary() is compact.
summary(p$layers)
## Length Class Mode
## geom_point 16 LayerInstance environment
## stat_summary 16 LayerInstance environment
## geom_blank 16 LayerInstance environment
The default print() method for a list of layers displays only a small part of the information in a layer.
print(p$layers)
## $geom_point
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
##
## $stat_summary
## geom_pointrange: na.rm = FALSE, orientation = NA
## stat_summary: fun.data = function (x, mult = 1)
## {
## x <- stats::na.omit(x)
## se <- mult * sqrt(stats::var(x)/length(x))
## mean <- mean(x)
## data_frame0(y = mean, ymin = mean - se, ymax = mean + se, .size = 1)
## }, fun = NULL, fun.max = NULL, fun.min = NULL, fun.args = list(), na.rm = FALSE, orientation = NA
## position_identity
##
## $geom_blank
## mapping: y = ~y
## geom_blank: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
To see all the fields, we need to use str(), which we use here for a single layer.
str(p$layers[[1]])
## Classes 'LayerInstance', 'Layer', 'ggproto', 'gg' <ggproto object: Class LayerInstance, Layer, gg>
## aes_params: list
## compute_aesthetics: function
## compute_geom_1: function
## compute_geom_2: function
## compute_position: function
## compute_statistic: function
## computed_geom_params: list
## computed_mapping: uneval
## computed_stat_params: list
## constructor: call
## data: waiver
## draw_geom: function
## finish_statistics: function
## geom: <ggproto object: Class GeomPoint, Geom, gg>
## aesthetics: function
## default_aes: uneval
## draw_group: function
## draw_key: function
## draw_layer: function
## draw_panel: function
## extra_params: na.rm
## handle_na: function
## non_missing_aes: size shape colour
## optional_aes:
## parameters: function
## rename_size: FALSE
## required_aes: x y
## setup_data: function
## setup_params: function
## use_defaults: function
## super: <ggproto object: Class Geom, gg>
## geom_params: list
## inherit.aes: TRUE
## layer_data: function
## map_statistic: function
## mapping: NULL
## name: NULL
## position: <ggproto object: Class PositionIdentity, Position, gg>
## compute_layer: function
## compute_panel: function
## required_aes:
## setup_data: function
## setup_params: function
## super: <ggproto object: Class Position, gg>
## print: function
## setup_layer: function
## show.legend: NA
## stat: <ggproto object: Class StatIdentity, Stat, gg>
## aesthetics: function
## compute_group: function
## compute_layer: function
## compute_panel: function
## default_aes: uneval
## dropped_aes:
## extra_params: na.rm
## finish_layer: function
## non_missing_aes:
## optional_aes:
## parameters: function
## required_aes:
## retransform: TRUE
## setup_data: function
## setup_params: function
## super: <ggproto object: Class Stat, gg>
## stat_params: list
## super: <ggproto object: Class Layer, gg>
Manipulation of plot layers
We start by using which_layers() as it simply returns a vector of indexes into the list of layers. The last statement is useless here, but demonstrates how layers can be selected by position in all the functions described in this document. We can see that each layer, as described in the first volume of this User Guide, contains one geometry and one statistic.
which_layers(p, "GeomPoint")
## geom_point
## 1
which_layers(p, "StatIdentity")
## geom_point geom_blank
## 1 3
which_layers(p, "GeomPointrange")
## stat_summary
## 2
which_layers(p, "StatSummary")
## stat_summary
## 2
which_layers(p, idx = 1L)
## [1] 1
We can also easily extract matching layers with extract_layers(). Here one layer is returned, and displayed using the default print() method. Method str() can also be used as shown above.
extract_layers(p, "GeomPoint")
## $geom_point
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
With delete_layers() we can remove layers from a plot, selecting them by matching a class, as shown here, or by a positional index, as shown next.
delete_layers(p, "GeomPoint")
delete_layers(p, idx = 1L)
delete_layers(p, "StatSummary")
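Because the resulting plots are not reproduced in this text, a quick way of checking what delete_layers() did is to count the remaining layers and search for the deleted class. The sketch below does this on a small hypothetical plot with the same kinds of layers as p.

```r
library(ggplot2)
library(gginnards)

# Hypothetical toy data and a plot with three layers, as in p
df <- data.frame(group = rep(c("a", "b"), each = 5), y = c(1:5, 3:7))
p.demo <- ggplot(df, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "cornflowerblue") +
  expand_limits(y = c(-2, 10))

# Delete by class match and by position; both remove the points layer here
p.no.points <- delete_layers(p.demo, "GeomPoint")
p.no.first <- delete_layers(p.demo, idx = 1L)

num_layers(p.demo)      # 3
num_layers(p.no.points) # 2
which_layers(p.no.points, "GeomPoint") # no matching layer is left
```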
With move_layers() we can alter the stacking order of layers. The layers to move are selected in the same way as in the examples above, while position gives where to move the layers to. Two character strings, "top" and "bottom", are accepted as position argument, as well as integers. In the latter case, the layer(s) are appended after the supplied position, counted in the list of layers not being moved.
move_layers(p, "GeomPoint", position = "top")
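Here is a comparable operation using a relative position. A positive value for shift is interpreted as an upward displacement and a negative one as a downward displacement. As the top layer of p is the geom_blank() layer added by expand_limits(), which draws nothing, shifting the points up by one position has the same visible effect as moving them to the top.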
shift_layers(p, "GeomPoint", shift = +1)
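The new stacking order can be checked with which_layers(). In this sketch, built on a hypothetical three-layer plot like p, we expect the points layer at position 3 after moving it to the top and back at position 1 after moving it to the bottom; we only print, rather than assert, the position after shifting.

```r
library(ggplot2)
library(gginnards)

# Hypothetical toy data and a plot with three layers, as in p
df <- data.frame(group = rep(c("a", "b"), each = 5), y = c(1:5, 3:7))
p.demo <- ggplot(df, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "cornflowerblue") +
  expand_limits(y = c(-2, 10))

# Absolute moves
p.top <- move_layers(p.demo, "GeomPoint", position = "top")
p.bottom <- move_layers(p.top, "GeomPoint", position = "bottom")
# Relative move, one position upwards
p.shift <- shift_layers(p.demo, "GeomPoint", shift = +1)

which_layers(p.top, "GeomPoint")
which_layers(p.bottom, "GeomPoint")
which_layers(p.shift, "GeomPoint")
```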
Here we show how to add a layer behind all other layers.
append_layers(p, geom_line(colour = "orange", linewidth = 1), position = "bottom")
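To confirm where the new layer landed, we can query the result. This sketch assumes ‘ggplot2’ 3.4.0 or later, where linewidth replaces size for lines, and uses a hypothetical plot like p.

```r
library(ggplot2)
library(gginnards)

# Hypothetical toy data and a plot with three layers, as in p
df <- data.frame(group = rep(c("a", "b"), each = 5), y = c(1:5, 3:7))
p.demo <- ggplot(df, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "cornflowerblue") +
  expand_limits(y = c(-2, 10))

# Add a line layer below all existing layers
p.line <- append_layers(p.demo,
                        geom_line(colour = "orange", linewidth = 1),
                        position = "bottom")

num_layers(p.line)                # one more layer than p.demo
which_layers(p.line, "GeomLine")  # expected at the bottom of the stack
```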
It is also possible to append the new layer immediately above an arbitrary existing layer using a numeric index, which, as shown here, can also be obtained by matching to a class name. In this example we insert a new layer in-between two layers already present in the plot. As with the + operator of the grammar of graphics, the argument passed to object can also be a list of layers.
append_layers(p, object = geom_line(colour = "orange", linewidth = 1),
position = which_layers(p, "GeomPoint"))
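A sketch of both variants, on a hypothetical plot like p: inserting one layer just above the points, and passing a list of two layers. The choice of geom_rug() as the second layer is ours, for illustration only.

```r
library(ggplot2)
library(gginnards)

# Hypothetical toy data and a plot with three layers, as in p
df <- data.frame(group = rep(c("a", "b"), each = 5), y = c(1:5, 3:7))
p.demo <- ggplot(df, aes(group, y)) +
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "cornflowerblue") +
  expand_limits(y = c(-2, 10))

# Insert immediately above the points layer (index 1 in this plot)
p.mid <- append_layers(p.demo,
                       object = geom_line(colour = "orange", linewidth = 1),
                       position = which_layers(p.demo, "GeomPoint"))
which_layers(p.mid, "GeomLine")

# A list of layers is accepted, as with the + operator
p.two <- append_layers(p.demo,
                       object = list(geom_line(colour = "orange", linewidth = 1),
                                     geom_rug(sides = "l")),
                       position = "bottom")
num_layers(p.two)
```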
Annotations add layers, so they can be manipulated in the same way as other layers.
p1 <- p +
annotate("text", label = "text label", x = 1.1, y = 0, hjust = 0)
p1
delete_layers(p1, "GeomText")
Replacing scales, coordinates, whole themes and data
Elements that are normally added to a ggplot with operator +, such as scales, themes and aesthetics, can be replaced with the %+% operator. The situation with layers is different, as a plot may contain multiple layers of the same kind. With layers, %+% is not a replacement operator: as shown below, it adds a layer just as + does.
num_layers(p)
## [1] 3
num_layers(p %+% geom_point(colour = "blue"))
## [1] 4
num_layers(p + geom_point(colour = "blue"))
## [1] 4
p1 <- p + theme_bw()
p1
p1 + theme_void()
p1 %+% theme_void()
Editing theme elements
Method summary() is available for themes. However, to see the actual values stored, we need to use str(). To avoid excessive output, we first find the names of the elements of the theme and then look at how the default text settings are stored.
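The code for these two steps is not shown in this text. A minimal sketch, assuming that the current default theme is the one of interest and retrieving it with theme_get(), follows.

```r
library(ggplot2)

# Names of the elements in the current default theme
theme.names <- names(theme_get())
head(theme.names)

# Stored settings of the element from which all text elements inherit
str(theme_get()$text)
```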
Themes can be modified using theme(). See the ‘ggplot2’ documentation for details.
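As a short sketch, here we override only the size of the root text element of a hypothetical plot; all other settings come from the complete theme in use. We use calc_element() from ‘ggplot2’ to resolve the effective setting, rather than relying on how a plot stores its theme internally.

```r
library(ggplot2)

# Hypothetical toy plot
df <- data.frame(x = 1:3, y = c(2, 1, 3))
p.demo <- ggplot(df, aes(x, y)) + geom_point()

# Partial theme: only the size of the root text element is changed
p.big <- p.demo + theme(text = element_text(size = 14))

# Effective settings after merging with a complete theme
calc_element("text", theme_grey() + theme(text = element_text(size = 14)))$size
```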
Removing unused data
The argument passed through data to ggplot() or to a layer is stored whole in the ggplot object, including data columns not mapped to any aesthetic. In most cases this does not matter, but with huge data sets the use of RAM and disk space can add up, and occasionally printing of each plot can slow down. The reason for storing the whole data set is that it is always possible to add layers with the grammar of graphics to an existing plot, and consequently only the user can know which variables can be removed.
One obvious way of not storing unused data in ggplot objects is for the user to select the required variables and pass only these to the ggplot() constructor or layers. A less efficient alternative, but possibly easier for some users, is to drop the unused variables once a plot is ready. We show here how to do this, with a function that started as a self-imposed exercise.
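The first approach needs no special tools: subset the columns before calling the constructor. A sketch with a hypothetical data frame that, like my.data, contains an unused column:

```r
library(ggplot2)

# Hypothetical data with a column that no aesthetic uses
df <- data.frame(group = rep(c("a", "b"), each = 3),
                 y = 1:6,
                 unused = "garbage")

# Pass only the columns the plot needs
p.lean <- ggplot(df[ , c("group", "y")], aes(group, y)) +
  geom_point()

names(p.lean$data)
```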
To simplify the embedded data objects we need to find which variables are mapped to aesthetics and which are not. Here is a naive attempt at handling mappings to expressions involving computations and multiple variables per mapping, as well as facets. It is naive in that it ignores mappings within layers, and it only recognizes faceting variables stored in params$facets, as created by facet_wrap().
mapped.vars <-
gsub("[~*\\%^]", " ", as.character(p$mapping)) %>%
str_split(boundary("word")) %>%
unlist() %>%
c(names(p$facet$params$facets))
We need also to find which variables are present in the data.
data.vars <- names(p$data)
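Next we identify which variables in data are not used, and delete them.
# indexes of the columns that are mapped or used for faceting
keep.idxs <- which(data.vars %in% mapped.vars)
p1 <- p
p1$data <- p$data[ , keep.idxs]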
For a data set this small, removing a single column saves very little space.
object.size(my.data)
## 5576 bytes
object.size(p)
## 13168 bytes
object.size(p1)
## 11648 bytes
names(my.data)
## [1] "group" "panel" "y" "unused"
names(p$data)
## [1] "group" "panel" "y" "unused"
names(p1$data)
## [1] "group" "panel" "y"
The plot has not changed.
p1
We can assemble all the code into a function for convenience, and expand it to also recognize mappings within layers and variables used in faceting. Such a function, only cursorily tested, is included in the package as drop_vars(). Given its design, the most likely failure mode is keeping too many variables rather than removing too many.
drop_vars(p)
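The effect of drop_vars() can be checked by comparing column names before and after. This sketch uses a hypothetical faceted plot; given the design described above, we only assert that the unused column is gone and that the mapped and faceting variables survive.

```r
library(ggplot2)
library(gginnards)

# Hypothetical data: group and y are mapped, panel is used for faceting
df <- data.frame(group = rep(c("a", "b"), 4),
                 panel = rep(c("A", "B"), each = 4),
                 y = 1:8,
                 unused = "garbage")

p.demo <- ggplot(df, aes(group, y)) +
  geom_point() +
  facet_wrap(~panel)

p.dropped <- drop_vars(p.demo)

names(p.demo$data)
names(p.dropped$data)
```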
When saving ggplot objects to disk, avoiding carrying along unused data can be beneficial. Of course, removing unused data means that they will not be available at a later time if we want to add more layers to the same saved ggplot object.
It was not clear to me when R makes a copy of the data embedded in a ggplot object and when not. R’s policy is to copy data objects lazily, only when they are modified. Does the ‘ggplot2’ code modify the argument passed to its data parameter, triggering a real copy operation, or not? We can check this with the help of package ‘pryr’.
pryr::address(my.data)
## [1] "0x2500afa7558"
z <- p$data
pryr::address(z)
## [1] "0x2500afa7558"
In this case, R has not created a copy. So, from the point of view of total memory usage, deleting the unused columns in p is not always beneficial. If the object is saved to disk, or my.data is modified in any way after p was created, a copy of my.data will be created at this later time. In this simple example we modify the value of an attribute.
## [1] "0x2500afa7558"
pryr::address(my.data)
## [1] "0x250103051e8"
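A self-contained sketch of this check follows, with a small data frame `df` and an attribute `note` that are both hypothetical. Whether the data stored in the plot share their address with `df` may depend on the ‘ggplot2’ version, so the sketch prints the addresses. When they do match, modifying the attribute of `df` should give `df` a new address, while the data embedded in the plot keep the old value.

```r
library(ggplot2)

# Hypothetical data frame with a user attribute
df <- data.frame(x = 1:10, y = (1:10)^2)
attr(df, "note") <- "original"

p.demo <- ggplot(df, aes(x, y)) + geom_point()

if (requireNamespace("pryr", quietly = TRUE)) {
  addr.before <- pryr::address(df)      # address of the data frame
  pd <- p.demo$data                     # new binding to the embedded data, no copy
  addr.embedded <- pryr::address(pd)    # address of the embedded data
  attr(df, "note") <- "modified"        # copied first if the object is shared
  addr.after <- pryr::address(df)       # address of df after the change
  print(c(before = addr.before, embedded = addr.embedded, after = addr.after))
}

# The data stored in the plot are not affected by the later change to df
attr(p.demo$data, "note")
```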
Attributes of the embedded data object
‘ggplot2’ version 3.1.0 and later preserves most attributes of the object passed as argument to the data parameter of the ggplot() constructor. The class of the object seems to be modified if it is derived from data frame or tibble, but other attributes are retained in the copy stored in the gg object.
attributes(p$data)
## $names
## [1] "group" "panel" "y" "unused"
##
## $class
## [1] "data.frame"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120
##
## $my.atr.char
## [1] "my.atr.value"
##
## $my.atr.num
## [1] 12345678
Another interesting question is whether these user attributes are copied when data are passed to geometries and statistics. Using geom_debug_panel() we can find out that they are not.
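A complementary check that needs no printing from within the rendering code uses layer_data() from ‘ggplot2’, which returns the data computed for a layer when the plot is built. In this sketch, with a hypothetical data frame, the user attribute present in the plot's data is not expected among the attributes of the layer data.

```r
library(ggplot2)

# Hypothetical data with a user attribute, as in my.data
df <- data.frame(x = 1:10, y = (1:10)^2)
attr(df, "my.atr.char") <- "my.atr.value"

p.demo <- ggplot(df, aes(x, y)) + geom_point()

# Data computed for the first layer by building the plot
ld <- layer_data(p.demo, 1)
names(attributes(ld))
```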
p + geom_debug_panel(dbgfun.data = attributes, dbgfun.params = NULL)
## [1] "PANEL 1; group(s) 1, 2; 'draw_panel()' input 'data' (.Primitive(\"attributes\")):"
## $names
## [1] "x" "y" "PANEL" "group"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60
##
## $class
## [1] "data.frame"
##
## [1] "PANEL 2; group(s) 3, 4; 'draw_panel()' input 'data' (.Primitive(\"attributes\")):"
## $names
## [1] "x" "y" "PANEL" "group"
##
## $row.names
## [1] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
## [20] 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98
## [39] 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
## [58] 118 119 120
##
## $class
## [1] "data.frame"