stat_summary_xy() and stat_centroid() are similar to ggplot2::stat_summary() but summarize both x and y values in the same plot layer. Unlike stat_summary(), they do no grouping based on data values; the only grouping respected is the one already defined by the mappings to aesthetics. This makes it possible to highlight the actual location of the centroid with geom_point(), geom_text(), and similar geometries. With geom_rug(), in contrast, they are only a convenience that avoids adding two separate layers and flipping one of them with orientation = "y".
Usage
stat_apply_group(
mapping = NULL,
data = NULL,
geom = "line",
.fun.x = NULL,
.fun.x.args = list(),
.fun.y = NULL,
.fun.y.args = list(),
position = "identity",
na.rm = FALSE,
show.legend = FALSE,
inherit.aes = TRUE,
...
)
stat_summary_xy(
mapping = NULL,
data = NULL,
geom = "point",
.fun.x = NULL,
.fun.x.args = list(),
.fun.y = NULL,
.fun.y.args = list(),
position = "identity",
na.rm = FALSE,
show.legend = FALSE,
inherit.aes = TRUE,
...
)
stat_centroid(
mapping = NULL,
data = NULL,
geom = "point",
.fun = NULL,
.fun.args = list(),
position = "identity",
na.rm = FALSE,
show.legend = FALSE,
inherit.aes = TRUE,
...
)
Arguments
- mapping
The aesthetic mapping, usually constructed with aes. Only needs to be set at the layer level if you are overriding the plot defaults.
- data
A layer-specific dataset, only needed if you want to override the plot defaults.
- geom
The geometric object to use to display the data.
- .fun.x, .fun.y, .fun
Function to be applied, or the name of the function to be applied as a character string.
- .fun.x.args, .fun.y.args, .fun.args
Additional arguments to be passed to the function, as a named list.
- position
The position adjustment to use for overlapping points on this layer.
- na.rm
A logical value indicating whether NA values should be stripped before the computation proceeds.
- show.legend
Logical. Should this layer be included in the legends? NA includes if any aesthetics are mapped, FALSE never includes, and TRUE always includes. The default shown under Usage is FALSE.
- inherit.aes
If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.
- ...
Other arguments passed on to layer. This can include aesthetics whose values you want to set, not map. See layer for more details.
Value
A data frame with the same variables as the data input, with either a single or multiple rows, with the values of the x and y variables replaced by the values returned by the applied functions, or possibly filled with NA if no function was supplied or available by default. If the applied function returns a named vector, the names are copied into columns x.names and/or y.names. If the summary function applied returns a one-row data frame, it is column-bound keeping the column names, but overwriting columns x and/or y with y from the summary data frame. In the names returned by .fun.x the letter "y" is replaced by "x". This allows functions that name their results for y, as mean_se() does, to be applied to x as well.
- x
x-value as returned by .fun.x, with names removed
- y
y-value as returned by .fun.y, with names removed
- x.names
if the x-value returned by .fun.x is named, these names
- y.names
if the y-value returned by .fun.y is named, these names
- xmin, xmax
values returned by .fun.x under these names, if present
- ymin, ymax
values returned by .fun.y under these names, if present
- <other>
additional values as returned by .fun.y under other names
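As a rough illustration of how a named result separates into a value column and a names column, the base R sketch below uses quantile() on made-up data. It only mimics the shape of the returned variables; it is not the code used by these statistics, and the vector y and the data frame built from it are assumptions made for the example.

```r
# Hypothetical data, for illustration only
y <- c(2, 4, 6, 8, 10)

# quantile() returns a named numeric vector
q <- quantile(y, probs = c(0.25, 0.5, 0.75))

names(q)    # the names that would end up in y.names
unname(q)   # the values that would end up in y

# A data frame with the same shape as the y and y.names columns
data.frame(y = unname(q), y.names = names(q))
```

With these probabilities the names are "25%", "50%" and "75%", matching the x.names and y.names columns visible in the debug output at the end of the Examples.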
Details
stat_apply_group applies functions to data. When possible it is preferable to use transformations through scales or summary functions such as ggplot2::stat_summary(), stat_summary_xy() or stat_centroid(). There are some computations that are not scale transformations but are not usual summaries either, as the number of data values does not decrease all the way to one row per group. A typical case for a summary is the computation of quantiles. Typical transformations are cumulative or running ones, e.g., using cumsum(), runmed() and similar functions. It is always possible to apply such functions to the data before plotting and then pass the result to a single layer function. However, it can be useful to apply such functions on-the-fly to ensure that grouping is consistent between computations and aesthetics.

One particularity of these statistics is that they can simultaneously apply different functions to x values and to y values when needed. In contrast to these statistics, geom_smooth applies a function that takes both x and y values as arguments.

These statistics are similar. They differ in whether they return a single row or multiple rows of data per group.
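The distinction between summaries, partial summaries and transformations can be checked outside any plot by comparing the length of what each kind of function returns. The following is a minimal base R sketch on a made-up vector x; it says nothing about how the statistics are implemented.

```r
# Hypothetical data, for illustration only
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

length(x)                  # 8 input values
length(mean(x))            # 1: a summary, one value per group
length(quantile(x))        # 5: fewer values than the input, but more than one
length(cumsum(x))          # 8: a cumulative transformation keeps every value
length(runmed(x, k = 3))   # 8: a running median also keeps every value
```

Functions of the first kind suit stat_summary_xy() and stat_centroid(), while the others return several rows per group and so call for stat_apply_group.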
Note
The applied function(s) must accept as first argument a vector that matches the variables mapped to the x or y aesthetics. For stat_summary_xy() and stat_centroid() the function(s) to be applied are expected to return a vector of length 1 or a data frame with only one row, as mean_se(), mean_cl_normal(), mean_cl_boot(), mean_sdl() and median_hilow() from 'ggplot2' do.

For stat_apply_group the vectors returned by the functions applied to x and y must be of exactly the same length. When only one of .fun.x or .fun.y is passed a function as argument, the other variable in the returned data is filled with NA_real_. If other values are desired, they can be set by means of a user-defined function.
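The equal-length requirement matters for functions such as diff(), which return one value fewer than they receive. A minimal base R sketch, using made-up vectors x and y, shows why the x values need trimming to match, as done with x[-1L] in the Examples:

```r
# Hypothetical data, for illustration only
x <- 1:6
y <- c(5, 3, 8, 6, 9, 4)

dy <- diff(y)       # successive differences: one value fewer than y
x_trim <- x[-1L]    # drop the first x so that both vectors line up

length(y)                      # 6
length(dy)                     # 5
length(dy) == length(x_trim)   # TRUE
```

Dropping the first x value pairs each difference with the later of the two observations it was computed from; dropping the last value instead would pair it with the earlier one.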
References
Answers to question "R ggplot on-the-fly calculation by grouping variable" at https://stackoverflow.com/questions/51412522.
Examples
set.seed(123456)
my.df <- data.frame(X = rep(1:20, 2),
                    Y = runif(40),
                    category = rep(c("A", "B"), each = 20))

# make sure rows are ordered for X as we will use functions that rely on this
my.df <- my.df[order(my.df[["X"]]), ]
# Centroid
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_centroid(shape = "cross", size = 6) +
  geom_point()

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_centroid(geom = "rug", linewidth = 1.5, .fun = median) +
  geom_point()

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_centroid(geom = "text", aes(label = category)) +
  geom_point()

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_summary_xy(geom = "pointrange",
                  .fun.x = mean, .fun.y = mean_se) +
  geom_point()
# quantiles
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  geom_point() +
  stat_apply_group(geom = "rug", .fun.y = quantile, .fun.x = quantile)

ggplot(my.df, aes(x = X, y = Y)) +
  geom_point() +
  stat_apply_group(geom = "rug", sides = "lr", color = "darkred",
                   .fun.y = quantile) +
  stat_apply_group(geom = "text", hjust = "right", color = "darkred",
                   .fun.y = quantile,
                   .fun.x = function(x) {rep(22, 5)}, # set x to 22
                   mapping = aes(label = after_stat(y.names))) +
  expand_limits(x = 21)

my.probs <- c(0.25, 0.5, 0.75)
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  geom_point() +
  stat_apply_group(geom = "hline",
                   aes(yintercept = after_stat(y)),
                   .fun.y = quantile,
                   .fun.y.args = list(probs = my.probs))
# cumulative summaries
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.x = function(x) {x},
                   .fun.y = cummax)

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.x = cumsum, .fun.y = cumsum)

# diff returns a shorter vector by 1 for each group
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.x = function(x) {x[-1L]},
                   .fun.y = diff, na.rm = TRUE)

# Running summaries
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  geom_point() +
  stat_apply_group(.fun.x = function(x) {x},
                   .fun.y = runmed, .fun.y.args = list(k = 5))

# Rescaling per group
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.x = function(x) {x},
                   .fun.y = function(x) {(x - min(x)) / (max(x) - min(x))})
# inspecting the returned data
if (requireNamespace("gginnards", quietly = TRUE)) {
  library(gginnards)

  ggplot(my.df, aes(x = X, y = Y, colour = category)) +
    stat_centroid(.fun = mean_se, geom = "debug")

  ggplot(my.df, aes(x = X, y = Y, colour = category)) +
    stat_summary_xy(.fun.y = mean_se, geom = "debug")

  ggplot(my.df, aes(x = X, y = Y, colour = category)) +
    stat_apply_group(.fun.y = cumsum, geom = "debug")

  ggplot(my.df, aes(x = X, y = Y, colour = category)) +
    geom_point() +
    stat_apply_group(geom = "debug",
                     .fun.x = quantile,
                     .fun.x.args = list(probs = my.probs),
                     .fun.y = quantile,
                     .fun.y.args = list(probs = my.probs))
}
#> [1] "PANEL 1; group(s) 1, 2; 'draw_function()' input 'data' (head):"
#> colour x y PANEL group x.names y.names
#> 1 #F8766D 5.75 0.3399159 1 1 25% 25%
#> 2 #F8766D 10.50 0.6736796 1 1 50% 50%
#> 3 #F8766D 15.25 0.8791947 1 1 75% 75%
#> 4 #00BFC4 5.75 0.1643890 1 2 25% 25%
#> 5 #00BFC4 10.50 0.5641866 1 2 50% 50%
#> 6 #00BFC4 15.25 0.8576917 1 2 75% 75%