`stat_summary_xy()` and `stat_centroid()` are similar to `ggplot2::stat_summary()` but summarize both `x` and `y` values in the same plot layer. Unlike `stat_summary()`, they do no grouping based on data values; the only grouping respected is that already present based on mappings to aesthetics. This makes it possible to highlight the actual location of the centroid with `geom_point()`, `geom_text()`, and similar geometries. With `geom_rug()`, in contrast, they are only a convenience that avoids adding two separate layers and flipping one of them using `orientation = "y"`.

## Usage

```
stat_apply_group(
  mapping = NULL,
  data = NULL,
  geom = "line",
  .fun.x = NULL,
  .fun.x.args = list(),
  .fun.y = NULL,
  .fun.y.args = list(),
  position = "identity",
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE,
  ...
)

stat_summary_xy(
  mapping = NULL,
  data = NULL,
  geom = "point",
  .fun.x = NULL,
  .fun.x.args = list(),
  .fun.y = NULL,
  .fun.y.args = list(),
  position = "identity",
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE,
  ...
)

stat_centroid(
  mapping = NULL,
  data = NULL,
  geom = "point",
  .fun = NULL,
  .fun.args = list(),
  position = "identity",
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE,
  ...
)
```

## Arguments

- mapping
The aesthetic mapping, usually constructed with `aes`. Only needs to be set at the layer level if you are overriding the plot defaults.

- data
A layer-specific dataset, only needed if you want to override the plot defaults.

- geom
The geometric object to use to display the data.

- .fun.x, .fun.y, .fun
The function to be applied, or the name of the function to be applied as a character string.

- .fun.x.args, .fun.y.args, .fun.args
Additional arguments to be passed to the function, as a named list.

- position
The position adjustment to use for overlapping points on this layer.

- na.rm
A logical value indicating whether `NA` values should be stripped before the computation proceeds.

- show.legend
logical. Should this layer be included in the legends? `NA` includes the layer if any aesthetics are mapped, `FALSE` (the default shown under Usage) never includes it, and `TRUE` always includes it.

- inherit.aes
If `FALSE`, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. `borders`.

- ...
Other arguments passed on to `layer`. This can include aesthetics whose values you want to set, not map. See `layer` for more details.
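
The base R sketch below illustrates how a function given either as an object or by name can be combined with a named list of extra arguments, as described for `.fun.*` and `.fun.*.args`. `apply_fun()` is a hypothetical helper written for this illustration; it is not part of the package, and the actual internals may differ.

```r
# Hypothetical helper (not part of the package): apply `fun`, given as a
# function or as its name in a character string, to `x`, forwarding the
# extra arguments stored in the named list `args`.
apply_fun <- function(x, fun, args = list()) {
  fun <- match.fun(fun)           # accepts a function or a character string
  do.call(fun, c(list(x), args))  # `x` is passed as the first argument
}

y <- c(2, 4, 6, 8, NA)

# Function given by name, with na.rm passed through the list
apply_fun(y, "mean", list(na.rm = TRUE))

# Function given as an object, with two extra arguments
apply_fun(y, quantile, list(probs = 0.5, na.rm = TRUE))
```

Because the data vector is always the first argument, any function whose first parameter accepts a numeric vector fits this pattern.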

## Value

A data frame with the same variables as the data input, with either a single row or multiple rows, with the values of the `x` and `y` variables replaced by the values returned by the applied functions, or possibly filled with `NA` if no function was supplied or available by default. If the applied function returns a named vector, the names are copied into columns `x.names` and/or `y.names`. If the summary function applied returns a one-row data frame, it is column-bound keeping the column names, but overwriting columns `x` and/or `y` with `y` from the summary data frame. In the names returned by `.fun.x` the letter "y" is replaced by "x". This allows the use of the same functions as in `ggplot2::stat_summary()`.

- x
x-value as returned by `.fun.x`, with names removed

- y
y-value as returned by `.fun.y`, with names removed

- x.names
if the x-value returned by `.fun.x` is named, these names

- y.names
if the y-value returned by `.fun.y` is named, these names

- xmin, xmax
values returned by `.fun.x` under these names, if present

- ymin, ymax
values returned by `.fun.y` under these names, if present

- <other>
additional values as returned by `.fun.y` under other names
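
How named return values relate to the columns listed above can be illustrated in base R. The sketch assembles the columns by hand from made-up data; it mirrors the documented result rather than the code of the statistics, and the one-row data frame uses `min()` and `max()` only as stand-ins for the limits a function such as `mean_se()` would return.

```r
y <- c(0.1, 0.4, 0.5, 0.7, 0.9)

# quantile() returns a named numeric vector
q <- quantile(y, probs = c(0.25, 0.5, 0.75))

# Values with names removed go to `y`; the names go to `y.names`
res <- data.frame(y = unname(q), y.names = names(q))
res

# A one-row summary with columns y, ymin and ymax
s <- data.frame(y = mean(y), ymin = min(y), ymax = max(y))

# When such a summary is used for x, "y" in the names becomes "x"
names(s) <- sub("y", "x", names(s), fixed = TRUE)
names(s)
```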

## Details

`stat_apply_group` applies functions to data. When possible it is preferable to use transformations through scales or summary functions such as `ggplot2::stat_summary()`, `stat_summary_xy()` or `stat_centroid()`. There are some computations that are not scale transformations but are not usual summaries either, as the number of data values does not decrease all the way to one row per group. Among summaries, a typical case is the computation of quantiles. Among transformations, typical cases are cumulative and running ones, e.g., those using `cumsum()`, `runmed()` and similar functions. It is always possible to apply such functions to the data before plotting and to pass the result to a single layer function. However, it can be useful to apply such functions on-the-fly to ensure that grouping is consistent between computations and aesthetics. One particularity of these statistics is that they can simultaneously apply different functions to `x` values and to `y` values when needed. In contrast to these statistics, `geom_smooth` applies a function that takes both `x` and `y` values as arguments.

The statistics documented here are similar. They differ in whether they return a single row or multiple rows of data per group.
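
The difference between usual summaries, multi-row summaries and transformations comes down to how many values a function returns per group. The base R sketch below makes this visible on a small made-up data set; `n_returned()` is a hypothetical helper that imitates per-group application by hand and says nothing about how the statistics are implemented.

```r
df <- data.frame(
  y = c(3, 1, 4, 1, 5, 9, 2, 6),
  g = rep(c("A", "B"), each = 4)
)

# One numeric vector per group
by_group <- split(df$y, df$g)

# Hypothetical helper: number of values `fun` returns for each group
n_returned <- function(fun, ...) {
  vapply(by_group, function(v) length(fun(v, ...)), integer(1))
}

n_returned(mean)                                  # one value per group
n_returned(quantile, probs = c(0.25, 0.5, 0.75))  # three values per group
n_returned(cumsum)                                # as many values as rows
```

Functions of the first kind suit `stat_summary_xy()` and `stat_centroid()`, while the other two kinds need `stat_apply_group()`.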

## Note

The applied function(s) must accept as first argument a vector that matches the variables mapped to the `x` or `y` aesthetics. For `stat_summary_xy()` and `stat_centroid()` the function(s) to be applied are expected to return a vector of length 1 or a data frame with only one row, as `mean_se()`, `mean_cl_normal()`, `mean_cl_boot()`, `mean_sdl()` and `median_hilow()` from 'ggplot2' do.

For `stat_apply_group` the vectors returned by the functions applied to `x` and `y` must be of exactly the same length. When only one of `.fun.x` or `.fun.y` is passed a function as argument, the other variable in the returned data is filled with `NA_real_`. If other values are desired, they can be set by means of a user-defined function.
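
The equal-length requirement for `stat_apply_group` can be checked outside a plot. In the base R sketch below, `diff()` shortens `y` by one value, so the function paired with it for `x` drops the first value to match, the same pairing used in the `diff` example in the Examples section. The vectors are made up for illustration.

```r
x <- 1:6
y <- c(2, 3, 5, 8, 13, 21)

fun_y <- diff                # returns length(y) - 1 values
fun_x <- function(x) x[-1L]  # drops the first x so that lengths match

# The two returned vectors have the same length
length(fun_x(x)) == length(fun_y(y))

# Leaving x unchanged would give vectors of different lengths
length(identity(x)) == length(fun_y(y))
```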

## References

Answers to question "R ggplot on-the-fly calculation by grouping variable" at https://stackoverflow.com/questions/51412522.

## Examples

```
set.seed(123456)
my.df <- data.frame(X = rep(1:20, 2),
                    Y = runif(40),
                    category = rep(c("A", "B"), each = 20))
# make sure rows are ordered for X as we will use functions that rely on this
my.df <- my.df[order(my.df[["X"]]), ]
# Centroid
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_centroid(shape = "cross", size = 6) +
  geom_point()

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_centroid(geom = "rug", linewidth = 1.5, .fun = median) +
  geom_point()

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_centroid(geom = "text", aes(label = category)) +
  geom_point()

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_summary_xy(geom = "pointrange",
                  .fun.x = mean, .fun.y = mean_se) +
  geom_point()

# quantiles
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  geom_point() +
  stat_apply_group(geom = "rug", .fun.y = quantile, .fun.x = quantile)

ggplot(my.df, aes(x = X, y = Y)) +
  geom_point() +
  stat_apply_group(geom = "rug", sides = "lr", color = "darkred",
                   .fun.y = quantile) +
  stat_apply_group(geom = "text", hjust = "right", color = "darkred",
                   .fun.y = quantile,
                   .fun.x = function(x) {rep(22, 5)}, # set x to 22
                   mapping = aes(label = after_stat(y.names))) +
  expand_limits(x = 21)

my.probs <- c(0.25, 0.5, 0.75)
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  geom_point() +
  stat_apply_group(geom = "hline",
                   aes(yintercept = after_stat(y)),
                   .fun.y = quantile,
                   .fun.y.args = list(probs = my.probs))

# cumulative summaries
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.x = function(x) {x},
                   .fun.y = cummax)

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.x = cumsum, .fun.y = cumsum)

# diff returns a vector shorter by 1 for each group
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.x = function(x) {x[-1L]},
                   .fun.y = diff, na.rm = TRUE)

# running summaries
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  geom_point() +
  stat_apply_group(.fun.x = function(x) {x},
                   .fun.y = runmed, .fun.y.args = list(k = 5))

# rescaling per group
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.x = function(x) {x},
                   .fun.y = function(x) {(x - min(x)) / (max(x) - min(x))})

# inspecting the returned data
if (requireNamespace("gginnards", quietly = TRUE)) {
  library(gginnards)

  ggplot(my.df, aes(x = X, y = Y, colour = category)) +
    stat_centroid(.fun = mean_se, geom = "debug")

  ggplot(my.df, aes(x = X, y = Y, colour = category)) +
    stat_summary_xy(.fun.y = mean_se, geom = "debug")

  ggplot(my.df, aes(x = X, y = Y, colour = category)) +
    stat_apply_group(.fun.y = cumsum, geom = "debug")

  ggplot(my.df, aes(x = X, y = Y, colour = category)) +
    geom_point() +
    stat_apply_group(geom = "debug",
                     .fun.x = quantile,
                     .fun.x.args = list(probs = my.probs),
                     .fun.y = quantile,
                     .fun.y.args = list(probs = my.probs))
}
#> [1] "PANEL 1; group(s) 1, 2; 'draw_function()' input 'data' (head):"
#>    colour     x         y PANEL group x.names y.names
#> 1 #F8766D  5.75 0.3399159     1     1     25%     25%
#> 2 #F8766D 10.50 0.6736796     1     1     50%     50%
#> 3 #F8766D 15.25 0.8791947     1     1     75%     75%
#> 4 #00BFC4  5.75 0.1643890     1     2     25%     25%
#> 5 #00BFC4 10.50 0.5641866     1     2     50%     50%
#> 6 #00BFC4 15.25 0.8576917     1     2     75%     75%
```