`stat_quant_eq`

fits a polynomial model by quantile regression and
generates several labels including the equation, rho, 'AIC' and 'BIC'.

## Usage

```
stat_quant_eq(
mapping = NULL,
data = NULL,
geom = "text_npc",
position = "identity",
...,
formula = NULL,
quantiles = c(0.25, 0.5, 0.75),
method = "rq:br",
method.args = list(),
n.min = 3L,
eq.with.lhs = TRUE,
eq.x.rhs = NULL,
coef.digits = 3,
coef.keep.zeros = TRUE,
decreasing = getOption("ggpmisc.decreasing.poly.eq", FALSE),
rho.digits = 4,
label.x = "left",
label.y = "top",
hstep = 0,
vstep = NULL,
output.type = NULL,
na.rm = FALSE,
orientation = NA,
parse = NULL,
show.legend = FALSE,
inherit.aes = TRUE
)
```

## Arguments

- mapping
The aesthetic mapping, usually constructed with

`aes`

. Only needs to be set at the layer level if you are overriding the plot defaults.- data
A layer specific dataset, only needed if you want to override the plot defaults.

- geom
The geometric object to use display the data

- position
The position adjustment to use for overlapping points on this layer

- ...
other arguments passed on to

`layer`

. This can include aesthetics whose values you want to set, not map. See`layer`

for more details.- formula
a formula object. Using aesthetic names instead of original variable names.

- quantiles
numeric vector Values in 0..1 indicating the quantiles.

- method
function or character If character, "rq" or the name of a model fit function are accepted, possibly followed by the fit function's

`method`

argument separated by a colon (e.g.`"rq:br"`

). If a function different to`rq()`

, it must accept arguments named`formula`

,`data`

,`weights`

,`tau`

and`method`

and return a model fit object of class`rq`

or`rqs`

.- method.args
named list with additional arguments passed to

`rq()`

or to a function passed as argument to`method`

.- n.min
integer Minimum number of observations needed for fiting a the model.

- eq.with.lhs
If

`character`

the string is pasted to the front of the equation label before parsing or a`logical`

(see note).- eq.x.rhs
`character`

this string will be used as replacement for`"x"`

in the model equation when generating the label before parsing it.- coef.digits, rho.digits
integer Number of significant digits to use for the fitted coefficients and rho in labels.

- coef.keep.zeros
logical Keep or drop trailing zeros when formatting the fitted coefficients and F-value.

- decreasing
logical It specifies the order of the terms in the returned character string; in increasing (default) or decreasing powers.

- label.x, label.y
`numeric`

with range 0..1 "normalized parent coordinates" (npc units) or character if using`geom_text_npc()`

or`geom_label_npc()`

. If using`geom_text()`

or`geom_label()`

numeric in native data units. If too short they will be recycled.- hstep, vstep
numeric in npc units, the horizontal and vertical step used between labels for different groups.

- output.type
character One of

`"expression"`

,`"LaTeX"`

,`"text"`

,`"markdown"`

or`"numeric"`

. In most cases, instead of using this statistics to obtain numeric values, it is better to use`stat_fit_tidy()`

.- na.rm
a logical indicating whether NA values should be stripped before the computation proceeds.

- orientation
character Either

`"x"`

or`"y"`

controlling the default for`formula`

.- parse
logical Passed to the geom. If

`TRUE`

, the labels will be parsed into expressions and displayed as described in`?plotmath`

. Default is`TRUE`

if`output.type = "expression"`

and`FALSE`

otherwise.- show.legend
logical. Should this layer be included in the legends?

`NA`

, the default, includes if any aesthetics are mapped.`FALSE`

never includes, and`TRUE`

always includes.- inherit.aes
If

`FALSE`

, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g.`borders`

.

## Value

A data frame, with one row per quantile and columns as described
under **Computed variables**. In cases when the number of observations
is less than `n.min`

a data frame with no rows or columns is returned
rendered as an empty/invisible plot layer.

## Details

This statistic interprets the argument passed to `formula`

differently
than `stat_quantile`

accepting `y`

as well as
`x`

as explanatory variable, matching `stat_quant_line()`

.

When two variables are subject to mutual constrains, it is useful to consider both of them as explanatory and interpret the relationship based on them. So, from version 0.4.1 'ggpmisc' makes it possible to easily implement the approach described by Cardoso (2019) under the name of "Double quantile regression".

This stat can be used to automatically annotate a plot with rho or
the fitted model equation. The model fitting is done using package
'quantreg', please, consult its documentation for the
details. It supports only linear models fitted with function `rq()`

,
passing `method = "br"`

to it, should work well with up to several
thousand observations. The rho, AIC, BIC and n annotations can be used with
any linear model formula. The fitted equation label is correctly generated
for polynomials or quasi-polynomials through the origin. Model formulas can
use `poly()`

or be defined algebraically with terms of powers of
increasing magnitude with no missing intermediate terms, except possibly
for the intercept indicated by `"- 1"`

or `"-1"`

or `"+ 0"`

in the formula. The validity of the `formula`

is not checked in the
current implementation. The default aesthetics sets rho as label for the
annotation. This stat generates labels as R expressions by default but
LaTeX (use TikZ device), markdown (use package 'ggtext') and plain text are
also supported, as well as numeric values for user-generated text labels.
The value of `parse`

is set automatically based on `output-type`

,
but if you assemble labels that need parsing from `numeric`

output,
the default needs to be overridden. This stat only generates annotation
labels, the predicted values/line need to be added to the plot as a
separate layer using `stat_quant_line`

,
`stat_quant_band`

or `stat_quantile`

, so
to make sure that the same model formula is used in all steps it is best to
save the formula as an object and supply this object as argument to the
different statistics.

A ggplot statistic receives as data a data frame that is not the one passed
as argument by the user, but instead a data frame with the variables mapped
to aesthetics. `stat_quant_eq()`

mimics how `stat_smooth()`

works, except that only polynomials can be fitted. In other words, it
respects the grammar of graphics. This helps ensure that the model is
fitted to the same data as plotted in other layers.

Function `rq`

does not support singular fits, in
contrast to `lm`

.

The minimum number of observations with distinct values in the explanatory
variable can be set through parameter `n.min`

. The default ```
n.min
= 3L
```

is the smallest usable value. However, model fits with very few
observations are of little interest and using larger values of `n.min`

than the default is usually wise.

## Note

For backward compatibility a logical is accepted as argument for
`eq.with.lhs`

. If `TRUE`

, the default is used, either
`"x"`

or `"y"`

, depending on the argument passed to `formula`

.
However, `"x"`

or `"y"`

can be substituted by providing a
suitable replacement character string through `eq.x.rhs`

.
Parameter `orientation`

is redundant as it only affects the default
for `formula`

but is included for consistency with
`ggplot2::stat_smooth()`

.

R option `OutDec`

is obeyed based on its value at the time the plot
is rendered, i.e., displayed or printed. Set `options(OutDec = ",")`

for languages like Spanish or French.

Support for the `angle`

aesthetic is not automatic and requires
that the user passes as argument suitable numeric values to override the
defaults for label positions.

## User-defined methods

User-defined functions can be passed as
argument to `method`

. The requirements are 1) that the signature is
similar to that of functions from package 'quantreg' and 2) that the value
returned by the function is an object belonging to class `"rq"`

, class
`"rqs"`

, or an atomic `NA`

value.

The `formula`

and `tau`

used to build the equation and quantile
labels aer extracted from the returned `"rq"`

or `"rqs"`

object
and can safely differ from the argument passed to parameter `formula`

in the call to `stat_poly_eq()`

. Thus, user-defined methods can
implement both model selection or conditional skipping of labelling.

## Warning!

For the formatted equations to be valid, the fitted model
must be a polynomial, with or without intercept. If defined using
`poly()`

the argument `raw = TRUE`

must be passed. If defined
manually as powers of `x`

, **the terms must be in order of
increasing powers, with no missing intermediate power term.** Please, see
examples below. A check on the model is used to validate that it is a
polynomial, in most cases a warning is issued. Failing to comply with this
requirement results in the return of `NA`

as the formatted equation.

## Aesthetics

`stat_quant_eq()`

understands `x`

and `y`

,
to be referenced in the `formula`

and `weight`

passed as argument
to parameter `weights`

of `rq()`

. All three must be mapped to
`numeric`

variables. In addition, the aesthetics understood by the
geom used (`"text"`

by default) are understood and grouping respected.

*If the model formula includes a transformation of x, a
matching argument should be passed to parameter eq.x.rhs
as its default value "x" will not reflect the applied
transformation. In plots, transformation should never be applied to the
left hand side of the model formula, but instead in the mapping of the
variable within aes, as otherwise plotted observations and fitted
curve will not match. In this case it may be necessary to also pass
a matching argument to parameter eq.with.lhs.*

## Computed variables

If output.type different from `"numeric"`

the returned tibble contains
columns below in addition to a modified version of the original `group`

:

- x,npcx
x position

- y,npcy
y position

- eq.label
equation for the fitted polynomial as a character string to be parsed

- r.label, and one of cor.label, rho.label, or tau.label
\(rho\) of the fitted model as a character string to be parsed

- AIC.label
AIC for the fitted model.

- n.label
Number of observations used in the fit.

- method.label
Set according

`method`

used.- rq.method
character, method used.

- rho, n
numeric values extracted or computed from fit object.

- hjust, vjust
Set to "inward" to override the default of the "text" geom.

- quantile
Numeric value of the quantile used for the fit

- quantile.f
Factor with a level for each quantile

If output.type is `"numeric"`

the returned tibble contains columns
in addition to a modified version of the original `group`

:

- x,npcx
x position

- y,npcy
y position

- coef.ls
list containing the "coefficients" matrix from the summary of the fit object

- rho, AIC, n
numeric values extracted or computed from fit object

- rq.method
character, method used.

- hjust, vjust
Set to "inward" to override the default of the "text" geom.

- quantile
Indicating the quantile used for the fit

- quantile.f
Factor with a level for each quantile

- b_0.constant
TRUE is polynomial is forced through the origin

- b_i
One or columns with the coefficient estimates

To explore the computed values returned for a given input we suggest the use
of `geom_debug`

as shown in the example below.

## See also

The quantile fit is done with function `rq`

,
please consult its documentation. This `stat_quant_eq`

statistic can
return ready formatted labels depending on the argument passed to
`output.type`

. This is possible because only polynomial models are
supported. For other types of models, statistics
`stat_fit_glance`

, `stat_fit_tidy`

and
`stat_fit_glance`

should be used instead and the code for
construction of character strings from numeric values and their mapping to
aesthetic `label`

needs to be explicitly supplied in the call.

Other ggplot statistics for quantile regression:
`stat_quant_band()`

,
`stat_quant_line()`

## Examples

```
# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
y <- y / max(y)
my.data <- data.frame(x = x, y = y,
group = c("A", "B"),
y2 = y * c(1, 2) + max(y) * c(0, 0.1),
w = sqrt(x))
# using defaults
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line() +
stat_quant_eq()
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line() +
stat_quant_eq(mapping = use_label("eq"))
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line() +
stat_quant_eq(mapping = use_label("eq"), decreasing = TRUE)
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line() +
stat_quant_eq(mapping = use_label("eq", "method"))
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
# same formula as default
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = y ~ x) +
stat_quant_eq(formula = y ~ x)
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
# explicit formula "x explained by y"
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = x ~ y) +
stat_quant_eq(formula = x ~ y)
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
# using color
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(mapping = aes(color = after_stat(quantile.f))) +
stat_quant_eq(mapping = aes(color = after_stat(quantile.f))) +
labs(color = "Quantiles")
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
# location and colour
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(mapping = aes(color = after_stat(quantile.f))) +
stat_quant_eq(mapping = aes(color = after_stat(quantile.f)),
label.y = "bottom", label.x = "right") +
labs(color = "Quantiles")
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
# give a name to a formula
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = formula, linewidth = 0.5) +
stat_quant_eq(formula = formula)
# angle
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = formula, linewidth = 0.5) +
stat_quant_eq(formula = formula, angle = 90, hstep = 0.04, vstep = 0,
label.y = 0.02, hjust = 0) +
expand_limits(x = -15) # make space for equations
# user set quantiles
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = formula, quantiles = 0.5) +
stat_quant_eq(formula = formula, quantiles = 0.5)
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_band(formula = formula,
quantiles = c(0.1, 0.5, 0.9)) +
stat_quant_eq(formula = formula, parse = TRUE,
quantiles = c(0.1, 0.5, 0.9))
# grouping
ggplot(my.data, aes(x, y2, color = group)) +
geom_point() +
stat_quant_line(formula = formula, linewidth = 0.5) +
stat_quant_eq(formula = formula)
ggplot(my.data, aes(x, y2, color = group)) +
geom_point() +
stat_quant_band(formula = formula, linewidth = 0.75) +
stat_quant_eq(formula = formula) +
theme_bw()
# labelling equations
ggplot(my.data, aes(x, y2, shape = group, linetype = group,
grp.label = group)) +
geom_point() +
stat_quant_band(formula = formula, color = "black", linewidth = 0.75) +
stat_quant_eq(mapping = use_label("grp", "eq", sep = "*\": \"*"),
formula = formula) +
expand_limits(y = 3) +
theme_classic()
# modifying the explanatory variable within the model formula
# modifying the response variable within aes()
formula.trans <- y ~ I(x^2)
ggplot(my.data, aes(x, y + 1)) +
geom_point() +
stat_quant_line(formula = formula.trans) +
stat_quant_eq(mapping = use_label("eq"),
formula = formula.trans,
eq.x.rhs = "~x^2",
eq.with.lhs = "y + 1~~`=`~~")
#> Warning: 'formula' not an increasing polynomial: 'eq.label' is NA!
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Removed 3 rows containing missing values or values outside the scale range
#> (`geom_text_npc()`).
# using weights
ggplot(my.data, aes(x, y, weight = w)) +
geom_point() +
stat_quant_line(formula = formula, linewidth = 0.5) +
stat_quant_eq(formula = formula)
# no weights, quantile set to upper boundary
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = formula, quantiles = 0.95) +
stat_quant_eq(formula = formula, quantiles = 0.95)
#> Warning: 2 non-positive fis
# manually assemble and map a specific label using paste() and aes()
ggplot(my.data, aes(x, y2, color = group, grp.label = group)) +
geom_point() +
stat_quant_line(method = "rq", formula = formula,
quantiles = c(0.05, 0.5, 0.95),
linewidth = 0.5) +
stat_quant_eq(mapping = aes(label = paste(after_stat(grp.label), "*\": \"*",
after_stat(eq.label), sep = "")),
quantiles = c(0.05, 0.5, 0.95),
formula = formula, size = 3)
#> Warning: 18 non-positive fis
#> Warning: 16 non-positive fis
#> Warning: 9 non-positive fis
#> Warning: 4 non-positive fis
# manually assemble and map a specific label using sprintf() and aes()
ggplot(my.data, aes(x, y2, color = group, grp.label = group)) +
geom_point() +
stat_quant_band(method = "rq", formula = formula,
quantiles = c(0.05, 0.5, 0.95)) +
stat_quant_eq(mapping = aes(label = sprintf("%s*\": \"*%s",
after_stat(grp.label),
after_stat(eq.label))),
quantiles = c(0.05, 0.5, 0.95),
formula = formula, size = 3)
# geom = "text"
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_line(formula = formula, quantiles = 0.5) +
stat_quant_eq(label.x = "left", label.y = "top",
formula = formula,
quantiles = 0.5)
# Inspecting the returned data using geom_debug()
# This provides a quick way of finding out the names of the variables that
# are available for mapping to aesthetics using after_stat().
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)
if (gginnards.installed)
library(gginnards)
if (gginnards.installed)
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_eq(formula = formula, geom = "debug")
#> [1] "PANEL 1; group(s) -1; 'draw_function()' input 'data' (head):"
#> npcx npcy
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> label
#> 1 italic(y)~`=`~-0.0257 + 0.000496*~italic(x) + 2.22%*% 10^{-06}*~italic(x)^2 + 8.10%*% 10^{-07}*~italic(x)^3
#> 2 italic(y)~`=`~-0.0122 + 0.000971*~italic(x) - 9.60%*% 10^{-06}*~italic(x)^2 + 9.45%*% 10^{-07}*~italic(x)^3
#> 3 italic(y)~`=`~0.0413 + 0.000437*~italic(x) - 1.26%*% 10^{-05}*~italic(x)^2 + 1.04%*% 10^{-06}*~italic(x)^3
#> eq.label
#> 1 italic(y)~`=`~-0.0257 + 0.000496*~italic(x) + 2.22%*% 10^{-06}*~italic(x)^2 + 8.10%*% 10^{-07}*~italic(x)^3
#> 2 italic(y)~`=`~-0.0122 + 0.000971*~italic(x) - 9.60%*% 10^{-06}*~italic(x)^2 + 9.45%*% 10^{-07}*~italic(x)^3
#> 3 italic(y)~`=`~0.0413 + 0.000437*~italic(x) - 1.26%*% 10^{-05}*~italic(x)^2 + 1.04%*% 10^{-06}*~italic(x)^3
#> AIC.label rho.label n.label grp.label
#> 1 plain(AIC)~`=`~"-269.2" italic(rho)~`=`~"1.725" italic(n)~`=`~100
#> 2 plain(AIC)~`=`~"-293." italic(rho)~`=`~"2.042" italic(n)~`=`~100
#> 3 plain(AIC)~`=`~"-275.9" italic(rho)~`=`~"1.668" italic(n)~`=`~100
#> qtl.label method.label rho rq.method quantile quantile.f
#> 1 italic(q)~`=`~"0.25" "method: rq:br" 1.724867 br 0.25 0.25
#> 2 italic(q)~`=`~"0.50" "method: rq:br" 2.042215 br 0.50 0.50
#> 3 italic(q)~`=`~"0.75" "method: rq:br" 1.667752 br 0.75 0.75
#> n fm.method fm.class fm.formula fm.formula.chr
#> 1 100 rq:br rqs y ~ poly(x, 3, raw = TRUE) y ~ poly(x, 3, raw = TRUE)
#> 2 100 rq:br rqs y ~ poly(x, 3, raw = TRUE) y ~ poly(x, 3, raw = TRUE)
#> 3 100 rq:br rqs y ~ poly(x, 3, raw = TRUE) y ~ poly(x, 3, raw = TRUE)
#> x y PANEL group orientation
#> 1 1 0.8912737 1 -1 x
#> 2 1 0.9456369 1 -1 x
#> 3 1 1.0000000 1 -1 x
if (FALSE) {
if (gginnards.installed)
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_eq(mapping = aes(label = after_stat(eq.label)),
formula = formula, geom = "debug",
output.type = "markdown")
if (gginnards.installed)
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_eq(formula = formula, geom = "debug", output.type = "text")
if (gginnards.installed)
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_eq(formula = formula, geom = "debug", output.type = "numeric")
if (gginnards.installed)
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_eq(formula = formula, quantiles = c(0.25, 0.5, 0.75),
geom = "debug", output.type = "text")
if (gginnards.installed)
ggplot(my.data, aes(x, y)) +
geom_point() +
stat_quant_eq(formula = formula, quantiles = c(0.25, 0.5, 0.75),
geom = "debug", output.type = "numeric")
}
```