stat_poly_line() fits a polynomial, by default with
stats::lm(), but alternatively using robust regression or generalized
least squares. Predicted values and a confidence band, if possible, are
computed and, by default, plotted.
Usage
stat_poly_line(
mapping = NULL,
data = NULL,
geom = "smooth",
position = "identity",
...,
orientation = NA,
method = "lm",
formula = NULL,
se = NULL,
fit.seed = NA,
fm.values = FALSE,
n = 80,
fullrange = FALSE,
level = 0.95,
method.args = list(),
n.min = 2L,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
Arguments
- mapping
The aesthetic mapping, usually constructed with aes(). Only needs to be set at the layer level if you are overriding the plot defaults.
- data
A layer-specific dataset, only needed if you want to override the plot defaults.
- geom
The geometric object to use to display the data.
- position
The position adjustment to use for overlapping points on this layer.
- ...
Other arguments passed on to layer(). These can include aesthetics whose values you want to set, not map. See layer() for more details.
- orientation
character Either "x" or "y", controlling the default for formula. The letter indicates the aesthetic considered the explanatory variable in the model fit.
- method
function or character If character, "lm", "rlm", "lts", "gls", "ma", "sma", or the name of a model fit function are accepted, possibly followed by the fit function's method argument separated by a colon (e.g. "rlm:M"). If a function other than lm(), rlm(), ltsReg(), gls(), ma() or sma() is passed, it must have formal parameters named formula, data, weights, and method. See Details.
- formula
a formula object, using aesthetic names x and y instead of original variable names.
- se
Display confidence interval around smooth? (`TRUE` by default only for fits with lm() and rlm(); see `level` to control.)
- fit.seed
RNG seed argument passed to set.seed(). Defaults to NA, indicating that set.seed() should not be called.
- fm.values
logical Add metadata and parameter estimates extracted from the fitted model object; FALSE by default.
- n
Number of points at which to predict with the fitted model.
- fullrange
Should the fit span the full range of the plot, or just the range of the data group used in each fit?
- level
Level of confidence interval to use (0.95 by default).
- method.args
named list with additional arguments. Not data or weights, which are always passed through aesthetic mappings.
- n.min
integer Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted.
- na.rm
a logical indicating whether NA values should be stripped before the computation proceeds.
- show.legend
logical. Should this layer be included in the legends?
NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes.
- inherit.aes
If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders().
Value
The value returned by the statistic is a data frame, with n
rows of predicted values and their confidence limits. Optionally it will
also include additional values related to the model fit. When a
predict() method is not available for the fitted model class, the
value returned by calling fitted(), if available, is returned
instead, with a message. In case of failure, an empty data frame
(data.frame()) is returned without issuing an error, resulting in the plot
layer addition being skipped for the failing data group.
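As a rough illustration of the returned data frame, the base-R sketch below fits the default y ~ x with lm() to the built-in cars dataset (an arbitrary stand-in for one data group) and predicts at n = 80 evenly spaced points with 95% confidence limits. The column names mirror the computed variables of this statistic, but the code is an approximation, not the statistic's internal implementation; the fallback to fitted() and the empty-data-frame failure path are not shown.

```r
# Sketch only: approximate the per-group output of stat_poly_line() in base R.
# Assumptions: default formula y ~ x, method "lm", n = 80, level = 0.95.
dat <- data.frame(x = cars$speed, y = cars$dist)

fit <- stats::lm(y ~ x, data = dat)

# Prediction grid spanning the range of x in this group (fullrange = FALSE)
grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 80))

# Fitted values, confidence limits and standard errors in one call
pred <- stats::predict(fit, newdata = grid, se.fit = TRUE,
                       interval = "confidence", level = 0.95)

out <- data.frame(x = grid$x,
                  y = pred$fit[, "fit"],
                  ymin = pred$fit[, "lwr"],
                  ymax = pred$fit[, "upr"],
                  se = pred$se.fit)
head(out)
```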
Details
This statistic is similar to stat_smooth() but
has different defaults and supports additional model fit functions. It also
interprets the argument passed to formula differently than
stat_smooth(), accepting y as explanatory variable and
setting orientation automatically. The default for method is
"lm" and spline-based smoothers like loess are not supported.
Other defaults are consistent with those in stat_poly_eq(),
stat_quant_line(), stat_quant_band(), stat_quant_eq(),
stat_ma_line(), stat_ma_eq(). As some model fitting
functions can depend on the RNG (pseudo-random number generator),
fit.seed, if not NA, is passed as argument in a call to
set.seed() immediately ahead of model fitting.
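The effect of fit.seed can be sketched in base R. Here subsample_lm() is a hypothetical RNG-dependent fit (least squares on a random 80% subsample of the rows) and fit_with_seed() is a hypothetical helper that calls set.seed() just before fitting unless the seed is NA, as described above; neither is part of 'ggpmisc'.

```r
# Hypothetical RNG-dependent "fit": OLS on a random 80% subsample of rows.
# It stands in for any model fit function that draws random numbers.
subsample_lm <- function(formula, data, ...) {
  rows <- sample(nrow(data), size = floor(0.8 * nrow(data)))
  stats::lm(formula, data = data[rows, , drop = FALSE], ...)
}

# Hypothetical helper mimicking fit.seed: seed the RNG right before
# fitting, unless fit.seed is NA.
fit_with_seed <- function(fit.seed, fun, ...) {
  if (!is.na(fit.seed)) set.seed(fit.seed)
  fun(...)
}

dat <- data.frame(x = cars$speed, y = cars$dist)
a <- fit_with_seed(42, subsample_lm, formula = y ~ x, data = dat)
b <- fit_with_seed(42, subsample_lm, formula = y ~ x, data = dat)

# Same seed, same subsample, same estimates
identical(coef(a), coef(b))
```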
stat_poly_line() treats the x and y aesthetics
differently and can thus have two orientations. The orientation can be
deduced from the argument passed to formula. Thus,
stat_poly_line() will by default guess which orientation the layer
should have. If no argument is passed to formula, the formula
defaults to y ~ x. For consistency with
stat_smooth(), orientation can also be specified
directly by passing an argument to the orientation parameter, which can
be either "x" or "y". The value of orientation gives
the aesthetic that is taken as the explanatory variable in the model
formula.
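To see why the orientation matters, the base-R sketch below (using the built-in cars dataset, an arbitrary choice) fits both y ~ x and x ~ y with lm(). These are two different regressions, not one line drawn two ways: their least-squares slopes multiply to the squared correlation r^2, so in the (x, y) plane the x ~ y line is steeper unless the fit is perfect.

```r
dat <- data.frame(x = cars$speed, y = cars$dist)

b_yx <- coef(stats::lm(y ~ x, data = dat))[["x"]]  # slope of y on x
b_xy <- coef(stats::lm(x ~ y, data = dat))[["y"]]  # slope of x on y

# Drawn in the same (x, y) plane, the x ~ y line has slope 1 / b_xy
c(y_on_x = b_yx, x_on_y = 1 / b_xy)

# The product of the two slopes equals r^2
all.equal(b_yx * b_xy, cor(dat$x, dat$y)^2)
```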
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. In stat_poly_line() the compute function is
applied by group, each call "seeing" the subset of data for an
individual group. As supported models are for regression lines,
variables mapped to x and y should both be continuous, i.e.,
numeric or date-time, and model formulas must be defined using x and y
as variable names.
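The per-group behaviour can be pictured with split() in base R. In this sketch, layer_data is a hypothetical stand-in for the data frame a statistic receives, with columns named after the aesthetics (x, y, group) and built from the mtcars dataset; each group is fitted separately with a formula written in terms of x and y.

```r
# Hypothetical stand-in for the data a statistic receives: columns named
# after aesthetics, here built from mtcars.
layer_data <- data.frame(x = mtcars$wt, y = mtcars$mpg, group = mtcars$cyl)

# Fit each group separately, as a per-group compute function would
per_group <- lapply(split(layer_data, layer_data$group), function(d) {
  fit <- stats::lm(y ~ x, data = d)  # formula written with aesthetic names
  data.frame(group = d$group[1],
             intercept = unname(coef(fit)[1]),
             slope = unname(coef(fit)[2]),
             n = nrow(d))
})
res <- do.call(rbind, per_group)
res
```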
With method "lm", singularity results in terms being dropped with a
message if they are more numerous than can be fitted with a singular (exact) fit. In
this case, and if the model results in a perfect fit due to a low number of
observations, estimates for various parameters are NaN or NA.
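The lm() case can be reproduced in base R with made-up data: three distinct x values cannot identify the four coefficients of a cubic, so lm() reports the aliased term as NA and leaves no residual degrees of freedom. The message that 'ggpmisc' adds in this situation is not reproduced here.

```r
# Toy data: only three distinct x values
d <- data.frame(x = c(1, 2, 3), y = c(2, 4, 7))

# A cubic needs four coefficients, one more than the data can identify
fit_lm <- stats::lm(y ~ x + I(x^2) + I(x^3), data = d)

coef(fit_lm)         # the aliased I(x^3) term is NA
df.residual(fit_lm)  # no residual degrees of freedom are left
```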
With methods other than "lm", the model fit functions simply fail in
case of singularity, e.g., singular fits are not implemented in
"rlm".
In both cases the minimum number of observations with distinct values in
the explanatory variable can be set through parameter n.min. The
default n.min = 2L is the smallest suitable for method "lm"
but too small for method "rlm" for which n.min = 3L is
needed. Anyway, model fits with very few observations are of little
interest and using larger values of n.min than the default is wise.
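A guard of the kind n.min implies can be sketched as follows. fit_if_enough() is a hypothetical helper that counts distinct values of the explanatory variable and, when there are fewer than n.min, skips the fit and returns an empty data frame, mirroring the failure behaviour described under Value.

```r
# Hypothetical helper: attempt the fit only when the explanatory variable has
# at least n.min distinct values; otherwise return an empty data frame.
fit_if_enough <- function(data, n.min = 2L) {
  if (length(unique(data$x)) < n.min) {
    return(data.frame())
  }
  fit <- stats::lm(y ~ x, data = data)
  data.frame(x = data$x, y = stats::fitted(fit))
}

few <- data.frame(x = c(1, 1, 2), y = c(3, 4, 5))
nrow(fit_if_enough(few, n.min = 2L))  # 3: two distinct x values, fit attempted
nrow(fit_if_enough(few, n.min = 3L))  # 0: threshold suited to "rlm", fit skipped
```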
Note
Currently confidence bands for the regression line are not plotted in some cases, and in the case of MA and SMA models, the band only displays the uncertainty of the slope rather than for both slope plus intercept.
Model fit methods supported
Several model fit functions are
supported explicitly (see tables), and some of their differences smoothed
out. Compatibility is checked late, based on the class of the returned
fitted model object. This makes it possible to use wrapper functions that
do model selection or other adjustments to the fit procedure on a per panel
or per group basis. Moreover, if the value returned as the model fit object is
NULL, no layer is added to the plot for that group within the panel.
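As an example of a wrapper that does model selection per group, poly_by_aic() below (a hypothetical function, not part of 'ggpmisc') compares polynomials of degree 1 to 3 by AIC and returns NULL when there are fewer than five distinct x values. It takes formula, data, weights and method parameters, as the argument description for method requires, but for simplicity it ignores formula and assumes columns named x and y.

```r
# Hypothetical wrapper (not part of 'ggpmisc'): selects the polynomial degree
# (1 to 3) with the lowest AIC for one group of data.
# Assumptions: columns are named x and y; `formula` is accepted but ignored.
poly_by_aic <- function(formula, data, weights = NULL, method = "qr", ...) {
  # Too few distinct x values: return NULL so the group is skipped
  if (length(unique(data$x)) < 5L) return(NULL)
  fits <- lapply(1:3, function(deg) {
    f <- stats::as.formula(bquote(y ~ poly(x, .(deg), raw = TRUE)))
    # do.call() inlines the weights vector into the call, so lm() finds it
    do.call(stats::lm, list(formula = f, data = data,
                            weights = weights, method = method))
  })
  fits[[which.min(vapply(fits, stats::AIC, numeric(1)))]]
}

# Made-up data with a quadratic trend
set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 2 + 0.5 * d$x^2 + stats::rnorm(20)

sel <- poly_by_aic(y ~ x, d)
stats::formula(sel)                    # the selected degree
is.null(poly_by_aic(y ~ x, d[1:4, ]))  # TRUE: only four distinct x values
```

Passing it as stat_poly_line(method = poly_by_aic) should, by the rules described here for user-defined functions, select the degree separately for each group and skip groups where it returns NULL; treat that as an assumption rather than tested behaviour.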
In the case of fitted model objects of classes not explicitly supported, an attempt is made to find the usual accessors and/or fitted object members, and if found, either complete or partial support is frequently achieved. In this case a message is issued encouraging users to check the validity of the values extracted.
The argument to parameter method can be either the name of a
function object, possibly using double colon notation, or a character
string matching the function name. This approach makes it possible to
support model fit functions that are not dependencies of 'ggpmisc', either
by attaching the package where the function is defined and passing the function by
name or as a string, or by using double colon notation when passing the name of
the function. User-defined functions can be passed as argument to parameter
method as long as they have parameters formula, data,
subset and possibly weights. Additional arguments can be
passed to any method as a named list as an argument to parameter
method.args. As in stat_smooth(),
prior weights are passed to the model fit functions' weights
(plural!) parameter by mapping a numeric variable to plot aesthetic
weight (singular!).
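To illustrate what prior weights mean, the base-R sketch below passes a weights column straight to lm() (in a plot it would instead come from the variable mapped to the weight aesthetic). With integer weights the coefficients match those from repeating each row that many times; standard errors differ, since the replicated fit counts more observations. The toy data are made up for the example.

```r
# Made-up data; w plays the role of the variable mapped to the weight aesthetic
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(1.2, 1.9, 3.2, 3.8, 5.3),
                w = c(1, 3, 1, 2, 1))

# Weighted least squares with prior weights
fit_w <- stats::lm(y ~ x, data = d, weights = w)

# Ordinary least squares after repeating each row w times
fit_rep <- stats::lm(y ~ x, data = d[rep(seq_len(nrow(d)), times = d$w), ])

all.equal(coef(fit_w), coef(fit_rep))  # TRUE: same coefficients
```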
The table below lists natively supported model fit functions, with the caveat that only some 'broom' methods' specializations have been actually tested with statistics from 'ggpmisc'. In addition, the statistics based on 'broom' methods require the user to tailor their behaviour by passing additional arguments in the call.
| Statistic | \(f\) | Supported model fit methods |
|---|---|---|
| stat_poly_line() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with methods predict() or fitted() |
| stat_poly_eq() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with needed accessors |
| stat_quant_line() | G | "rq", "rqss" |
| stat_quant_band() | G | "rq", "rqss" |
| stat_quant_eq() | G | "rq", "rqss" |
| stat_ma_line() | G | "SMA", "MA", "RMA", "OLS" |
| stat_ma_eq() | G | "SMA", "MA", "RMA", "OLS" |
| stat_fit_residuals() | G | "lm", "rlm", "lts", "sma", "ma", "gls", "rq", "rqss", others with method residuals() |
| stat_fit_fitted() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with method fitted() |
| stat_fit_deviations() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with methods fitted() and weights() |
| stat_fit_augment() | G | any with 'broom' method augment() |
| stat_fit_glance() | G | any with 'broom' method glance() |
| stat_fit_tidy() | G | any with 'broom' method tidy() |
| stat_fit_tb() | P | any with 'broom' method tidy() |
The table below lists the names for fit methods coded in the statistics
as given in the table above. The single colon notation is based on parsing
the name and is available whenever passing the name of the fit method as a
character string. In a string such as "head:tail" the "head" gives the name
of the model fit function and the "tail" gives the argument to pass to its
method parameter. In some cases the default formula = y ~ x
needs to be overridden with an explicit argument.
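The single colon notation can be sketched with strsplit(). split_method() is a hypothetical helper, not the parser used in 'ggpmisc'; it assumes at most one colon and then uses do.call() to pass the tail to the method parameter of lm(), fitted here to the built-in cars dataset.

```r
# Hypothetical helper, not the parser used by 'ggpmisc': split a string of
# the form "head:tail" at a single colon.
split_method <- function(method) {
  parts <- strsplit(method, ":", fixed = TRUE)[[1]]
  list(fun = parts[1],
       method = if (length(parts) > 1) parts[2] else NA_character_)
}

split_method("rlm:MM")  # fun = "rlm", method = "MM"
split_method("lm")      # fun = "lm",  method = NA

# Use the parts to call the fit function, passing the tail to `method`
spec <- split_method("lm:qr")
dat <- data.frame(x = cars$speed, y = cars$dist)
fit <- do.call(spec$fun, list(formula = y ~ x, data = dat, method = spec$method))
```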
| Predefined method names | Model fit methods | R package | Object class |
|---|---|---|---|
| "lm", "lm:qr" | lm() | 'stats' | "lm" |
| "rlm", "rlm:M", "rlm:MM" | rlm() | 'MASS' | "rlm" ("lm") |
| "lts", "ltsReg" | ltsReg() | 'robustbase' | "lts" |
| "ma", "sma", "sma:SMA", "sma:MA", "sma:OLS" | sma() | 'smatr' | "ma" or "sma" |
| "gls", "gls:REML", "gls:ML" | gls() | 'nlme' | "gls" |
| "rq", "rq:sfn", "rq:sfnc", "rq:lasso" | rq() | 'quantreg' | "rq" |
| "rqss", "rqss:sfn", "rqss:sfnc", "rqss:lasso" | rqss() | 'quantreg' | "rqss" |
| "SMA", "MA", "RMA", "OLS" | lmodel2() | 'lmodel2' | |
Computed variables
`stat_poly_line()` provides the following variables, some of which depend on the orientation:
- y or x
predicted value
- ymin or xmin
lower confidence limit around the fitted line
- ymax or xmax
upper confidence limit around the fitted line
- se
standard error
If fm.values = TRUE is passed then columns based on the summary of
the model fit are added, with the same value in each row within a group.
This is wasteful and disabled by default, but provides a simple and robust
approach to achieve effects like colouring or hiding of the model fit line
based on P-values, r-squared, adjusted r-squared or the number of
observations.
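A base-R approximation of what fm.values = TRUE adds is shown below: summary values from lm() fitted to the cars dataset are repeated in every prediction row, which makes a per-group decision (here a hypothetical rule of p < 0.05) easy to express. Column names follow the debug output in the Examples; the p-value is taken from the overall F-test, an assumption that coincides with the slope's t-test only for a straight line.

```r
dat <- data.frame(x = cars$speed, y = cars$dist)
fit <- stats::lm(y ~ x, data = dat)
smry <- summary(fit)
f <- smry$fstatistic  # named: value, numdf, dendf

grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 80))

# Scalar summaries are recycled, so every row carries the same values
out <- data.frame(grid,
                  y = stats::predict(fit, newdata = grid),
                  p.value = stats::pf(f[["value"]], f[["numdf"]], f[["dendf"]],
                                      lower.tail = FALSE),
                  r.squared = smry$r.squared,
                  adj.r.squared = smry$adj.r.squared,
                  n = nrow(dat))

# Hypothetical rule: keep the line only when the fit is significant
keep_line <- out$p.value[1] < 0.05
```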
Aesthetics
stat_poly_line() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| Aesthetic | Default |
|---|---|
| **x** | |
| **y** | |
| group | → inferred |
Learn more about setting these aesthetics in vignette("ggplot2-specs").
Examples
library(ggplot2)
library(ggpmisc)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
stat_poly_line()
# same as default
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
stat_poly_line(formula = y ~ x)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
stat_poly_line(formula = x ~ y)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
stat_poly_line(formula = y ~ poly(x, 3))
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
stat_poly_line(formula = x ~ poly(y, 3))
# Smooths are automatically fit to each group (defined by categorical
# aesthetics or the group aesthetic) and for each facet.
ggplot(mpg, aes(displ, hwy, colour = class)) +
geom_point() +
stat_poly_line(se = FALSE)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
stat_poly_line() +
facet_wrap(~drv)
# Inspecting the returned data using geom_debug_group()
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)
if (gginnards.installed)
library(gginnards)
if (gginnards.installed)
ggplot(mpg, aes(displ, hwy)) +
stat_poly_line(geom = "debug_group")
#> [1] "PANEL 1; group(s) -1; 'draw_function()' input 'data' (head):"
#> x y ymin ymax flipped_aes PANEL group orientation
#> 1 1.600000 30.04871 29.17768 30.91974 FALSE 1 -1 x
#> 2 1.668354 29.80738 28.95779 30.65696 FALSE 1 -1 x
#> 3 1.736709 29.56605 28.73763 30.39446 FALSE 1 -1 x
#> 4 1.805063 29.32471 28.51718 30.13225 FALSE 1 -1 x
#> 5 1.873418 29.08338 28.29640 29.87036 FALSE 1 -1 x
#> 6 1.941772 28.84205 28.07529 29.60882 FALSE 1 -1 x
if (gginnards.installed)
ggplot(mpg, aes(displ, hwy)) +
stat_poly_line(geom = "debug_group", fm.values = TRUE)
#> [1] "PANEL 1; group(s) -1; 'draw_function()' input 'data' (head):"
#> x y ymin ymax p.value r.squared adj.r.squared n
#> 1 1.600000 30.04871 29.17768 30.91974 2.038974e-46 0.5867867 0.5850056 234
#> 2 1.668354 29.80738 28.95779 30.65696 2.038974e-46 0.5867867 0.5850056 234
#> 3 1.736709 29.56605 28.73763 30.39446 2.038974e-46 0.5867867 0.5850056 234
#> 4 1.805063 29.32471 28.51718 30.13225 2.038974e-46 0.5867867 0.5850056 234
#> 5 1.873418 29.08338 28.29640 29.87036 2.038974e-46 0.5867867 0.5850056 234
#> 6 1.941772 28.84205 28.07529 29.60882 2.038974e-46 0.5867867 0.5850056 234
#> fm.class fm.method fm.formula fm.formula.chr flipped_aes PANEL group
#> 1 lm lm:qr y ~ x y ~ x FALSE 1 -1
#> 2 lm lm:qr y ~ x y ~ x FALSE 1 -1
#> 3 lm lm:qr y ~ x y ~ x FALSE 1 -1
#> 4 lm lm:qr y ~ x y ~ x FALSE 1 -1
#> 5 lm lm:qr y ~ x y ~ x FALSE 1 -1
#> 6 lm lm:qr y ~ x y ~ x FALSE 1 -1
#> orientation
#> 1 x
#> 2 x
#> 3 x
#> 4 x
#> 5 x
#> 6 x
if (gginnards.installed)
ggplot(mpg, aes(displ, hwy)) +
stat_poly_line(geom = "debug_group", method = lm, fm.values = TRUE)
#> [1] "PANEL 1; group(s) -1; 'draw_function()' input 'data' (head):"
#> x y ymin ymax p.value r.squared adj.r.squared n
#> 1 1.600000 30.04871 29.17768 30.91974 2.038974e-46 0.5867867 0.5850056 234
#> 2 1.668354 29.80738 28.95779 30.65696 2.038974e-46 0.5867867 0.5850056 234
#> 3 1.736709 29.56605 28.73763 30.39446 2.038974e-46 0.5867867 0.5850056 234
#> 4 1.805063 29.32471 28.51718 30.13225 2.038974e-46 0.5867867 0.5850056 234
#> 5 1.873418 29.08338 28.29640 29.87036 2.038974e-46 0.5867867 0.5850056 234
#> 6 1.941772 28.84205 28.07529 29.60882 2.038974e-46 0.5867867 0.5850056 234
#> fm.class fm.method fm.formula fm.formula.chr flipped_aes PANEL group
#> 1 lm lm y ~ x y ~ x FALSE 1 -1
#> 2 lm lm y ~ x y ~ x FALSE 1 -1
#> 3 lm lm y ~ x y ~ x FALSE 1 -1
#> 4 lm lm y ~ x y ~ x FALSE 1 -1
#> 5 lm lm y ~ x y ~ x FALSE 1 -1
#> 6 lm lm y ~ x y ~ x FALSE 1 -1
#> orientation
#> 1 x
#> 2 x
#> 3 x
#> 4 x
#> 5 x
#> 6 x
