Predicted equation from Normal mixture model fit

stat_normalmix_eq() fits a Normal mixture model, by default with normalmixEM(). The parameter estimates are formatted as equation labels which are, by default, added to the plot.
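A minimal sketch of the underlying default fit, assuming package 'mixtools' (which provides normalmixEM()) is installed. It fits a two-component mixture directly, outside ggplot2, and shows the fields of the returned mixEM object from which estimates such as lambda, mu and sigma can be read. The seed value is arbitrary.

```r
# Direct two-component Normal mixture fit with mixtools (assumed installed)
library(mixtools)

set.seed(42)  # EM starting values are random, so fix the RNG
fm <- normalmixEM(faithful$waiting, k = 2)

class(fm)   # class of the fitted model object
fm$ft       # name of the fitting function
fm$lambda   # mixing proportions, one per component
fm$mu       # component means
fm$sigma    # component standard deviations
```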

Usage

stat_normalmix_eq(
  mapping = NULL,
  data = NULL,
  geom = "text_npc",
  position = "identity",
  ...,
  method = "normalmixEM",
  method.args = list(),
  n.min = 10L * k,
  level = 0.95,
  k = 2,
  free.mean = TRUE,
  free.sd = TRUE,
  se = FALSE,
  seed = NA,
  fm.values = TRUE,
  components = NULL,
  eq.with.lhs = TRUE,
  eq.digits = 2,
  label.x = "left",
  label.y = "top",
  hstep = 0,
  vstep = NULL,
  output.type = NULL,
  na.rm = FALSE,
  orientation = "x",
  parse = NULL,
  show.legend = NA,
  inherit.aes = TRUE
)

Arguments

mapping: The aesthetic mapping, usually constructed with aes. Only needs to be set at the layer level if you are overriding the plot defaults.
data: A layer specific dataset, only needed if you want to override the plot defaults.
geom: The geometric object to use to display the data.
position: The position adjustment to use for overlapping points on this layer.
...: other arguments passed on to layer. This can include aesthetics whose values you want to set, not map. See layer for more details.
method: function or character If character, "normalmixEM" or the name of a model fit function are accepted, possibly followed by the fit function's method argument separated by a colon. The function must return a model fit object of class mixEM.
method.args: named list with additional arguments passed to the model fit function.
n.min: integer Minimum number of distinct values in the mapped variable for fitting to be attempted.
level: Level of confidence interval to use (0.95 by default).
k: integer Number of mixture components to fit.
free.mean, free.sd: logical If TRUE, allow the fitted mean and/or fitted sd to vary among the component Normal distributions.
se: logical, if TRUE standard errors for parameter estimates are obtained by bootstrapping.
seed: RNG seed argument passed to set.seed(). Defaults to NA, which means that set.seed() will not be called.
fm.values: logical Add parameter estimates and their standard errors to the returned values (`TRUE` by default).
components: character One of "all", "sum", or "members", selecting which components are returned: the individual members of the mixture, their sum, or all of them.
eq.with.lhs: logical or character If character, the string is pasted to the front of the equation label before parsing; if logical, it sets whether the default left-hand side is included (see note).
eq.digits: integer Number of significant digits to use for parameters in labels. If Inf, use exponential notation with three decimal places.
label.x, label.y: numeric with range 0..1 "normalized parent coordinates" (npc units) or character if using geom_text_npc() or geom_label_npc(). If using geom_text() or geom_label() numeric in native data units. If too short they will be recycled.
hstep, vstep: numeric in npc units, the horizontal and vertical step used between labels for different mixture model components.
output.type: character One of "expression", "LaTeX", "text", "markdown" or "numeric".
na.rm: a logical indicating whether NA values should be stripped before the computation proceeds.
orientation: character Either "x" or "y", the aesthetic to which the values to be fitted are mapped. NOT YET IMPLEMENTED!
parse: logical Passed to the geom. If TRUE, the labels will be parsed into expressions and displayed as described in ?plotmath. Default is TRUE if output.type = "expression" and FALSE otherwise.
show.legend: logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes.
inherit.aes: If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.
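As an illustration of free.sd, the sketch below constrains the components to share a common standard deviation and then extracts the data computed by the layer with ggplot2's layer_data(). It assumes that 'ggpmisc' and 'mixtools' are installed; seed = 42 is an arbitrary choice.

```r
library(ggplot2)
library(ggpmisc)

# free.sd = FALSE: all components share a single standard deviation
p_common_sd <- ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_eq(free.sd = FALSE, components = "members", seed = 42)

# Data computed by the statistic for the first (and only) layer
eq_data <- layer_data(p_common_sd, 1)
eq_data[, c("component", "lambda", "mu", "sigma")]
```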

Value

The value returned by the statistic is a data frame with one row for each mixture component selected with components, containing the parameter estimates and the formatted labels. Optionally it will also include additional values related to the model fit.
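A sketch of how to inspect the returned data frame without 'gginnards', using ggplot2's layer_data(). It assumes 'ggpmisc' and 'mixtools' are installed; the column names are those visible in the debug output in the Examples.

```r
library(ggplot2)
library(ggpmisc)

p <- ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_eq(components = "members", seed = 42)

# One row per selected component: estimates plus the formatted labels
eq_data <- layer_data(p, 1)
eq_data[, c("component", "lambda", "mu", "sigma", "eq.label")]
```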

Details

This statistic is similar to stat_density but, instead of fitting a single distribution, it can fit a mixture of two or more Normal distributions, using an approach related to clustering. Defaults are consistent between stat_normalmix_line() and stat_normalmix_eq(). If seed is not NA, it is passed to set.seed() immediately before the model fit function is called. As the fitting procedure makes use of the (pseudo-)random number generator (RNG), convergence can depend on it; in such cases, setting seed to the same value in stat_normalmix_line() and in stat_normalmix_eq() ensures that both layers are based on the same fit and, more generally, makes the results reproducible.
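A sketch of this practice, using the same arbitrary seed in both layers. The second part shows the idea with 'mixtools' directly: calling set.seed() immediately before each fit makes repeated fits identical. The helper fit_seeded() is hypothetical, written only for this illustration.

```r
library(ggplot2)
library(ggpmisc)
library(mixtools)

# Same seed in both layers, so that the line and the equation
# are computed from identical model fits
p <- ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_line(seed = 123) +
  stat_normalmix_eq(seed = 123)
p

# Hypothetical helper: seed the RNG immediately before fitting
fit_seeded <- function(x, seed, k = 2) {
  set.seed(seed)
  normalmixEM(x, k = k)
}
fm1 <- fit_seeded(faithful$waiting, seed = 123)
fm2 <- fit_seeded(faithful$waiting, seed = 123)
identical(fm1$mu, fm2$mu)  # same seed, same starting values, same estimates
```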

A mixture model as described above is fitted for k >= 2, while k == 1 is treated as a special case: a single Normal distribution is fitted with function fitdistr(). In this case the standard errors are computed in closed form rather than by bootstrapping.
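The k == 1 case can be illustrated with MASS::fitdistr() called directly, on the same subset of faithful used in the Examples. This is a sketch, not the statistic's own code; it checks that the standard errors reported by fitdistr() match the closed-form expressions for the maximum likelihood estimates of a Normal distribution.

```r
library(MASS)

x <- subset(faithful, waiting > 66)$waiting
n <- length(x)

# Maximum likelihood fit of a single Normal distribution
fd <- fitdistr(x, densfun = "normal")
fd$estimate  # ML estimates of mean and sd
fd$sd        # standard errors of these estimates

# The same standard errors from their closed-form expressions
sd_ml <- sqrt(mean((x - mean(x))^2))  # ML estimate of sd (divisor n)
c(mean = sd_ml / sqrt(n), sd = sd_ml / sqrt(2 * n))
```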

Computed variables

stat_normalmix_eq() provides the following variables, some of which depend on the orientation:

y: the location of text labels
eq.label: character string for equations
n.label: character string for number of observations
method.label: character string for model fit method
lambda: numeric the estimate of the contribution of the component of the mixture towards the joint density
mu: numeric the estimate of the mean
sigma: numeric the estimate of the standard deviation
component: A factor indexing the components of the mixture and/or their sum
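The numbers in eq.label are rounded versions of lambda, mu and sigma. In the debug output in the Examples, mu = 54.6149 is shown as 55 with the default eq.digits = 2, which is consistent with rounding to significant digits. Base R's signif() reproduces the displayed values; this is only an illustration, not the package's internal formatting code.

```r
# Estimates for one component, copied from the debug output in Examples
estimates <- c(lambda = 0.3608872, mu = 54.6149, sigma = 5.871252)

signif(estimates, digits = 2)  # matches the label: 0.36, 55, 5.9
round(estimates, digits = 2)   # two decimal places would give 54.61 instead
```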

If se = TRUE is passed, columns with standard errors for the parameter estimates are added:

mu.se: numeric the standard error of the estimate of the mean
sigma.se: numeric the standard error of the estimate of the standard deviation
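Bootstrapped standard errors of this kind can be obtained directly with boot.se() from 'mixtools', which refits the mixture model to each bootstrap sample; this would explain why the se = TRUE example prints one "number of iterations=" line per refit. Whether the statistic calls boot.se() internally, and with how many replicates, is an assumption; B = 50 below is arbitrary.

```r
library(mixtools)

set.seed(42)
fm <- normalmixEM(faithful$waiting, k = 2)

# Bootstrap: the model is refitted to each of B bootstrap samples,
# and each refit prints its own "number of iterations=" line
ses <- boot.se(fm, B = 50)
ses$mu.se     # bootstrap standard errors of the component means
ses$sigma.se  # bootstrap standard errors of the component sds
```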

If fm.values = TRUE is passed, columns with diagnostics and parameter estimates are added, with the same value in each row within a group:

n: numeric the number of x values
.size: numeric the number of density values
fm.class: character the most derived class of the fitted model object
fm.method: character the method, as given by the ft field of the fitted model object

Repeating these values in every row is wasteful, but it provides a simple and robust approach to achieve effects like colouring or hiding labels by group depending on the outcome of model fitting.
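For example, a minimal sketch that colours each label according to whether the EM algorithm converged. It maps the converged column, which appears in the debug output in the Examples but is not listed among the computed variables above, so its availability is an assumption. It also assumes 'ggpmisc' and 'mixtools' are installed.

```r
library(ggplot2)
library(ggpmisc)

# Colour each component's label by the convergence status of the fit;
# 'converged' is assumed to be returned when fm.values = TRUE
p <- ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_eq(aes(colour = after_stat(converged)),
                    components = "members", fm.values = TRUE, seed = 42)
p

label_data <- layer_data(p, 1)
label_data[, c("component", "converged", "colour")]
```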

Aesthetics

stat_normalmix_eq expects observations mapped to x from a numeric variable. By default, component is mapped to the group aesthetic, which adds a grouping by mixture component, and eq.label is mapped to the label aesthetic. Additional aesthetics as understood by the geom ("text_npc" by default) can be set.
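The default label mapping can be overridden with after_stat(). A sketch, assuming 'ggpmisc' and 'mixtools' are installed, that displays the number of observations (n.label) instead of the equation:

```r
library(ggplot2)
library(ggpmisc)

# Override the default label mapping (eq.label) with n.label
p <- ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_eq(aes(label = after_stat(n.label)), seed = 42)
p

label_data <- layer_data(p, 1)
```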

Examples

ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_line(components = "sum") +
  stat_normalmix_eq()
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 31 
#> number of iterations= 35 

ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_line(components = "sum") +
  stat_normalmix_eq(use_label("eq", "n", "method"))
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 15 
#> number of iterations= 22 

ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_line(components = "sum") +
  stat_normalmix_eq(geom = "label_npc")
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 18 
#> number of iterations= 44 

ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_line(components = "sum") +
  stat_normalmix_eq(geom = "text", label.x = "center", label.y = "bottom")
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 20 
#> number of iterations= 27 

ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_line(components = "sum") +
  stat_normalmix_eq(geom = "text", hjust = "inward")
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 39 
#> number of iterations= 29 

ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_line(components = "members") +
  stat_normalmix_eq(components = "members")
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 24 
#> number of iterations= 41 

ggplot(faithful, aes(x = waiting)) +
  stat_normalmix_line(components = "members") +
  stat_normalmix_eq(components = "members", se = TRUE)
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 56 
#> number of iterations= 27 
#> [100 further "number of iterations=" lines omitted]

# ggplot(faithful, aes(y = waiting)) +
#  stat_normalmix_eq(orientation = "y")

ggplot(faithful, aes(x = waiting)) +
 geom_histogram(aes(y = after_stat(density)), bins = 20) +
 stat_normalmix_line(aes(colour = after_stat(component),
                         fill = after_stat(component)),
                     geom = "area", linewidth = 1, alpha = 0.25) +
 stat_normalmix_eq(aes(colour = after_stat(component)))
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 37 
#> number of iterations= 32 

ggplot(faithful, aes(x = waiting)) +
 stat_normalmix_line(aes(colour = after_stat(component),
                         fill = after_stat(component)),
                     geom = "area", linewidth = 1, alpha = 0.25,
                     components = "members") +
 stat_normalmix_eq(aes(colour = after_stat(component)),
                     components = "members")
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 23 
#> number of iterations= 37 

ggplot(faithful, aes(x = waiting)) +
 stat_normalmix_line(geom = "area", linewidth = 1, alpha = 0.25,
                     colour = "black", outline.type = "upper",
                     components = "sum", se = FALSE) +
 stat_normalmix_eq(components = "sum")
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 32 
#> number of iterations= 30 

# special case of no mixture
ggplot(subset(faithful, waiting > 66), aes(x = waiting)) +
  stat_normalmix_line(k = 1) +
  stat_normalmix_eq(k = 1)
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation
#> With k = 1 one Normal distribution is fitted. Irrelevant parameters ignored!
#> With k = 1 one Normal distribution is fitted. Irrelevant parameters ignored!


ggplot(subset(faithful, waiting > 66), aes(x = waiting)) +
  stat_normalmix_line(k = 1) +
  stat_normalmix_eq(k = 1, se = TRUE)
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation
#> With k = 1 one Normal distribution is fitted. Irrelevant parameters ignored!
#> With k = 1 one Normal distribution is fitted. Irrelevant parameters ignored!


# Inspecting the returned data using geom_debug()
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (gginnards.installed)
  library(gginnards)

if (gginnards.installed)
  ggplot(faithful, aes(x = waiting)) +
    stat_normalmix_line(geom = "debug", components = "all")

#> number of iterations= 36 
#> [1] "PANEL 1; group(s) comp.1  , comp.2  , comp.sum; 'draw_function()' input 'data' (head):"
#>          x component      density flipped_aes PANEL    group            y
#> 1 35.29541  comp.sum 1.092383e-04       FALSE     1 comp.sum 1.092383e-04
#> 2 35.29541    comp.1 1.092383e-04       FALSE     1   comp.1 1.092383e-04
#> 3 35.29541    comp.2 9.599836e-15       FALSE     1   comp.2 9.599836e-15
#> 4 35.61754  comp.sum 1.306554e-04       FALSE     1 comp.sum 1.306554e-04
#> 5 35.61754    comp.1 1.306554e-04       FALSE     1   comp.1 1.306554e-04
#> 6 35.61754    comp.2 1.457559e-14       FALSE     1   comp.2 1.457559e-14
#>   orientation
#> 1           x
#> 2           x
#> 3           x
#> 4           x
#> 5           x
#> 6           x

if (gginnards.installed)
  ggplot(faithful, aes(x = waiting)) +
    stat_normalmix_eq(geom = "debug", components = "all")

if (gginnards.installed)
  ggplot(faithful, aes(x = waiting)) +
    stat_normalmix_eq(geom = "debug", components = "sum")
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 27 
#> [1] "PANEL 1; group(s) comp.sum; 'draw_function()' input 'data' (head):"
#>   lambda mu sigma  k converged   n fm.class   fm.method component
#> 1      1 NA    NA NA      TRUE 272    mixEM normalmixEM  comp.sum
#>                                                                                             eq.label
#> 1 DF~`=`~0.36 %*% italic(N)(mu*`=`*55, sigma*`=`*5.9) + 0.64 %*% italic(N)(mu*`=`*80, sigma*`=`*5.9)
#>     n.label          method.label npcx npcy PANEL    group
#> 1 n~`=`~272 "method: normalmixEM"   NA   NA     1 comp.sum
#>                                                                                                label
#> 1 DF~`=`~0.36 %*% italic(N)(mu*`=`*55, sigma*`=`*5.9) + 0.64 %*% italic(N)(mu*`=`*80, sigma*`=`*5.9)
#>      x    y orientation
#> 1 0.05 0.95           x

if (gginnards.installed)
  ggplot(faithful, aes(x = waiting)) +
    stat_normalmix_eq(geom = "debug", components = "members")
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 33 
#> [1] "PANEL 1; group(s) comp.1, comp.2; 'draw_function()' input 'data' (head):"
#>      lambda      mu    sigma k converged   n fm.class   fm.method component
#> 1 0.6391128 80.0911 5.867711 2      TRUE 272    mixEM normalmixEM    comp.1
#> 2 0.3608872 54.6149 5.871252 2      TRUE 272    mixEM normalmixEM    comp.2
#>                                              eq.label   n.label
#> 1 DF~`=`~0.64 %*% italic(N)(mu*`=`*80, sigma*`=`*5.9) n~`=`~272
#> 2 DF~`=`~0.36 %*% italic(N)(mu*`=`*55, sigma*`=`*5.9) n~`=`~272
#>            method.label npcx    x npcy PANEL  group
#> 1 "method: normalmixEM"   NA 0.05   NA     1 comp.1
#> 2 "method: normalmixEM"   NA 0.05   NA     1 comp.2
#>                                                 label    y orientation
#> 1 DF~`=`~0.64 %*% italic(N)(mu*`=`*80, sigma*`=`*5.9) 0.95           x
#> 2 DF~`=`~0.36 %*% italic(N)(mu*`=`*55, sigma*`=`*5.9)  0.9           x

if (gginnards.installed)
  ggplot(faithful, aes(x = waiting)) +
    stat_normalmix_eq(geom = "debug",
                      components = "members",
                      fm.values = TRUE)
#> Warning: Duplicated aesthetics after name standardisation: na.rm and orientation

#> number of iterations= 27 
#> [1] "PANEL 1; group(s) comp.1, comp.2; 'draw_function()' input 'data' (head):"
#>      lambda      mu    sigma k converged   n fm.class   fm.method component
#> 1 0.3608872 54.6149 5.871252 2      TRUE 272    mixEM normalmixEM    comp.1
#> 2 0.6391128 80.0911 5.867711 2      TRUE 272    mixEM normalmixEM    comp.2
#>                                              eq.label   n.label
#> 1 DF~`=`~0.36 %*% italic(N)(mu*`=`*55, sigma*`=`*5.9) n~`=`~272
#> 2 DF~`=`~0.64 %*% italic(N)(mu*`=`*80, sigma*`=`*5.9) n~`=`~272
#>            method.label npcx    x npcy PANEL  group
#> 1 "method: normalmixEM"   NA 0.05   NA     1 comp.1
#> 2 "method: normalmixEM"   NA 0.05   NA     1 comp.2
#>                                                 label    y orientation
#> 1 DF~`=`~0.36 %*% italic(N)(mu*`=`*55, sigma*`=`*5.9) 0.95           x
#> 2 DF~`=`~0.64 %*% italic(N)(mu*`=`*80, sigma*`=`*5.9)  0.9           x