
Predicted line from distribution mixture model fit
Source: R/stat-distrmix-line.R
stat_distrmix_line() fits a Normal mixture model, by default with
normalmixEM(). Predicted values are
computed and, by default, plotted.
Usage
stat_distrmix_line(
  mapping = NULL,
  data = NULL,
  geom = "line",
  position = "identity",
  ...,
  orientation = "x",
  method = "normalmixEM",
  se = NULL,
  fit.seed = NA,
  fm.values = FALSE,
  n = min(100 + 50 * k, 300),
  fullrange = TRUE,
  level = 0.95,
  method.args = list(),
  k = 2,
  free.mean = TRUE,
  free.sd = TRUE,
  components = "all",
  n.min = 10L * k,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
Arguments
- mapping
The aesthetic mapping, usually constructed with
aes. Only needs to be set at the layer level if you are overriding the plot defaults.
- data
A layer specific dataset, only needed if you want to override the plot defaults.
- geom
The geometric object to use to display the data.
- position
The position adjustment to use for overlapping points on this layer.
- ...
Other arguments passed on to
layer. This can include aesthetics whose values you want to set, not map. See layer for more details.
- orientation
character Either "x" or "y", the mapping of the values to which the mixture model is to be fitted. NOT YET IMPLEMENTED!
- method
function or character If character, "normalmixEM" or the name of a model fit function is accepted, possibly followed by the fit function's
method argument separated by a colon. The function must return a model fit object of class mixEM.
- se
Currently ignored.
- fit.seed
RNG seed argument passed to
set.seed(). Defaults to NA, which means that set.seed() will not be called.
- fm.values
logical Add parameter estimates and their standard errors to the returned values (FALSE by default).
- n
Number of points at which to evaluate the model prediction.
- fullrange
Should the prediction span the combined range of the scale and of the fitted distributions, or just span the range of the data?
- level
Level of confidence interval to use (0.95 by default).
- method.args
named list with additional arguments.
- k
integer Number of mixture components to fit.
- free.mean, free.sd
logical If TRUE, allow the fitted
mean and/or fitted sd to vary among the component Normal distributions.
- components
character One of
"all", "sum", or "members"; selects which densities are returned.
- n.min
integer Minimum number of distinct values in the mapped variable for fitting to be attempted.
- na.rm
a logical indicating whether NA values should be stripped before the computation proceeds.
- show.legend
logical. Should this layer be included in the legends?
NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes.
- inherit.aes
If
FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.
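The defaults of n and n.min shown under Usage are expressions that depend on k. As a minimal sketch in base R, the helpers default_n() and default_n_min() below are hypothetical names that only reproduce the arithmetic of those expressions; they are not functions provided by the package.

```r
# Hypothetical helpers mirroring the default expressions shown under Usage
default_n <- function(k) min(100 + 50 * k, 300)  # n = min(100 + 50 * k, 300)
default_n_min <- function(k) 10L * k             # n.min = 10L * k

# Tabulate both defaults for one to five mixture components
k.values <- 1:5
data.frame(k = k.values,
           n = vapply(k.values, default_n, numeric(1)),
           n.min = vapply(k.values, default_n_min, integer(1)))
```

With the default k = 2 these expressions give n = 200 evaluation points and n.min = 20 distinct values; the cap of 300 points is reached from k = 4 upwards.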
Value
The value returned by the statistic is a data frame, with n
rows of predicted density for each component of the mixture plus their
sum and the corresponding vector of x values. Optionally it will
also include additional values related to the model fit.
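As a rough illustration of the shape of such a data frame, the following base R sketch builds one by hand for a two-component mixture. The parameter values are made up for illustration and are not estimates returned by the statistic. The sketch also assumes that each member density is weighted by its mixing proportion, which is consistent with the example output further down, where the comp.sum values equal the sum of the member values at the same x.

```r
# Hypothetical parameters of a two-component Normal mixture
# (illustrative values only, not fitted estimates)
lambda <- c(0.35, 0.65)  # mixing proportions, summing to 1
mu     <- c(55, 80)      # component means
sigma  <- c(6, 6)        # component standard deviations

# Points at which the densities are evaluated
x <- seq(from = 30, to = 105, length.out = 200)

# One data frame per component, each density weighted by its proportion
members <- lapply(seq_along(mu), function(i) {
  data.frame(x = x,
             component = paste0("comp.", i),
             density = lambda[i] * dnorm(x, mean = mu[i], sd = sigma[i]))
})

# Sum of the weighted member densities
comp.sum <- data.frame(x = x,
                       component = "comp.sum",
                       density = Reduce(`+`, lapply(members, `[[`, "density")))

# Long-format data frame: n rows per component plus n rows for their sum
mix.df <- rbind(do.call(rbind, members), comp.sum)
mix.df$component <- factor(mix.df$component)
head(mix.df)
```

Because the weights sum to one, the summed curve integrates to approximately one over a range that covers both components, while each member curve integrates to approximately its mixing proportion.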
Details
This statistic is similar to stat_density but
instead of fitting a single distribution it can fit a mixture of two or
more Normal distributions, using an approach related to clustering.
Defaults are consistent between stat_distrmix_line() and
stat_distrmix_eq(). Parameter fit.seed, if not NA, is used
in a call to set.seed() immediately before calling the model fit
function. As the fitting procedure makes use of the (pseudo-)random number
generator (RNG), convergence can depend on it, and in such cases setting
fit.seed to the same value in stat_distrmix_line() and in
stat_distrmix_eq() can ensure consistency, and more
generally, reproducibility.
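The effect of seeding the RNG immediately before the fit can be sketched outside ggplot2. The code below assumes that normalmixEM() is the function of that name in package mixtools and calls it directly; the seed value 4321 is arbitrary, and the sketch does not show how the statistic itself calls the fit function.

```r
# Assumption: normalmixEM() is mixtools::normalmixEM()
mixtools.installed <- requireNamespace("mixtools", quietly = TRUE)

if (mixtools.installed) {
  # Same seed set immediately before each fit
  set.seed(4321)
  fit1 <- mixtools::normalmixEM(faithful$waiting, k = 2)
  set.seed(4321)
  fit2 <- mixtools::normalmixEM(faithful$waiting, k = 2)

  # Both fits start from the same RNG state, so the estimates agree
  print(identical(fit1$mu, fit2$mu))
  print(identical(fit1$sigma, fit2$sigma))
}
```

Without the calls to set.seed(), the two fits start from different RNG states and are not guaranteed to give identical estimates.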
A mixture model, as described above, is fitted for k >= 2, while
k == 1 is treated as a special case in which a single Normal distribution is fitted
with function fitdistr(). In this case the SE values
are exact estimates.
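The k == 1 case can be sketched directly, assuming that fitdistr() refers to MASS::fitdistr(). The subset of faithful is the one used in the Examples; how the statistic passes the data to fitdistr() internally is not shown here.

```r
# Assumption: fitdistr() is MASS::fitdistr()
x <- subset(faithful, waiting > 66)$waiting

# Maximum likelihood fit of a single Normal distribution
fit <- MASS::fitdistr(x, densfun = "normal")

fit$estimate  # estimates of mean and sd
fit$sd        # standard errors of the two estimates

# For the Normal distribution the standard error of the mean has a
# closed form: the estimated sd divided by the square root of n
se.mean <- unname(fit$estimate["sd"]) / sqrt(length(x))
all.equal(unname(fit$sd["mean"]), se.mean)
```

For the Normal distribution MASS::fitdistr() computes the estimates and their standard errors from closed-form expressions rather than by numerical optimisation.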
Computed variables
stat_distrmix_line() provides the following
variables, some of which depend on the orientation:
- x
the n values for the quantiles
- component
A factor indexing the components and/or their sum
If fm.values = TRUE is passed then columns with diagnostics and
parameter estimates are added, with the same value in each row within a
group:
- n
numeric the number of x values
- .size
numeric the number of density values
- fm.class
character the most derived class of the fitted model object
- fm.method
character the method, as given by the ft field of the fitted model object
This is wasteful and disabled by default, but provides a simple and robust approach to achieve effects like colouring or hiding of the model fit line by group depending on the outcome of model fitting.
See also
Other ggplot statistics for mixture model fits:
stat_distrmix_eq()
Aesthetics
stat_distrmix_line() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
- x or y
- group → after_stat(component)
Learn more about setting these aesthetics in vignette("ggplot2-specs").
Examples
ggplot(faithful, aes(x = waiting)) +
  stat_distrmix_line()
# orientation = "y" is not yet implemented
# ggplot(faithful, aes(y = waiting)) +
#   stat_distrmix_line(orientation = "y")
ggplot(faithful, aes(x = waiting)) +
  stat_distrmix_line(components = "sum")
ggplot(faithful, aes(x = waiting)) +
  stat_distrmix_line(components = "members")
ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20) +
  stat_distrmix_line(aes(colour = after_stat(component),
                         fill = after_stat(component)),
                     geom = "area", linewidth = 1, alpha = 0.25, se = FALSE)
ggplot(faithful, aes(x = waiting)) +
  stat_distrmix_line(aes(colour = after_stat(component),
                         fill = after_stat(component)),
                     geom = "area", linewidth = 1, alpha = 0.25,
                     components = "members", se = FALSE)
ggplot(faithful, aes(x = waiting)) +
  stat_distrmix_line(geom = "area", linewidth = 1, alpha = 0.25,
                     colour = "black", outline.type = "upper",
                     components = "sum", se = FALSE)
# special case of no mixture
ggplot(subset(faithful, waiting > 66), aes(x = waiting)) +
  stat_distrmix_line(k = 1)
#> With k = 1 one Normal distribution is fitted. Irrelevant parameters ignored!
# Inspecting the returned data using geom_debug_group()
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)
if (gginnards.installed)
  library(gginnards)
if (gginnards.installed)
  ggplot(faithful, aes(x = waiting)) +
    stat_distrmix_line(geom = "debug_group", components = "all")
#> [1] "PANEL 1; group(s) comp.1; 'draw_function()' input 'data' (head):"
#> x component density flipped_aes PANEL group y
#> 2 35.29541 comp.1 0.0001092383 FALSE 1 comp.1 0.0001092383
#> 5 35.61754 comp.1 0.0001306554 FALSE 1 comp.1 0.0001306554
#> 8 35.93967 comp.1 0.0001558017 FALSE 1 comp.1 0.0001558017
#> 11 36.26179 comp.1 0.0001852294 FALSE 1 comp.1 0.0001852294
#> 14 36.58392 comp.1 0.0002195534 FALSE 1 comp.1 0.0002195534
#> 17 36.90605 comp.1 0.0002594556 FALSE 1 comp.1 0.0002594556
#> orientation
#> 2 x
#> 5 x
#> 8 x
#> 11 x
#> 14 x
#> 17 x
#> [1] "PANEL 1; group(s) comp.2; 'draw_function()' input 'data' (head):"
#> x component density flipped_aes PANEL group y
#> 3 35.29541 comp.2 9.599747e-15 FALSE 1 comp.2 9.599747e-15
#> 6 35.61754 comp.2 1.457545e-14 FALSE 1 comp.2 1.457545e-14
#> 9 35.93967 comp.2 2.206355e-14 FALSE 1 comp.2 2.206355e-14
#> 12 36.26179 comp.2 3.329812e-14 FALSE 1 comp.2 3.329812e-14
#> 15 36.58392 comp.2 5.010203e-14 FALSE 1 comp.2 5.010203e-14
#> 18 36.90605 comp.2 7.515916e-14 FALSE 1 comp.2 7.515916e-14
#> orientation
#> 3 x
#> 6 x
#> 9 x
#> 12 x
#> 15 x
#> 18 x
#> [1] "PANEL 1; group(s) comp.sum; 'draw_function()' input 'data' (head):"
#> x component density flipped_aes PANEL group y
#> 1 35.29541 comp.sum 0.0001092383 FALSE 1 comp.sum 0.0001092383
#> 4 35.61754 comp.sum 0.0001306554 FALSE 1 comp.sum 0.0001306554
#> 7 35.93967 comp.sum 0.0001558017 FALSE 1 comp.sum 0.0001558017
#> 10 36.26179 comp.sum 0.0001852294 FALSE 1 comp.sum 0.0001852294
#> 13 36.58392 comp.sum 0.0002195534 FALSE 1 comp.sum 0.0002195534
#> 16 36.90605 comp.sum 0.0002594556 FALSE 1 comp.sum 0.0002594556
#> orientation
#> 1 x
#> 4 x
#> 7 x
#> 10 x
#> 13 x
#> 16 x
if (gginnards.installed)
  ggplot(faithful, aes(x = waiting)) +
    stat_distrmix_line(geom = "debug_group", components = "sum")
#> [1] "PANEL 1; group(s) comp.sum; 'draw_function()' input 'data' (head):"
#> x component density flipped_aes PANEL group y
#> 1 35.29551 comp.sum 0.0001092387 FALSE 1 comp.sum 0.0001092387
#> 2 35.61763 comp.sum 0.0001306561 FALSE 1 comp.sum 0.0001306561
#> 3 35.93976 comp.sum 0.0001558028 FALSE 1 comp.sum 0.0001558028
#> 4 36.26189 comp.sum 0.0001852309 FALSE 1 comp.sum 0.0001852309
#> 5 36.58402 comp.sum 0.0002195555 FALSE 1 comp.sum 0.0002195555
#> 6 36.90615 comp.sum 0.0002594584 FALSE 1 comp.sum 0.0002594584
#> orientation
#> 1 x
#> 2 x
#> 3 x
#> 4 x
#> 5 x
#> 6 x
if (gginnards.installed)
  ggplot(faithful, aes(x = waiting)) +
    stat_distrmix_line(geom = "debug_group", components = "members")
#> [1] "PANEL 1; group(s) comp.1; 'draw_function()' input 'data' (head):"
#> x component density flipped_aes PANEL group y
#> 1 35.29540 comp.1 0.0001092383 FALSE 1 comp.1 0.0001092383
#> 3 35.61753 comp.1 0.0001306553 FALSE 1 comp.1 0.0001306553
#> 5 35.93966 comp.1 0.0001558016 FALSE 1 comp.1 0.0001558016
#> 7 36.26179 comp.1 0.0001852292 FALSE 1 comp.1 0.0001852292
#> 9 36.58392 comp.1 0.0002195532 FALSE 1 comp.1 0.0002195532
#> 11 36.90604 comp.1 0.0002594554 FALSE 1 comp.1 0.0002594554
#> orientation
#> 1 x
#> 3 x
#> 5 x
#> 7 x
#> 9 x
#> 11 x
#> [1] "PANEL 1; group(s) comp.2; 'draw_function()' input 'data' (head):"
#> x component density flipped_aes PANEL group y
#> 2 35.29540 comp.2 9.599325e-15 FALSE 1 comp.2 9.599325e-15
#> 4 35.61753 comp.2 1.457482e-14 FALSE 1 comp.2 1.457482e-14
#> 6 35.93966 comp.2 2.206260e-14 FALSE 1 comp.2 2.206260e-14
#> 8 36.26179 comp.2 3.329671e-14 FALSE 1 comp.2 3.329671e-14
#> 10 36.58392 comp.2 5.009993e-14 FALSE 1 comp.2 5.009993e-14
#> 12 36.90604 comp.2 7.515606e-14 FALSE 1 comp.2 7.515606e-14
#> orientation
#> 2 x
#> 4 x
#> 6 x
#> 8 x
#> 10 x
#> 12 x
if (gginnards.installed)
  ggplot(faithful, aes(x = waiting)) +
    stat_distrmix_line(geom = "debug_group", fm.values = TRUE)
#> [1] "PANEL 1; group(s) comp.1; 'draw_function()' input 'data' (head):"
#> x component density converged n .size fm.class fm.method
#> 2 35.29539 comp.1 0.0001092382 TRUE 272 600 mixEM normalmixEM
#> 5 35.61752 comp.1 0.0001306552 TRUE 272 600 mixEM normalmixEM
#> 8 35.93965 comp.1 0.0001558015 TRUE 272 600 mixEM normalmixEM
#> 11 36.26177 comp.1 0.0001852290 TRUE 272 600 mixEM normalmixEM
#> 14 36.58390 comp.1 0.0002195529 TRUE 272 600 mixEM normalmixEM
#> 17 36.90603 comp.1 0.0002594550 TRUE 272 600 mixEM normalmixEM
#> flipped_aes PANEL group y orientation
#> 2 FALSE 1 comp.1 0.0001092382 x
#> 5 FALSE 1 comp.1 0.0001306552 x
#> 8 FALSE 1 comp.1 0.0001558015 x
#> 11 FALSE 1 comp.1 0.0001852290 x
#> 14 FALSE 1 comp.1 0.0002195529 x
#> 17 FALSE 1 comp.1 0.0002594550 x
#> [1] "PANEL 1; group(s) comp.2; 'draw_function()' input 'data' (head):"
#> x component density converged n .size fm.class fm.method
#> 3 35.29539 comp.2 9.598574e-15 TRUE 272 600 mixEM normalmixEM
#> 6 35.61752 comp.2 1.457369e-14 TRUE 272 600 mixEM normalmixEM
#> 9 35.93965 comp.2 2.206092e-14 TRUE 272 600 mixEM normalmixEM
#> 12 36.26177 comp.2 3.329420e-14 TRUE 272 600 mixEM normalmixEM
#> 15 36.58390 comp.2 5.009620e-14 TRUE 272 600 mixEM normalmixEM
#> 18 36.90603 comp.2 7.515053e-14 TRUE 272 600 mixEM normalmixEM
#> flipped_aes PANEL group y orientation
#> 3 FALSE 1 comp.2 9.598574e-15 x
#> 6 FALSE 1 comp.2 1.457369e-14 x
#> 9 FALSE 1 comp.2 2.206092e-14 x
#> 12 FALSE 1 comp.2 3.329420e-14 x
#> 15 FALSE 1 comp.2 5.009620e-14 x
#> 18 FALSE 1 comp.2 7.515053e-14 x
#> [1] "PANEL 1; group(s) comp.sum; 'draw_function()' input 'data' (head):"
#> x component density converged n .size fm.class fm.method
#> 1 35.29539 comp.sum 0.0001092382 TRUE 272 600 mixEM normalmixEM
#> 4 35.61752 comp.sum 0.0001306552 TRUE 272 600 mixEM normalmixEM
#> 7 35.93965 comp.sum 0.0001558015 TRUE 272 600 mixEM normalmixEM
#> 10 36.26177 comp.sum 0.0001852290 TRUE 272 600 mixEM normalmixEM
#> 13 36.58390 comp.sum 0.0002195529 TRUE 272 600 mixEM normalmixEM
#> 16 36.90603 comp.sum 0.0002594550 TRUE 272 600 mixEM normalmixEM
#> flipped_aes PANEL group y orientation
#> 1 FALSE 1 comp.sum 0.0001092382 x
#> 4 FALSE 1 comp.sum 0.0001306552 x
#> 7 FALSE 1 comp.sum 0.0001558015 x
#> 10 FALSE 1 comp.sum 0.0001852290 x
#> 13 FALSE 1 comp.sum 0.0002195529 x
#> 16 FALSE 1 comp.sum 0.0002594550 x