Skip to contents

stat_spikes() tags or extracts rows in data containing local y narrow maxima and/or minima with very steep shoulders. It makes it possible to highlight and label spikes based on their x and/or y coordinates. Orientations flipping as well as dates and times are supported.

Usage

stat_spikes(
  mapping = NULL,
  data = NULL,
  geom = "point",
  position = "identity",
  ...,
  orientation = "x",
  height.threshold = 20,
  z.threshold = 7,
  k = 20,
  spike.direction = "both",
  label.fmt = NULL,
  x.label.fmt = label.fmt,
  y.label.fmt = NULL,
  extract.spikes = NULL,
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE
)

Arguments

mapping

The aesthetic mapping, usually constructed with aes or aes_. Only needs to be set at the layer level if you are overriding the plot defaults.

data

A layer specific dataset - only needed if you want to override the plot defaults.

geom

The geometric object to use display the data

position

The position adjustment to use for overlapping points on this layer

...

other arguments passed on to layer. This can include aesthetics whose values you want to set, not map. See layer for more details.

orientation

character The orientation of the layer can be set to either "x", the default, or "y".

height.threshold

numeric The minimum height of spikes expressed relative to the median amplitude of the baseline local variation of x.

z.threshold

numeric Modified local \(Z\) values larger than z.threshold are detected as boundaries of spikes.

k

integer width of median window used for smoothing; must be odd

spike.direction

character One of "up", "down", "both" or "skip", indicating which spikes are to be returned, if any.

label.fmt, x.label.fmt, y.label.fmt

character strings giving a format definition for construction of character strings labels with function sprintf from x and/or y values.

extract.spikes

If TRUE only the rows containing spikes are returned. If FALSE the whole of data is returned but with labels set to "" in rows not containing spikes. If NULL, the default, TRUE, is used unless the argument passed to geom is "text_repel", "label_repel" or "marquee_repel".

na.rm

a logical value indicating whether NA values should be stripped before the computation proceeds.

show.legend

logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes.

inherit.aes

If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.

Value

A data frame with one row for each spike found in the data extracted from the input data or all rows in data. Added columns contain the labels.

Details

Spikes are detected based on a modified \(Z\) score calculated from the differenced spectrum. The \(Z\) threshold used should be adjusted to the characteristics of the input and desired sensitivity. The lower the threshold the more stringent the test becomes, with shorter spikes being detected.

The algorithm uses running differences to detect abrupt changes in value, compared to an estimate of the baseline variation of the differences, approximating a baseline \(Z\) from MAD and a baseline value from the median differences. Currently, a single estimate of MAD is used but running medians, when posisble, as baseline. This comparison detects running differences that are unusually large, in most cases signalling a transition between values near the baseline and far from it, in both directions.

Transitions into- and out of spikes are distinguished based on the median of the non-differenced values, as a descriptor of the data baseline. As for the median of the differences, a running median is used when possible.

This function thus detects the start and end of each spike, and distinguishes upward and downward spikes.

k is the width in number of observations of the window used for running median smoothing to extract the baseline. A value several times the width of the broader spike but narrow enough to track broader peaks needs to be manually set in most cases.

With na.rm = TRUE, NA values are omitted before searching for spikes and set to 0L in the returned vector.

If all spikes are guaranteed to be one observation-wide and either going up or down from the baseline, it is possible to detect them based purely on the z.threshold by passing height.threshold = NA and either spike.direction = "up" or spike.direction = "down", which ensures very fast computation.

Computed and copied variables in the returned data frame

x

x-values at the spikes as numeric.

y

y-values at the spikes as numeric.

x.label

x-values at the spikes formatted as character.

y.label

y-values at the spikes formatted as character.

is.spike

integer vector of 0, 1 or -1.

Label positioning and formatting

stat_peaks(), stat_valleys() and stat_spikes() work nicely together with geoms geom_text_repel(), geom_label_repel(), and geom_marquee_repel() from package ggrepel to solve the problem of overlapping labels by displacing them. If using geom_text(), discard overlapping labels using check_overlap = TRUE.

By default the labels are character values ready to be ploted as plain text, but with a suitable label.fmt argument, labels formatted as plotmath expressions, markdown or LaTeX can be created (e.g., containing Greek letters or super or subscripts, maths or colour) can be generated for use with geoms from packages 'marquee', 'ggtext' and 'xdvir'.

The default is geom = "point" it is likely to work well in almost any situation. The default aesthetics mappings set by these stats allow their direct use with geom_text(), geom_label(), geom_line(), geom_rug(), geom_hline() and geom_vline() by just passing an argument to geom.

References

Whitaker, D. A.; Hayes, K. (2018) A simple algorithm for despiking Raman spectra. Chemometrics and Intelligent Laboratory Systems, 179, 82-84. doi:10.1016/j.chemolab.2018.06.009 .

See also

find_spikes, for the function used to located the spikes.

Aesthetics

stat_spikes() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:

x
y
group→ inferred
labelafter_stat(x.label)
xinterceptafter_stat(x)
yinterceptafter_stat(y)

Learn more about setting these aesthetics in vignette("ggplot2-specs").

Examples

# lynx and Nile are time.series objects recognized by
# ggpp::ggplot.ts() and converted on-the-fly with a default mapping

n = 500
set.seed(45678)
my.data <- data.frame(x = 1:n,
                      y = rep(sin((0:19)/20 * 2 * pi), n / 20) +
                          stats::rnorm(n, sd = 0.5))
selector <- sample(seq_len(n), 5)
my.data$y[selector] <- my.data$y[selector] + 10

ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(colour = "orange")


ggplot(my.data, aes(x, -y)) +
  geom_line() +
  stat_spikes(colour = "orange")


ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(geom = "text", vjust = -0.5) +
  stat_spikes(geom = "rug", colour = "red")


ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(colour = "red", spike.direction = "up") +
  stat_spikes(colour = "blue", spike.direction = "down")


ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(colour = "red", spike.direction = "up")


ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(colour = "blue", spike.direction = "down")


ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(z.threshold = 2, colour = "orange")


ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(z.threshold = 20, colour = "orange")


ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(colour = "red",
              spike.direction = "up",
              height.threshold = NA)


# Inspecting the returned data using geom_debug_group()
# This provides a quick way of finding out the names of the variables that
# are available for mapping to aesthetics with after_stat().

gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (gginnards.installed)
  library(gginnards)

if (gginnards.installed)
ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(geom = "debug_group")

#> [1] "PANEL 1; group(s) -1; 'draw_function()' input 'data' (head):"
#>     x         y PANEL group is.spike flipped_aes x.label y.label label
#> 1 123  9.991314     1    -1        1       FALSE     123   9.991   123
#> 2 310  9.952880     1    -1        1       FALSE     310   9.953   310
#> 3 366 11.078975     1    -1        1       FALSE     366   11.08   366
#> 4 430 10.679878     1    -1        1       FALSE     430   10.68   430
#> 5 469 10.854067     1    -1        1       FALSE     469   10.85   469
#>   xintercept yintercept orientation
#> 1        123   9.991314           x
#> 2        310   9.952880           x
#> 3        366  11.078975           x
#> 4        430  10.679878           x
#> 5        469  10.854067           x

if (gginnards.installed)
ggplot(my.data, aes(x, y)) +
  geom_line() +
  stat_spikes(geom = "debug_group", extract.spikes = FALSE)

#> [1] "PANEL 1; group(s) -1; 'draw_function()' input 'data' (head):"
#>   x           y PANEL group is.spike flipped_aes x.label y.label label
#> 1 1  0.72024002     1    -1        0       FALSE                      
#> 2 2 -0.05077105     1    -1        0       FALSE                      
#> 3 3  0.92327446     1    -1        0       FALSE                      
#> 4 4  0.59804276     1    -1        0       FALSE                      
#> 5 5  1.14016632     1    -1        0       FALSE                      
#> 6 6  0.16670681     1    -1        0       FALSE                      
#>   xintercept  yintercept orientation
#> 1          1  0.72024002           x
#> 2          2 -0.05077105           x
#> 3          3  0.92327446           x
#> 4          4  0.59804276           x
#> 5          5  1.14016632           x
#> 6          6  0.16670681           x