
Function that returns an R object in which observations corresponding to spikes are replaced by values computed from neighboring pixels. Spikes are values in spectra that are unusually high compared to their neighbors. They are usually individual values or very short runs of similar "unusual" values. Spikes caused by cosmic radiation are a frequent problem in Raman spectra. Another source of spikes is "hot pixels" in CCD and diode array detectors.

Usage

despike(x, z.threshold, max.spike.width, window.width, method, na.rm, ...)

# Default S3 method
despike(
  x,
  z.threshold = NA,
  max.spike.width = NA,
  window.width = NA,
  method = "run.mean",
  na.rm = FALSE,
  ...
)

# S3 method for class 'numeric'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...
)

# S3 method for class 'data.frame'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...,
  y.var.name = NULL,
  var.name = y.var.name
)

# S3 method for class 'generic_spct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  y.var.name = NULL,
  var.name = y.var.name,
  ...
)

# S3 method for class 'source_spct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  unit.out = getOption("photobiology.radiation.unit", default = "energy"),
  ...
)

# S3 method for class 'response_spct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  unit.out = getOption("photobiology.radiation.unit", default = "energy"),
  ...
)

# S3 method for class 'filter_spct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  filter.qty = getOption("photobiology.filter.qty", default = "transmittance"),
  ...
)

# S3 method for class 'reflector_spct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...
)

# S3 method for class 'solute_spct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...
)

# S3 method for class 'cps_spct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...
)

# S3 method for class 'raw_spct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...
)

# S3 method for class 'generic_mspct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...,
  y.var.name = NULL,
  var.name = y.var.name,
  .parallel = FALSE,
  .paropts = NULL
)

# S3 method for class 'source_mspct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  unit.out = getOption("photobiology.radiation.unit", default = "energy"),
  ...,
  .parallel = FALSE,
  .paropts = NULL
)

# S3 method for class 'response_mspct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  unit.out = getOption("photobiology.radiation.unit", default = "energy"),
  ...,
  .parallel = FALSE,
  .paropts = NULL
)

# S3 method for class 'filter_mspct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  filter.qty = getOption("photobiology.filter.qty", default = "transmittance"),
  ...,
  .parallel = FALSE,
  .paropts = NULL
)

# S3 method for class 'reflector_mspct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...,
  .parallel = FALSE,
  .paropts = NULL
)

# S3 method for class 'solute_mspct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...,
  .parallel = FALSE,
  .paropts = NULL
)

# S3 method for class 'cps_mspct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...,
  .parallel = FALSE,
  .paropts = NULL
)

# S3 method for class 'raw_mspct'
despike(
  x,
  z.threshold = 9,
  max.spike.width = 8,
  window.width = 11,
  method = "run.mean",
  na.rm = FALSE,
  ...,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

x

an R object

z.threshold

numeric Modified Z values larger than z.threshold are considered to correspond to spikes.

max.spike.width

integer Wider regions with high Z values are not detected as spikes.

window.width

integer The full width of the window for the running mean used as the replacement.

method

character The name of the method: "run.mean" is running mean as described in Whitaker and Hayes (2018); "adj.mean" is mean of adjacent neighbors (isolated bad pixels only).

na.rm

logical indicating whether NA values should be treated as spikes and replaced.

...

Arguments passed by name to find_spikes().

var.name, y.var.name

character Names of columns where to look for spikes to remove.

unit.out

character One of "energy" or "photon"

filter.qty

character One of "transmittance" or "absorbance"

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing.

Value

A copy of the object passed as argument to x with values detected as spikes replaced by a local average of adjacent neighbors outside the spike.

Details

Spikes are detected based on a modified Z score calculated from the differenced spectrum. The Z threshold used should be adjusted to the characteristics of the input and the desired sensitivity. The lower the threshold, the more sensitive the test becomes, in most cases resulting in more spikes being detected. A modified version of the algorithm is used if a value different from NULL is passed as argument to max.spike.width. In that case, an additional step filters out broader spikes (or falsely detected steep slopes) from the returned values.
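The detection step can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name modified_z_of_diffs() is hypothetical, and the constant 0.6745 combined with an unscaled median absolute deviation is one common definition of the modified Z score, assumed here.

```r
# Hypothetical helper: modified Z score of the differenced spectrum.
# Assumed definition: 0.6745 * (d - median(d)) / MAD(d), with unscaled MAD.
modified_z_of_diffs <- function(y) {
  d <- diff(y)                              # differenced spectrum
  med <- median(d)
  mad_raw <- mad(d, center = med, constant = 1)
  0.6745 * (d - med) / mad_raw
}

# Synthetic spectrum: smooth baseline plus one single-pixel spike.
y <- 100 + 5 * sin(seq(0, pi, length.out = 50))
y[25] <- y[25] + 200

z <- modified_z_of_diffs(y)

# A single-pixel spike yields one large positive and one large negative
# difference, at the positions just before and at the spike.
flagged <- which(abs(z) > 9)
flagged
```

Lowering the value compared against abs(z) flags more differences, which is why the threshold has to be tuned to the noise level of the input.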

Simple interpolation replaces values of isolated bad pixels by the mean of their two closest neighbors. The running mean approach allows the replacement of short runs of bad pixels by the running mean of neighboring pixels within a window of user-specified width. The first approach works well for spectra from array spectrometers to correct for hot and dead pixels in an instrument. The second approach is most suitable for Raman spectra in which spikes triggered by radiation are wider than a single pixel but usually not more than five pixels wide.
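The two replacement approaches can be sketched in base R as follows. Both helpers are hypothetical names written for illustration under simple assumptions (bad pixel positions are already known, and the window is centred on each bad pixel); they are not the code used by the package.

```r
# Hypothetical helper: replace isolated bad pixels by the mean of their
# two closest neighbours (a single neighbour is used at the ends).
replace_adjacent_mean <- function(y, bad) {
  out <- y
  for (i in bad) {
    nb <- c(i - 1, i + 1)
    nb <- nb[nb >= 1 & nb <= length(y)]
    out[i] <- mean(y[nb])
  }
  out
}

# Hypothetical helper: replace each flagged value by the mean of the
# unflagged values inside a window of the given full width.
replace_window_mean <- function(y, bad, window.width = 11) {
  half <- window.width %/% 2
  good <- setdiff(seq_along(y), bad)
  out <- y
  for (i in bad) {
    idx <- good[good >= i - half & good <= i + half]
    out[i] <- mean(y[idx])
  }
  out
}

# Isolated bad pixel at position 3.
single <- replace_adjacent_mean(c(1, 2, 50, 4, 5), bad = 3)

# Two-pixel spike at positions 4 and 5.
y <- c(10, 11, 10, 90, 95, 11, 10, 11, 10)
run <- replace_window_mean(y, bad = 4:5, window.width = 5)
run
```

Excluding the flagged positions from the window keeps the spike itself from contaminating its own replacement value, which matters when the spike is several pixels wide.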

When the argument passed to x contains multiple spectra, the spikes are searched for and replaced in each spectrum independently of other spectra.

Methods (by class)

  • despike(default): Default method; always returns NA.

  • despike(numeric): Method for numeric vectors.

  • despike(data.frame): Method for "data.frame" objects.

  • despike(generic_spct): Method for "generic_spct" objects.

  • despike(source_spct): Method for "source_spct" objects.

  • despike(response_spct): Method for "response_spct" objects.

  • despike(filter_spct): Method for "filter_spct" objects.

  • despike(reflector_spct): Method for "reflector_spct" objects.

  • despike(solute_spct): Method for "solute_spct" objects.

  • despike(cps_spct): Method for "cps_spct" objects.

  • despike(raw_spct): Method for "raw_spct" objects.

  • despike(generic_mspct): Method for "generic_mspct" objects.

  • despike(source_mspct): Method for "source_mspct" objects.

  • despike(response_mspct): Method for "response_mspct" objects.

  • despike(filter_mspct): Method for "filter_mspct" objects.

  • despike(reflector_mspct): Method for "reflector_mspct" objects.

  • despike(solute_mspct): Method for "solute_mspct" objects.

  • despike(cps_mspct): Method for "cps_mspct" objects.

  • despike(raw_mspct): Method for "raw_mspct" objects.

Note

The current algorithm can misidentify steep smooth slopes as spikes, so manual inspection is needed, together with trial-and-error adjustment of the argument passed to z.threshold.
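Why a steep smooth slope can be flagged is easy to reproduce in base R. The sketch below assumes the same common definition of the modified Z score (constant 0.6745, unscaled median absolute deviation) and uses a hypothetical helper and synthetic data; it does not call the package.

```r
# Hypothetical helper, assumed definition of the modified Z score
# applied to the differenced spectrum.
modified_z_of_diffs <- function(y) {
  d <- diff(y)
  0.6745 * (d - median(d)) / mad(d, constant = 1)
}

i <- 1:100
# Small deterministic ripple plus a smooth but steep sigmoid edge,
# with no spike present.
y <- 0.1 * sin(i) + 100 / (1 + exp(-(i - 50) / 2))

flagged <- which(abs(modified_z_of_diffs(y)) > 9)

# The flagged differences lie on the steep part of the edge.
range(flagged)
length(flagged)
```

Because the differences along the edge are large relative to those in the flat regions, they exceed the threshold even though the signal is smooth. A run this wide is the kind of region that a limit on spike width is meant to exclude.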

See also

See the documentation for find_spikes and replace_bad_pixs for details of the algorithm and implementation.

Examples


white_led.raw_spct[120:125, ]
#> Object: raw_spct [6 x 4]
#> Wavelength range 244.51-246.88 nm, step 0.47-0.48 nm 
#> Label: led_desk201 
#> Measured on 2016-12-19 16:19:57.298874 UTC 
#> Data acquired with 'MayaPro2000' s.n. MAYP11278
#> grating 'HC1', slit '010s'
#> diffuser 'unknown'
#> integ. time (s): 0.233, 1.43, 5
#> total time (s): 10, 10, 10
#> counts @ peak (% of max): 94.2
#> Variables:
#>  w.length: Wavelength [nm]
#>  counts_1: Raw detector counts [number]
#>  counts_2: Raw detector counts [number]
#>  counts_3: Raw detector counts [number] 
#> --
#> # A tibble: 6 × 4
#>   w.length counts_1 counts_2 counts_3
#>      <dbl>    <dbl>    <dbl>    <dbl>
#> 1     245.    2505.    4055.    8683 
#> 2     245.    2472.    3887.    8058 
#> 3     245.    2501.    4053.    8670.
#> 4     246.    3108.    7771.   22248.
#> 5     246.    2454.    3758.    7668.
#> 6     247.    2478.    3934.    8261 

# find and replace spike at 245.93 nm
despike(white_led.raw_spct,
        z.threshold = 10,
        window.width = 25)[120:125, ]
#> Object: raw_spct [6 x 4]
#> Wavelength range 244.51-246.88 nm, step 0.47-0.48 nm 
#> Label: led_desk201 
#> Measured on 2016-12-19 16:19:57.298874 UTC 
#> Data acquired with 'MayaPro2000' s.n. MAYP11278
#> grating 'HC1', slit '010s'
#> diffuser 'unknown'
#> integ. time (s): 0.233, 1.43, 5
#> total time (s): 10, 10, 10
#> counts @ peak (% of max): 94.2
#> Variables:
#>  w.length: Wavelength [nm]
#>  counts_1: Raw detector counts [number]
#>  counts_2: Raw detector counts [number]
#>  counts_3: Raw detector counts [number] 
#> --
#> # A tibble: 6 × 4
#>   w.length counts_1 counts_2 counts_3
#>      <dbl>    <dbl>    <dbl>    <dbl>
#> 1     245.    2505.    4055.    8683 
#> 2     245.    2472.    3887.    8058 
#> 3     245.    2501.    4053.    8670.
#> 4     246.    2485.    3952.    8301.
#> 5     246.    2486.    3956.    8316.
#> 6     247.    2478.    3934.    8261