Function that returns an R object with observations corresponding to spikes replaced by values computed from neighboring pixels. Spikes are values in spectra that are unusually high compared to their neighbors. They are usually individual values or very short runs of similarly "unusual" values. Spikes caused by cosmic radiation are a frequent problem in Raman spectra. Another source of spikes is "hot pixels" in CCD and diode array detectors.
Usage
despike(x, z.threshold, max.spike.width, window.width, method, na.rm, ...)
# Default S3 method
despike(
x,
z.threshold = NA,
max.spike.width = NA,
window.width = NA,
method = "run.mean",
na.rm = FALSE,
...
)
# S3 method for class 'numeric'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...
)
# S3 method for class 'data.frame'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...,
y.var.name = NULL,
var.name = y.var.name
)
# S3 method for class 'generic_spct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
y.var.name = NULL,
var.name = y.var.name,
...
)
# S3 method for class 'source_spct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
unit.out = getOption("photobiology.radiation.unit", default = "energy"),
...
)
# S3 method for class 'response_spct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
unit.out = getOption("photobiology.radiation.unit", default = "energy"),
...
)
# S3 method for class 'filter_spct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
filter.qty = getOption("photobiology.filter.qty", default = "transmittance"),
...
)
# S3 method for class 'reflector_spct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...
)
# S3 method for class 'solute_spct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...
)
# S3 method for class 'cps_spct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...
)
# S3 method for class 'raw_spct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...
)
# S3 method for class 'generic_mspct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...,
y.var.name = NULL,
var.name = y.var.name,
.parallel = FALSE,
.paropts = NULL
)
# S3 method for class 'source_mspct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
unit.out = getOption("photobiology.radiation.unit", default = "energy"),
...,
.parallel = FALSE,
.paropts = NULL
)
# S3 method for class 'response_mspct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
unit.out = getOption("photobiology.radiation.unit", default = "energy"),
...,
.parallel = FALSE,
.paropts = NULL
)
# S3 method for class 'filter_mspct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
filter.qty = getOption("photobiology.filter.qty", default = "transmittance"),
...,
.parallel = FALSE,
.paropts = NULL
)
# S3 method for class 'reflector_mspct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...,
.parallel = FALSE,
.paropts = NULL
)
# S3 method for class 'solute_mspct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...,
.parallel = FALSE,
.paropts = NULL
)
# S3 method for class 'cps_mspct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...,
.parallel = FALSE,
.paropts = NULL
)
# S3 method for class 'raw_mspct'
despike(
x,
z.threshold = 9,
max.spike.width = 8,
window.width = 11,
method = "run.mean",
na.rm = FALSE,
...,
.parallel = FALSE,
.paropts = NULL
)
Arguments
- x
an R object
- z.threshold
numeric Modified Z values larger than z.threshold are considered to correspond to spikes.
- max.spike.width
integer Wider regions with high Z values are not detected as spikes.
- window.width
integer. The full width of the window used for the running mean used as replacement.
- method
character The name of the method: "run.mean" is running mean as described in Whitaker and Hayes (2018); "adj.mean" is mean of adjacent neighbors (isolated bad pixels only).
- na.rm
logical indicating whether NA values should be treated as spikes and replaced.
- ...
Arguments passed by name to find_spikes().
- var.name, y.var.name
character Names of columns where to look for spikes to remove.
- unit.out
character One of "energy" or "photon"
- filter.qty
character One of "transmittance" or "absorbance"
- .parallel
if TRUE, apply function in parallel, using parallel backend provided by foreach
- .paropts
a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing.
Value
A copy of the object passed as argument to x with values detected as spikes replaced by a local average of adjacent neighbors outside the spike.
Details
Spikes are detected based on a modified Z score calculated from the differenced spectrum. The Z threshold used should be adjusted to the characteristics of the input and desired sensitivity. The lower the threshold the more stringent the test becomes, resulting in most cases in more spikes being detected. A modified version of the algorithm is used if a value different from NULL is passed as argument to max.spike.width. In such a case, an additional step filters out broader spikes (or falsely detected steep slopes) from the returned values.
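The detection step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper names modified_z_diff() and drop_wide_runs() are hypothetical, the data are made up, and the constant 0.6745 is the usual scaling that makes a modified Z score comparable to a standard Z score for normally distributed data.

```r
# Hypothetical helpers for illustration only; not the package's code.

# Modified Z score of the differenced spectrum.
# Assumes the MAD of the differences is not zero.
modified_z_diff <- function(y) {
  d <- diff(y)                       # first differences, length(y) - 1 values
  mad.d <- mad(d, constant = 1)      # raw median absolute deviation
  0.6745 * (d - median(d)) / mad.d
}

# Unflag runs of consecutive TRUE values longer than max.width.
drop_wide_runs <- function(flag, max.width) {
  runs <- rle(flag)
  runs$values[runs$values & runs$lengths > max.width] <- FALSE
  inverse.rle(runs)
}

y <- c(10, 11, 10, 12, 11, 60, 11, 10, 12, 11, 10)  # made-up data, spike at y[6]
z <- modified_z_diff(y)
flagged <- abs(z) > 9
which(flagged)                       # differences on both sides of the spike
which(drop_wide_runs(flagged, max.width = 8))
```

Because the score is computed on differences, a single-pixel spike at position i shows up as two large scores of opposite sign, at differences i - 1 and i. Lowering the threshold from 9 flags smaller jumps as well.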
Simple interpolation replaces values of isolated bad pixels by the mean of their two closest neighbors. The running mean approach allows the replacement of short runs of bad pixels by the running mean of neighboring pixels within a window of user-specified width. The first approach works well for spectra from array spectrometers to correct for hot and dead pixels in an instrument. The second approach is most suitable for Raman spectra in which spikes triggered by radiation are wider than a single pixel but usually not more than five pixels wide.
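The two replacement approaches can be illustrated with a single base R sketch. The function replace_by_window_mean() is a hypothetical helper, not the package's replace_bad_pixs(), and the data are made up. It replaces each flagged value by the mean of the unflagged values inside a centered window; with a window of full width 3 this reduces to the mean of the two closest neighbors.

```r
# Hypothetical helper for illustration only; not the package's code.
# Replaces flagged values by the mean of unflagged neighbours inside a
# centred window of full width window.width.
replace_by_window_mean <- function(y, is.spike, window.width = 11) {
  half <- window.width %/% 2
  out <- y
  for (i in which(is.spike)) {
    idx <- max(1, i - half):min(length(y), i + half)
    idx <- idx[!is.spike[idx]]       # use only values outside the spike
    if (length(idx) > 0) out[i] <- mean(y[idx])
  }
  out
}

y <- c(10, 11, 10, 12, 11, 60, 11, 10, 12, 11, 10)  # made-up data, spike at y[6]
is.spike <- seq_along(y) == 6

replace_by_window_mean(y, is.spike, window.width = 3)  # two closest neighbours
replace_by_window_mean(y, is.spike, window.width = 5)  # wider running mean
```

A wider window averages over more neighbors, which helps when a spike spans several pixels, at the cost of smoothing over genuine local structure.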
When the argument passed to x contains multiple spectra, the spikes are searched for and replaced in each spectrum independently of other spectra.
Methods (by class)
- despike(default): Default method, always returns NA.
- despike(numeric): Default function usable on numeric vectors.
- despike(data.frame): Method for "data.frame" objects.
- despike(generic_spct): Method for "generic_spct" objects.
- despike(source_spct): Method for "source_spct" objects.
- despike(response_spct): Method for "response_spct" objects.
- despike(filter_spct): Method for "filter_spct" objects.
- despike(reflector_spct): Method for "reflector_spct" objects.
- despike(solute_spct): Method for "solute_spct" objects.
- despike(cps_spct): Method for "cps_spct" objects.
- despike(raw_spct): Method for "raw_spct" objects.
- despike(generic_mspct): Method for "generic_mspct" objects.
- despike(source_mspct): Method for "source_mspct" objects.
- despike(response_mspct): Method for "response_mspct" objects.
- despike(filter_mspct): Method for "filter_mspct" objects.
- despike(reflector_mspct): Method for "reflector_mspct" objects.
- despike(solute_mspct): Method for "solute_mspct" objects.
- despike(cps_mspct): Method for "cps_mspct" objects.
- despike(raw_mspct): Method for "raw_mspct" objects.
Note
The current algorithm misidentifies steep smooth slopes as spikes, so manual inspection is needed, together with trial-and-error adjustment of the argument passed to z.threshold.
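The following base R sketch illustrates why a steep slope can be flagged. It is an assumption-laden illustration rather than the package's code: modified_z_diff() is a hypothetical helper and the data are made up. The series contains a genuine step from about 10 to about 50 and no spike, yet the differences along the step exceed a threshold of 9.

```r
# Hypothetical helper for illustration only; not the package's code.
# Modified Z score of the differenced series; assumes a non-zero MAD.
modified_z_diff <- function(y) {
  d <- diff(y)
  0.6745 * (d - median(d)) / mad(d, constant = 1)
}

# Made-up data: a genuine step edge, not a spike.
y <- c(10, 10.5, 10, 10.5, 10, 10.5, 30, 50, 50.5, 50, 50.5, 50)
z <- modified_z_diff(y)

which(abs(z) > 9)    # the two differences along the slope are flagged
which(abs(z) > 15)   # a less sensitive threshold leaves the slope alone
```

Raising the threshold avoids the false detection here, but it would also miss real spikes of similar size, which is why inspection of the result is needed.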
See also
See the documentation for find_spikes and replace_bad_pixs for details of the algorithm and implementation.
Examples
white_led.raw_spct[120:125, ]
#> Object: raw_spct [6 x 4]
#> Wavelength range 244.51-246.88 nm, step 0.47-0.48 nm
#> Label: led_desk201
#> Measured on 2016-12-19 16:19:57.298874 UTC
#> Data acquired with 'MayaPro2000' s.n. MAYP11278
#> grating 'HC1', slit '010s'
#> diffuser 'unknown'
#> integ. time (s): 0.233, 1.43, 5
#> total time (s): 10, 10, 10
#> counts @ peak (% of max): 94.2
#> Variables:
#> w.length: Wavelength [nm]
#> counts_1: Raw detector counts [number]
#> counts_2: Raw detector counts [number]
#> counts_3: Raw detector counts [number]
#> --
#> # A tibble: 6 × 4
#> w.length counts_1 counts_2 counts_3
#> <dbl> <dbl> <dbl> <dbl>
#> 1 245. 2505. 4055. 8683
#> 2 245. 2472. 3887. 8058
#> 3 245. 2501. 4053. 8670.
#> 4 246. 3108. 7771. 22248.
#> 5 246. 2454. 3758. 7668.
#> 6 247. 2478. 3934. 8261
# find and replace spike at 245.93 nm
despike(white_led.raw_spct,
z.threshold = 10,
window.width = 25)[120:125, ]
#> Object: raw_spct [6 x 4]
#> Wavelength range 244.51-246.88 nm, step 0.47-0.48 nm
#> Label: led_desk201
#> Measured on 2016-12-19 16:19:57.298874 UTC
#> Data acquired with 'MayaPro2000' s.n. MAYP11278
#> grating 'HC1', slit '010s'
#> diffuser 'unknown'
#> integ. time (s): 0.233, 1.43, 5
#> total time (s): 10, 10, 10
#> counts @ peak (% of max): 94.2
#> Variables:
#> w.length: Wavelength [nm]
#> counts_1: Raw detector counts [number]
#> counts_2: Raw detector counts [number]
#> counts_3: Raw detector counts [number]
#> --
#> # A tibble: 6 × 4
#> w.length counts_1 counts_2 counts_3
#> <dbl> <dbl> <dbl> <dbl>
#> 1 245. 2505. 4055. 8683
#> 2 245. 2472. 3887. 8058
#> 3 245. 2501. 4053. 8670.
#> 4 246. 2485. 3952. 8301.
#> 5 246. 2486. 3956. 8316.
#> 6 247. 2478. 3934. 8261