Replace running differences smaller than a threshold by zeros in selected columns of data frames.
Usage
denoise_chunks(
  data,
  time.name = "TIMESTAMP",
  qty.name = NULL,
  absolute.threshold = 0,
  relative.threshold = 0.05,
  range.baseline = 0,
  add.signs = FALSE,
  verbose = FALSE
)
Arguments
- data
data.frame or list of data.frame objects. Each data frame must contain at least one column with time stamps, one column with a measured quantity, and one column of running differences for each measured quantity.
- time.name
character vector of length one. Name of the variable containing time stamps for the observations.
- qty.name
character vector. Name(s) of variable(s) in data containing values of observed quantities. If qty.name = NULL, the default, all columns are retained.
- absolute.threshold
numeric. The largest difference values to ignore, i.e., to set to zero. Expressed as a change per second.
- relative.threshold
numeric. The multiplier to apply to the spread of the observations, x, to obtain the largest difference values to ignore, i.e., to set to zero. Expressed as a change per second.
- range.baseline
numeric. An additional value included in the computation of the range of the observations. Set range.baseline = NA to compute the spread used with relative.threshold from the observations only; set range.baseline = 0 to include zero in the range, i.e., to make the threshold relative to the largest observation (when all observations are non-negative).
- add.signs
logical. Flag indicating whether the values returned by sign() on the de-noised differences are to be added to the returned data frame chunks.
- verbose
logical. Report the data columns found. Useful for debugging.
Value
denoise_chunks() returns a copy of data, either a
data.frame or a list of data frames. In each data frame, every column of
differences named in qty.name, if present, is replaced by the value
returned by function denoise_diffs() applied to it. When add.signs = TRUE,
columns containing the result of calling sign() on the denoised
differences are also added.
Details
When searching for changes in the sign of differences, we may need to discard small values introduced by "measurement noise". These functions replace differences smaller than a threshold by zeros. This approach is an alternative to smoothing, which can be difficult to implement for irregular time series.
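To illustrate why small differences get in the way, the base-R sketch below counts sign changes in a made-up vector of differences, before and after zeroing those smaller in magnitude than 0.05. The data, the 0.05 threshold and the helper count_sign_changes() are hypothetical, and the sketch assumes that "smaller" refers to absolute value.

```r
# Hypothetical running differences: one large rise (+0.80) and one large
# fall (-0.90), mixed with small fluctuations caused by measurement noise.
d <- c(0.02, -0.01, 0.80, 0.03, -0.02, -0.90, 0.01, -0.03)

# Hypothetical helper: count changes of sign, skipping zeros,
# which carry no direction.
count_sign_changes <- function(d) {
  s <- sign(d)
  s <- s[s != 0]
  sum(diff(s) != 0)
}

count_sign_changes(d)        # 5: most of them caused by noise

# Zero the differences whose magnitude is below an arbitrary threshold.
d_clean <- ifelse(abs(d) < 0.05, 0, d)
count_sign_changes(d_clean)  # 1: only the change from rise to fall remains
```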
The argument passed to data can be either a bare data.frame
object or a list containing one or more data frames, such as that
returned by split_chunks().
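The sketch below shows one way, in base R, for a function to accept either a bare data frame or a list of them. It is not the package's implementation: apply_to_chunks(), add_diff_per_second(), the PAR column and the ".diff" column suffix are all hypothetical, and base split() stands in for split_chunks(), whose splitting rule is not described here. The per-chunk step computes running differences expressed per second, as the threshold arguments expect.

```r
# Hypothetical helper (not part of the package): apply a function to a bare
# data frame, or to each data frame in a list of chunks.
apply_to_chunks <- function(data, f, ...) {
  # A data frame is also a list, so it must be tested for first.
  if (is.data.frame(data)) {
    f(data, ...)
  } else if (is.list(data) && all(vapply(data, is.data.frame, logical(1)))) {
    lapply(data, f, ...)
  } else {
    stop("'data' must be a data frame or a list of data frames")
  }
}

# Hypothetical per-chunk step: running differences expressed per second.
add_diff_per_second <- function(df, time.name = "TIMESTAMP", qty.name = "PAR") {
  secs <- as.numeric(diff(df[[time.name]]), units = "secs")
  df[[paste0(qty.name, ".diff")]] <- c(NA, diff(df[[qty.name]]) / secs)
  df
}

obs <- data.frame(
  TIMESTAMP = as.POSIXct("2024-06-01 12:00:00", tz = "UTC") + c(0, 2, 3, 7),
  PAR = c(100, 104, 103, 111)
)
apply_to_chunks(obs, add_diff_per_second)                        # one data frame
apply_to_chunks(split(obs, c(1, 1, 2, 2)), add_diff_per_second)  # list of two
```

Because differences are computed within each chunk, the first row of every chunk gets NA rather than a difference spanning the gap between chunks.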
The argument passed to absolute.threshold is directly expressed as the
smallest value of differences to be retained, with any smaller differences
replaced by zero. In contrast, the argument passed to
relative.threshold is a multiplier applied to the spread of the
observations, where the spread is the difference between the largest and the
smallest value among the observations of a given variable in data together
with range.baseline. The two thresholds are combined so that the larger of
the two values is used. Setting either threshold to zero forces the other
one to always be used. The threshold used is computed as
max(abs(diff(range(c(range.baseline, x), na.rm = TRUE))) * relative.threshold, absolute.threshold)
and differences in data smaller than the threshold are set to zero.
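A minimal base-R sketch of this rule follows. denoise_sketch() is a hypothetical stand-in, not denoise_diffs() itself. It assumes that differences are compared with the threshold by absolute value (so that large negative differences are kept) and that the made-up observations are one second apart, so plain differences are already changes per second.

```r
# Hypothetical sketch of the thresholding rule; not the package's code.
# Assumption: differences are compared with the threshold by absolute value.
denoise_sketch <- function(diffs, x,
                           absolute.threshold = 0,
                           relative.threshold = 0.05,
                           range.baseline = 0) {
  spread <- abs(diff(range(c(range.baseline, x), na.rm = TRUE)))
  threshold <- max(spread * relative.threshold, absolute.threshold)
  # Leave NAs untouched; zero only the non-missing small differences.
  diffs[!is.na(diffs) & abs(diffs) < threshold] <- 0
  diffs
}

x <- c(200, 205, 190, 220, 195, 400)  # hypothetical observations, 1 s apart
d <- c(NA, diff(x))                   # running differences per second

denoise_sketch(d, x)                           # threshold 20:   NA 0 0 30 -25 205
denoise_sketch(d, x, range.baseline = NA)      # threshold 10.5: NA 0 -15 30 -25 205
denoise_sketch(d, x, absolute.threshold = 28)  # threshold 28:   NA 0 0 30 0 205
```

With range.baseline = 0 the spread is 400 (zero to the maximum); with range.baseline = NA it shrinks to 210, so the same relative.threshold removes less.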
The intended use of absolute.threshold, together with relative.threshold,
is to allow filtering out both zero (dark) noise and gain noise in the
observed data, i.e., to apply a minimum amount of denoising even in the
complete absence of flecks, while otherwise applying denoising relative to
the value of the largest observation or to the spread of the observations.
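To make the role of the absolute floor concrete, the hypothetical helper threshold_for() below evaluates the threshold formula from the Details for two made-up series: one with dark noise only and one that includes a fleck. The readings and the choice absolute.threshold = 0.5 are illustrative assumptions, not recommendations.

```r
# Hypothetical helper evaluating the threshold formula from the Details.
threshold_for <- function(x, absolute.threshold = 0,
                          relative.threshold = 0.05, range.baseline = 0) {
  max(abs(diff(range(c(range.baseline, x), na.rm = TRUE))) * relative.threshold,
      absolute.threshold)
}

dark   <- c(0.2, 0.1, 0.3, 0.2)   # hypothetical readings with no flecks
flecks <- c(50, 300, 1200, 400)   # hypothetical readings including a fleck

threshold_for(dark)                              # 0.015: differences of 0.1-0.2 survive
threshold_for(dark, absolute.threshold = 0.5)    # 0.5: the floor removes dark noise
threshold_for(flecks, absolute.threshold = 0.5)  # 60: the relative threshold dominates
```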
