Replace running differences smaller than a threshold by zeros in selected columns of data frames.

Usage

denoise_chunks(
  data,
  time.name = "TIMESTAMP",
  qty.name = NULL,
  absolute.threshold = 0,
  relative.threshold = 0.05,
  range.baseline = 0,
  add.signs = FALSE,
  verbose = FALSE
)

Arguments

data

data.frame or list of data.frame objects. Each data frame must contain at least one column with time stamps, one column with a measured quantity, and one column of running differences for each measured quantity.

time.name

character vector of length one. Name of the variable containing time stamps for the observations.

qty.name

character vector. Name(s) of variable(s) in data containing values of observed quantities. If qty.name = NULL, the default, all columns are retained.

absolute.threshold

numeric. The largest difference values to ignore, i.e., to set to zero. Expressed as a change per second.

relative.threshold

numeric. The multiplier to apply to the spread of the observations to obtain the largest difference values to ignore, i.e., to set to zero. Expressed as a change per second.

range.baseline

numeric. An additional value included in the computation of the range of the observations. Set range.baseline = NA for the spread applied to relative.threshold to be computed based only on the observations; set range.baseline = 0 for the range to include zero, i.e., to use a threshold relative to the maximum observation.

add.signs

logical. Flag indicating if values returned by sign() on the de-noised differences are to be added to the returned data frame chunks.

verbose

logical. Report data columns found. Useful for debugging.

Value

denoise_chunks() returns a copy of data, either a data.frame or a list of data frames. In each data frame, each column of differences named in qty.name, if present, is replaced by the value returned by denoise_diffs() applied to it and, optionally, columns are added with the result of calling sign() on the denoised differences.

Details

When searching for changes in the sign of differences we may need to discard small values introduced by "measurement noise". These functions replace differences smaller than a threshold by zeros. This approach is an alternative to smoothing, which can be difficult to implement for irregular time series.
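The effect on sign changes can be illustrated with a minimal base R sketch. It does not call denoise_chunks(); the vector of differences and the fixed threshold of 0.1 are made-up values chosen only for illustration, and comparing absolute values against the threshold is an assumption.

```r
# Hypothetical running differences: two real changes (a rise and a fall)
# surrounded by small fluctuations of the kind produced by measurement noise.
diffs <- c(0.02, -0.01, 0.03, 1.5, 2.0, -0.02, 0.01, -1.8, -1.2, 0.01)

# Arbitrary threshold used only for this illustration.
threshold <- 0.1

# Replace differences smaller in absolute value than the threshold by zero.
denoised <- ifelse(abs(diffs) < threshold, 0, diffs)

# Signs before and after denoising.
signs.raw <- sign(diffs)
signs.denoised <- sign(denoised)

# Count sign reversals; zeros are dropped for the denoised differences so
# that only transitions between a rise and a fall are counted.
reversals.raw <- sum(diff(signs.raw) != 0)
nonzero.signs <- signs.denoised[signs.denoised != 0]
reversals.denoised <- sum(diff(nonzero.signs) != 0)

print(signs.raw)
print(signs.denoised)
```

In the raw differences the noise produces several spurious sign reversals, while after zeroing the small values only the transition from the rise to the fall remains.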

The argument passed to data can be either a bare data.frame object or a list containing one or more data frames, such as that returned by split_chunks().

The argument passed to absolute.threshold is the smallest difference to be retained; any smaller differences are replaced by zero. In contrast, the argument passed to relative.threshold is a multiplier applied to the spread of the observations, where the spread is the difference between the largest and the smallest value among the observations for a given variable in data together with range.baseline. The two thresholds are combined so that the larger of the two is used. Setting either threshold to zero forces the other one to always be used. The threshold used is computed as

max(abs(diff(range(c(range.baseline, x), na.rm = TRUE))) * relative.threshold, absolute.threshold)

with differences in data smaller than the threshold set to zero.
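As a worked illustration of this computation, the following base R sketch evaluates the expression above for a made-up vector of observations, with range.baseline = 0 and with range.baseline = NA. The observations, the one-second time step, and the use of absolute values when comparing differences to the threshold are assumptions made for the example, not a description of the package internals.

```r
# Hypothetical observations of one quantity, assumed to be one second apart.
x <- c(10, 10.2, 10.1, 14, 18, 18.1, 12, 10)
running.diffs <- c(NA, diff(x))

# Default argument values shown in Usage.
relative.threshold <- 0.05
absolute.threshold <- 0

# Threshold with the range extended to include zero (range.baseline = 0).
threshold.zero <-
  max(abs(diff(range(c(0, x), na.rm = TRUE))) * relative.threshold,
      absolute.threshold)

# Threshold based only on the observations (range.baseline = NA).
threshold.obs <-
  max(abs(diff(range(c(NA, x), na.rm = TRUE))) * relative.threshold,
      absolute.threshold)

# Set differences smaller than the threshold to zero, keeping the leading NA.
denoised <- ifelse(!is.na(running.diffs) & abs(running.diffs) < threshold.zero,
                   0, running.diffs)

print(c(threshold.zero = threshold.zero, threshold.obs = threshold.obs))
print(denoised)
```

With these values the spread is 18.1 when zero is included and 8.1 when it is not, so the thresholds are approximately 0.905 and 0.405, respectively.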

The intended use of absolute.threshold is to allow filtering out both zero or dark noise and gain noise in the observed data, i.e., to be able to apply a minimum denoising even in the complete absence of flecks, but otherwise apply a denoising relative to the value of the largest observation or relative to the spread of the observations.