Split a time series stored in a data frame at breaks (long time steps), returning a list of data frames or data chunks.
Usage
split_chunks(
data,
time.name = "TIMESTAMP",
qty.name = NULL,
time.step = NULL,
chunk.min.time,
chunk.min.rows = 2,
add.diffs = TRUE,
verbose = FALSE,
na.rm = TRUE
)
Arguments
- data
data.frame Containing at least one column with time stamps and one column with a measured quantity.
- time.name
character vector of length one Name of the variable containing time stamps for the observations.
- qty.name
character vector Name(s) of variable(s) in data containing values of observed quantities. If qty.name = NULL, the default, all columns are retained.
- time.step
numeric The duration in seconds of one time step within a chunk. If NULL, the actual time steps are used.
- chunk.min.time
numeric or duration Minimum length of the time step that separates data chunks. If numeric, expressed in seconds.
- chunk.min.rows
integer The minimum number of rows a chunk must have to avoid being discarded.
- add.diffs
logical Flag indicating if values returned by diff() are to be added to the returned data frame chunks.
- verbose
logical Report chunk names and lengths at each iteration. Useful for debugging.
- na.rm
logical Omit rows of data containing NA values after selecting variables.
Value
A list of data frames of varying length, depending on the number of
chunks found, possibly of length zero. The members of the list are named
based on the starting time of each chunk. The variables included in the
member data frames are those named by time.name and qty.name
and optionally, their running differences.
Details
When time series data are acquired in bursts or chunks separated by longer time intervals, it can be useful to extract the chunks into separate data frames before further analysis. This implementation does not assume the same duration for all chunks or for the gaps between them. Instead, it searches for time intervals longer than a threshold duration and splits the data at these points. If the data contain no gaps, the whole data set is returned as a single chunk.
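The gap search described above can be illustrated with a minimal base R sketch. It is not the implementation used by split_chunks(): the data frame demo.df, the threshold gap.threshold (standing in for chunk.min.time) and the strict "greater than" comparison are assumptions made for the example.

```r
# Base R sketch of gap-based splitting; not the package's own code.
# Hypothetical data: two bursts of 1 s readings separated by a 60 s gap.
demo.df <- data.frame(
  TIMESTAMP = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") +
    c(0, 1, 2, 62, 63, 64, 65),
  value = c(1.0, 1.2, 1.1, 2.0, 2.1, 2.3, 2.2)
)

gap.threshold <- 10  # seconds; plays the role of chunk.min.time

# Time steps in seconds between consecutive rows
steps <- diff(as.numeric(demo.df$TIMESTAMP))

# A new chunk starts at the first row and wherever a step exceeds the threshold
chunk.id <- cumsum(c(TRUE, steps > gap.threshold))

# Split into a list of data frames, one per chunk
chunk.list <- split(demo.df, chunk.id)

# Name each member after the starting time of its chunk
names(chunk.list) <- vapply(
  chunk.list,
  function(x) format(x$TIMESTAMP[1], "%Y-%m-%d %H:%M:%S"),
  character(1)
)
```

With these data, chunk.list has two members, of three and four rows. Data without any step above the threshold would yield a single member.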
When a minimum length for the individual chunks is set with an argument to
chunk.min.rows, chunks with fewer rows are discarded silently,
unless verbose = TRUE.
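As a rough illustration of this filtering step, the base R sketch below drops short members from a list of data frames. The list short.demo, the variable min.rows (standing in for chunk.min.rows) and the wording of the message are hypothetical and do not reproduce the output of split_chunks().

```r
# Hypothetical list of chunks with 1, 3 and 2 rows
short.demo <- list(
  a = data.frame(TIMESTAMP = 0, value = 1.0),
  b = data.frame(TIMESTAMP = c(10, 11, 12), value = c(2.0, 2.1, 2.2)),
  c = data.frame(TIMESTAMP = c(30, 31), value = c(3.0, 3.1))
)

min.rows <- 2  # plays the role of chunk.min.rows

# Number of rows in each chunk
n.rows <- vapply(short.demo, nrow, integer(1))

# Keep chunks with at least min.rows rows; chunks with fewer rows are dropped
kept <- short.demo[n.rows >= min.rows]
dropped <- names(short.demo)[n.rows < min.rows]

# Report the dropped chunks, similar in spirit to verbose = TRUE
if (length(dropped) > 0) {
  message("Discarded short chunks: ", paste(dropped, collapse = ", "))
}
```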
With add.diffs = TRUE the running differences between values in the
current row and the one above are added to the returned data frames. The
value in the first row is NA for running differences, except for
the time, in which case it is the time difference to the preceding value
in data.
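One way to obtain running differences with this shape is sketched below in base R. It only illustrates the behaviour described above: the column names TIMESTAMP.diff and value.diff are hypothetical, and the names and computation used by split_chunks() may differ.

```r
# Hypothetical data: two chunks of three rows each, separated by a 60 s gap
full.df <- data.frame(
  TIMESTAMP = as.POSIXct("2024-01-01 12:00:00", tz = "UTC") +
    c(0, 1, 2, 62, 63, 64),
  value = c(10, 12, 11, 20, 23, 21)
)

# Time differences computed on the whole data, so that the first row of a
# later chunk keeps the step from the preceding observation in the data
time.diff <- c(NA, diff(as.numeric(full.df$TIMESTAMP)))

# Second chunk: rows 4 to 6
second <- full.df[4:6, ]
second$TIMESTAMP.diff <- time.diff[4:6]          # hypothetical column name
second$value.diff <- c(NA, diff(second$value))   # NA in the first row
```

Here the first row of second holds the 60 s gap in TIMESTAMP.diff and NA in value.diff.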
Method diff() must be available for the class of the variable
named by the argument to time.name. The class of this column is in
most cases numeric, date, or time. If add.diffs = TRUE this
requirement also applies to the variable(s) named by the argument passed to
qty.name.
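A quick check that diff() is available for common time classes is sketched below in base R, with hypothetical example values. For POSIXct and Date inputs diff() returns a difftime object whose units are chosen automatically, so converting explicitly to seconds avoids ambiguity.

```r
# POSIXct time stamps: diff() returns a "difftime" object
times <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + c(0, 30, 7230)
time.steps <- diff(times)

# Convert to seconds explicitly, whatever units were chosen automatically
secs <- as.numeric(time.steps, units = "secs")

# Date values: diff() also returns a "difftime" object, here in days
dates <- as.Date(c("2024-01-01", "2024-01-03"))
day.steps <- diff(dates)
day.secs <- as.numeric(day.steps, units = "secs")

# Plain numeric times: diff() returns a numeric vector
num.steps <- diff(c(0, 30, 7230))
```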
The number of chunks in the returned list of data frames and their lengths
are reported in a message().
