Condenses a dataset by aggregating the data to a given (shorter) interval unit.
aggregate_Datetime() is opinionated in the sense that it sets default handlers for each of the data types numeric, character, logical, and factor. These can be overwritten by the user.
Columns that do not fall into one of these categories need to be handled individually by the user (... argument) or will be removed during aggregation.
If no unit is specified, the data will simply be aggregated to the most common interval (dominant.epoch), which is most often not an aggregation but a rounding.
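To illustrate why aggregating to the most common interval usually amounts to a rounding, here is a minimal base R sketch. It does not use LightLogR and is not the package's implementation; the toy timestamps and the variable names (`times`, `intervals`, `dominant`, `rounded`) are assumptions made for this example only.

```r
# Toy timestamps: mostly 10 s apart, with one slightly irregular reading
start <- as.POSIXct("2023-08-15 12:00:00", tz = "UTC")
times <- start + c(0, 10, 20, 31, 40, 50)

# Differences between consecutive timestamps, in seconds
intervals <- as.numeric(diff(times), units = "secs")

# The most frequent interval plays the role of the dominant epoch
dominant <- as.numeric(names(which.max(table(intervals))))

# Rounding every timestamp to a multiple of that interval leaves one
# observation per bin, so nothing is actually averaged here
rounded <- as.POSIXct(
  round(as.numeric(times) / dominant) * dominant,
  origin = "1970-01-01", tz = "UTC"
)

dominant
format(rounded, "%H:%M:%S")
```

In this toy case the irregular reading at 12:00:31 is moved to 12:00:30 and the number of rows stays the same, which is the "rounding rather than aggregation" behaviour described above.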
Usage
aggregate_Datetime(
  dataset,
  Datetime.colname = Datetime,
  unit = "dominant.epoch",
  type = c("round", "floor", "ceiling"),
  numeric.handler = mean,
  character.handler = function(x) names(which.max(table(x, useNA = "ifany"))),
  logical.handler = function(x) mean(x) >= 0.5,
  factor.handler = function(x) factor(names(which.max(table(x, useNA = "ifany")))),
  ...
)
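The default handlers in the signature above are ordinary R functions, so their behaviour can be checked on small vectors. The sketch below copies them verbatim from the Usage block and applies them to toy data in base R; the input vectors are assumptions made for illustration.

```r
# Default handlers, copied from the Usage signature
numeric.handler   <- mean
character.handler <- function(x) names(which.max(table(x, useNA = "ifany")))
logical.handler   <- function(x) mean(x) >= 0.5
factor.handler    <- function(x) factor(names(which.max(table(x, useNA = "ifany"))))

# Each handler collapses all values of one time bin into a single value
num_result <- numeric.handler(c(10, 20, 60))            # arithmetic mean
chr_result <- character.handler(c("on", "off", "on"))   # most frequent string
lgl_result <- logical.handler(c(TRUE, FALSE, FALSE))    # TRUE if at least half are TRUE
fct_result <- factor.handler(factor(c("a", "b", "b")))  # most frequent level

num_result
chr_result
lgl_result
fct_result
```

Any function that reduces a vector to a single value could be supplied in place of a default, for example `median` for `numeric.handler`.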
Arguments
- dataset
A light logger dataset. Expects a dataframe. If not imported by LightLogR, take care to choose a sensible variable for the Datetime.colname.
- Datetime.colname
Column name that contains the datetime. Defaults to "Datetime", which is automatically correct for data imported with LightLogR. Expects a symbol. Needs to be part of the dataset.
- unit
Unit of binning. See lubridate::round_date() for examples. The default is "dominant.epoch", which means everything will be aggregated to the most common interval. This is especially useful for slightly irregular data, but can be computationally expensive. "none" will not aggregate the data at all.
- type
One of "round" (the default), "ceiling" or "floor". The setting chooses the relevant function from lubridate.
- numeric.handler, character.handler, logical.handler, factor.handler
Functions that handle the respective data types. The default handlers calculate the mean for numeric and the mode for character, factor and logical types.
- ...
Arguments passed on to dplyr::summarize() to handle columns that do not fall into one of the categories above.
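The three `type` options correspond to lubridate's `round_date()`, `floor_date()` and `ceiling_date()`. As a small sketch of how they differ for a single timestamp with a 5-minute unit (the timestamp is an assumption chosen for illustration, and the lubridate package must be installed):

```r
library(lubridate)

# One reading taken 2 min 40 s into a 5-minute bin
x <- as.POSIXct("2023-08-15 12:07:40", tz = "UTC")

rounded <- round_date(x, unit = "5 mins")    # nearest bin boundary
floored <- floor_date(x, unit = "5 mins")    # start of the current bin
ceiled  <- ceiling_date(x, unit = "5 mins")  # start of the next bin

format(c(rounded, floored, ceiled), "%H:%M:%S")
```

With "floor", every reading is assigned to the start of the bin it falls in; with "round", readings in the second half of a bin move to the next boundary instead.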
Value
A tibble with aggregated Datetime data. Usually the number of rows will be smaller than the input dataset. If the handler arguments capture all column types, the number of columns will be the same as in the input dataset.
Examples
#dominant epoch without aggregation
sample.data.environment %>%
  dominant_epoch()
#> # A tibble: 2 × 3
#>   Id          dominant.epoch group.indices
#>   <chr>       <Duration>             <int>
#> 1 Environment 30s                        1
#> 2 Participant 10s                        2

#dominant epoch with 5 minute aggregation
sample.data.environment %>%
  aggregate_Datetime(unit = "5 mins") %>%
  dominant_epoch()
#> # A tibble: 2 × 3
#>   Id          dominant.epoch    group.indices
#>   <chr>       <Duration>                <int>
#> 1 Environment 300s (~5 minutes)             1
#> 2 Participant 300s (~5 minutes)             2

#dominant epoch with 1 day aggregation
sample.data.environment %>%
  aggregate_Datetime(unit = "1 day") %>%
  dominant_epoch()
#> # A tibble: 2 × 3
#>   Id          dominant.epoch   group.indices
#>   <chr>       <Duration>               <int>
#> 1 Environment 86400s (~1 days)             1
#> 2 Participant 86400s (~1 days)             2
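
As a further sketch, the default handlers and the rounding type can be overridden using the arguments listed in the signature. The choice of `median` and `"floor"` below is an assumption for illustration, and no output is shown because it has not been verified here.

```r
library(LightLogR)

# Aggregate to 5-minute bins, summarising numeric columns with the median
# instead of the default mean, and assigning each reading to the start of its bin
aggregated <- aggregate_Datetime(
  sample.data.environment,
  unit = "5 mins",
  type = "floor",
  numeric.handler = median
)

# Fewer rows are expected after aggregation than in the input
nrow(sample.data.environment)
nrow(aggregated)
```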