This helper function summarizes episodes of gaps, clusters, or states, focusing on numeric variables. It calculates the mean of every numeric column and handles Duration objects appropriately.
Despite its name, the function summarizes all double columns, not only plain numeric ones. Date-time and Duration columns are therefore averaged as well.
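The distinction between double and numeric columns can be illustrated in base R. The sketch below is not the package's implementation. It uses a made-up `episodes` data frame (all names and values are assumptions for illustration) to show that a date-time column is stored as double even though `is.numeric()` rejects it, and then approximates the idea of a grouped mean with a `mean_` prefix.

```r
# Toy episode table; all names and values are made up for illustration.
episodes <- data.frame(
  Id = c("A", "A", "B"),
  state.count = c(1L, 2L, 1L),    # integer: numeric, but not double
  start = as.POSIXct(
    c("2023-08-29 06:00:00", "2023-08-30 07:00:00", "2023-08-29 16:00:00"),
    tz = "UTC"
  ),                              # date-time: double, but not numeric
  duration = c(3600, 7200, 1800)  # seconds, stored as double
)

# Compare the two predicates column by column.
is_dbl <- vapply(episodes, is.double, logical(1))
is_num <- vapply(episodes, is.numeric, logical(1))

# Grouped means over every double column, with a "mean_" prefix.
dbl_cols <- names(episodes)[is_dbl]
episode_summary <- do.call(rbind, lapply(split(episodes, episodes$Id), function(g) {
  out <- data.frame(Id = g$Id[1])
  for (col in dbl_cols) {
    out[[paste0("mean_", col)]] <- mean(g[[col]], na.rm = TRUE)
  }
  out$episodes <- nrow(g)                # number of episodes per group
  out$total_duration <- sum(g$duration)  # total duration per group
  out
}))

print(rbind(is_dbl, is_num))
print(episode_summary)
```

In this toy table, `start` and `duration` are the double columns, so both receive a mean, while the integer `state.count` is skipped.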
Usage
summarize_numeric(
data,
remove = NULL,
prefix = "mean_",
na.rm = TRUE,
add.total.duration = TRUE,
durations.dec = 0
)
summarise_numeric(
data,
remove = NULL,
prefix = "mean_",
na.rm = TRUE,
add.total.duration = TRUE,
durations.dec = 0
)
Arguments
- data
A dataframe containing numeric data, typically from extract_clusters() or extract_gaps().
- remove
Character vector of column names to exclude from the summary.
- prefix
A prefix to add to the column names of summarized metrics. Defaults to "mean_".
- na.rm
Whether to remove NA values when calculating means. Defaults to TRUE.
- add.total.duration
Logical, whether the total duration for a given group should be calculated. Only relevant if a column named duration is part of the input data.
- durations.dec
Numeric. Number of decimals for the mean calculation of durations and times. Defaults to 0.
Examples
# Extract clusters and summarize them
library(LightLogR)
dataset <-
  sample.data.environment |>
  aggregate_Datetime(unit = "15 mins") |>
  extract_clusters(MEDI > 1000)
# Input to summarize_numeric()
dataset
#> # A tibble: 16 × 6
#> # Groups: Id [2]
#> Id state.count start end epoch
#> <fct> <chr> <dttm> <dttm> <Duration>
#> 1 Envir… 1 2023-08-29 06:52:30 2023-08-29 19:52:30 900s (~15 minutes)
#> 2 Envir… 2 2023-08-30 06:52:30 2023-08-30 20:07:30 900s (~15 minutes)
#> 3 Envir… 3 2023-08-31 06:52:30 2023-08-31 19:37:30 900s (~15 minutes)
#> 4 Envir… 4 2023-09-01 07:07:30 2023-09-01 19:52:30 900s (~15 minutes)
#> 5 Envir… 5 2023-09-02 06:52:30 2023-09-02 19:52:30 900s (~15 minutes)
#> 6 Envir… 6 2023-09-03 06:52:30 2023-09-03 19:52:30 900s (~15 minutes)
#> 7 Parti… 1 2023-08-29 16:37:30 2023-08-29 17:22:30 900s (~15 minutes)
#> 8 Parti… 2 2023-08-31 10:37:30 2023-08-31 11:22:30 900s (~15 minutes)
#> 9 Parti… 3 2023-09-01 16:07:30 2023-09-01 18:37:30 900s (~15 minutes)
#> 10 Parti… 4 2023-09-02 12:22:30 2023-09-02 19:22:30 900s (~15 minutes)
#> 11 Parti… 5 2023-09-03 10:22:30 2023-09-03 11:07:30 900s (~15 minutes)
#> 12 Parti… 6 2023-09-03 11:37:30 2023-09-03 12:22:30 900s (~15 minutes)
#> 13 Parti… 7 2023-09-03 12:37:30 2023-09-03 13:52:30 900s (~15 minutes)
#> 14 Parti… 8 2023-09-03 14:22:30 2023-09-03 14:52:30 900s (~15 minutes)
#> 15 Parti… 9 2023-09-03 15:07:30 2023-09-03 16:37:30 900s (~15 minutes)
#> 16 Parti… 10 2023-09-03 16:52:30 2023-09-03 18:52:30 900s (~15 minutes)
#> # ℹ 1 more variable: duration <Duration>
# Output of summarize_numeric() (removing state.count and epoch from the summary)
dataset |> summarize_numeric(remove = c("state.count", "epoch"))
#> # A tibble: 2 × 6
#> Id mean_start mean_end mean_duration episodes
#> <fct> <dttm> <dttm> <Duration> <int>
#> 1 Enviro… 2023-08-31 18:55:00 2023-09-01 07:52:30 46650s (~12.96 hours) 6
#> 2 Partic… 2023-09-02 11:16:30 2023-09-02 13:03:00 6390s (~1.77 hours) 10
#> # ℹ 1 more variable: total_duration <Duration>