
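This helper function summarizes episodes of gaps, clusters, or states, focusing on numeric variables. For each group, it returns the mean of every summarized column, the number of episodes, and, optionally, the total duration. Duration and date-time columns keep their class, so their means are again Duration and date-time values.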

Despite its name, the function summarizes all columns of type double rather than only numeric columns. Date-time (POSIXct) and Duration columns are stored as doubles, so they are summarized as well.
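Base R can show which columns are stored as doubles. The sketch below uses a hypothetical toy table (the data and the column n_obs are invented for illustration) and checks each column with is.double() and is.numeric(). Base R's difftime stands in for lubridate's Duration class, which is likewise stored as a double. Date-time and difftime columns are doubles even though is.numeric() returns FALSE for them. An integer column, in contrast, is numeric but not double.

```r
# Hypothetical toy episode table with columns of different storage types
episodes <- data.frame(
  state.count = c("1", "2"),                          # character
  n_obs       = c(52L, 53L),                          # integer
  start       = as.POSIXct(c("2023-08-29 06:52:30",
                             "2023-08-30 06:52:30"),
                           tz = "UTC"),               # POSIXct, stored as double
  duration    = as.difftime(c(46800, 47700),
                            units = "secs")           # difftime, stored as double
)

# Compare the double check with the numeric check for each column
is_dbl <- vapply(episodes, is.double, logical(1))
is_num <- vapply(episodes, is.numeric, logical(1))
rbind(is_dbl, is_num)
```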

Usage

summarize_numeric(
  data,
  remove = NULL,
  prefix = "mean_",
  na.rm = TRUE,
  add.total.duration = TRUE,
  durations.dec = 0
)

summarise_numeric(
  data,
  remove = NULL,
  prefix = "mean_",
  na.rm = TRUE,
  add.total.duration = TRUE,
  durations.dec = 0
)

Arguments

data

A dataframe containing numeric data, typically from extract_clusters() or extract_gaps().

remove

Character vector of column names to exclude from the summary. Defaults to NULL, i.e., no columns are excluded.

prefix

A prefix to add to the column names of summarized metrics. Defaults to "mean_".

na.rm

Logical, whether to remove NA values when calculating means. Defaults to TRUE.

add.total.duration

Logical, whether the total duration of each group should be calculated and added as the column total_duration. Only relevant if the input data contains a column named duration. Defaults to TRUE.

durations.dec

Number of decimal places to which the means of durations and date-times are rounded. Defaults to 0.
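The base-R sketch below roughly illustrates what durations.dec and add.total.duration control. It rounds the mean of a few date-times and durations to a given number of decimals and sums the durations into a total. It assumes the rounding is applied to the mean expressed in seconds. The values are hypothetical, and plain numeric seconds stand in for Duration objects.

```r
# Hypothetical episodes of a single group (values invented for illustration)
start <- as.POSIXct(c("2023-08-29 06:52:30", "2023-08-30 06:52:30",
                      "2023-08-31 06:52:31"), tz = "UTC")
duration <- c(46800, 47700, 45900.4)  # seconds; stands in for Duration objects

durations.dec <- 0  # assumed to mean the same as the argument of that name

# Round the mean (in seconds) to `durations.dec` decimals, then restore the class
mean_start <- .POSIXct(round(mean(as.numeric(start)), durations.dec), tz = "UTC")
mean_duration <- round(mean(duration), durations.dec)

# Total duration of the group, analogous to the total_duration column
total_duration <- sum(duration)

format(mean_start, "%Y-%m-%d %H:%M:%S", tz = "UTC")  # "2023-08-30 06:52:30"
mean_duration                                         # 46800
```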

Value

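A dataframe with one row per group, containing the summarized metrics (named with prefix) and a column episodes with the number of episodes in each group.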
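The following minimal base-R sketch makes the shape of the result concrete. It averages every double column not listed in remove, prefixes the names, and counts the rows per group, mirroring the episodes column of the example output. The function summarize_doubles() and the toy data are hypothetical stand-ins invented here, not the package's implementation. The sketch also omits duration rounding and the total duration.

```r
# Hypothetical base-R sketch, not the package's implementation:
# average every double column per group and count the episodes.
summarize_doubles <- function(data, group, remove = NULL, prefix = "mean_",
                              na.rm = TRUE) {
  # Double columns only (includes POSIXct and difftime), minus exclusions
  is_dbl <- vapply(data, is.double, logical(1))
  cols <- setdiff(names(data)[is_dbl], c(remove, group))
  per_group <- lapply(split(data, data[[group]]), function(d) {
    means <- lapply(d[cols], mean, na.rm = na.rm)  # class-aware means
    out <- data.frame(d[[group]][1], means, nrow(d))
    names(out) <- c(group, paste0(prefix, cols), "episodes")
    out
  })
  out <- do.call(rbind, per_group)
  rownames(out) <- NULL
  out
}

# Toy episodes for two groups (invented data)
ep <- data.frame(
  Id = c("A", "A", "B"),
  start = as.POSIXct(c("2023-08-29 06:00:00", "2023-08-29 08:00:00",
                       "2023-08-30 12:00:00"), tz = "UTC"),
  duration = c(3600, 7200, 1800)
)

res <- summarize_doubles(ep, group = "Id")
res2 <- summarize_doubles(ep, group = "Id", remove = "start")
```

Selecting columns by type rather than by name keeps such a summary generic. It works for gaps, clusters, and states alike, at the cost of needing remove for double columns that should not be averaged.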

Examples

# Extract clusters and summarize them
library(LightLogR)
dataset <-
  sample.data.environment |>
  aggregate_Datetime(unit = "15 mins") |>
  extract_clusters(MEDI > 1000)

# Input to summarize_numeric()
dataset
#> # A tibble: 16 × 6
#> # Groups:   Id [2]
#>    Id     state.count start               end                 epoch             
#>    <fct>  <chr>       <dttm>              <dttm>              <Duration>        
#>  1 Envir… 1           2023-08-29 06:52:30 2023-08-29 19:52:30 900s (~15 minutes)
#>  2 Envir… 2           2023-08-30 06:52:30 2023-08-30 20:07:30 900s (~15 minutes)
#>  3 Envir… 3           2023-08-31 06:52:30 2023-08-31 19:37:30 900s (~15 minutes)
#>  4 Envir… 4           2023-09-01 07:07:30 2023-09-01 19:52:30 900s (~15 minutes)
#>  5 Envir… 5           2023-09-02 06:52:30 2023-09-02 19:52:30 900s (~15 minutes)
#>  6 Envir… 6           2023-09-03 06:52:30 2023-09-03 19:52:30 900s (~15 minutes)
#>  7 Parti… 1           2023-08-29 16:37:30 2023-08-29 17:22:30 900s (~15 minutes)
#>  8 Parti… 2           2023-08-31 10:37:30 2023-08-31 11:22:30 900s (~15 minutes)
#>  9 Parti… 3           2023-09-01 16:07:30 2023-09-01 18:37:30 900s (~15 minutes)
#> 10 Parti… 4           2023-09-02 12:22:30 2023-09-02 19:22:30 900s (~15 minutes)
#> 11 Parti… 5           2023-09-03 10:22:30 2023-09-03 11:07:30 900s (~15 minutes)
#> 12 Parti… 6           2023-09-03 11:37:30 2023-09-03 12:22:30 900s (~15 minutes)
#> 13 Parti… 7           2023-09-03 12:37:30 2023-09-03 13:52:30 900s (~15 minutes)
#> 14 Parti… 8           2023-09-03 14:22:30 2023-09-03 14:52:30 900s (~15 minutes)
#> 15 Parti… 9           2023-09-03 15:07:30 2023-09-03 16:37:30 900s (~15 minutes)
#> 16 Parti… 10          2023-09-03 16:52:30 2023-09-03 18:52:30 900s (~15 minutes)
#> # ℹ 1 more variable: duration <Duration>
# Output of summarize_numeric(), excluding state.count and epoch from the summary
dataset |> summarize_numeric(remove = c("state.count", "epoch"))
#> # A tibble: 2 × 6
#>   Id      mean_start          mean_end            mean_duration         episodes
#>   <fct>   <dttm>              <dttm>              <Duration>               <int>
#> 1 Enviro… 2023-08-31 18:55:00 2023-09-01 07:52:30 46650s (~12.96 hours)        6
#> 2 Partic… 2023-09-02 11:16:30 2023-09-02 13:03:00 6390s (~1.77 hours)         10
#> # ℹ 1 more variable: total_duration <Duration>