Filtering a dataset based on Dates or Datetimes is often necessary prior
to calculation or visualization. The functions allow filtering based on
simple strings or Datetime scalars, or by specifying a length. They also
support prior dplyr grouping, which is useful, e.g., when you only want to
filter the first two days of measurement data for every participant,
regardless of the actual date. If you want to filter based on times of the
day, use filter_Time().
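The grouping behaviour can be pictured with plain dplyr: because filter() is evaluated per group, each participant's window starts at that participant's own first timestamp, whatever the calendar date. This is a minimal sketch on made-up data (the tibble, the Ids, and the half-open two-day window are assumptions for illustration), not LightLogR's internal code.

```r
library(dplyr)
library(lubridate)

# Made-up data: two participants, each measured hourly for four days,
# but starting on different calendar dates (timestamps in UTC)
toy <- bind_rows(
  tibble(Id = "A", Datetime = seq(ymd_hms("2023-08-29 00:00:00"), by = "hour", length.out = 96)),
  tibble(Id = "B", Datetime = seq(ymd_hms("2023-09-10 08:30:00"), by = "hour", length.out = 96))
)

# Within each group, keep observations from that group's own first
# timestamp up to (but not including) two days later
first_two_days <- toy %>%
  group_by(Id) %>%
  filter(Datetime < min(Datetime) + days(2)) %>%
  ungroup()

counts <- count(first_two_days, Id)
counts  # 48 hourly rows per participant
```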
Usage
filter_Datetime(
  dataset,
  Datetime.colname = Datetime,
  start = NULL,
  end = NULL,
  length = NULL,
  length_from_start = TRUE,
  full.day = FALSE,
  tz = NULL,
  only_Id = NULL,
  filter.expr = NULL
)
filter_Date(..., start = NULL, end = NULL)
Arguments
- dataset
A light logger dataset. Expects a dataframe. If not imported by LightLogR, take care to choose a sensible variable for the Datetime.colname.
- Datetime.colname
Column name that contains the datetime. Defaults to "Datetime", which is automatically correct for data imported with LightLogR. Expects a symbol. Needs to be part of the dataset. Must be of type POSIXct.
- start, end
For filter_Datetime(), a POSIXct or character scalar in the form of "yyyy-mm-dd hh:mm:ss" giving the respective start and end time positions for the filtered dataframe. If you only want to provide dates in the form of "yyyy-mm-dd", use the wrapper function filter_Date().
If one or both of start/end are not provided, the times will be taken from the respective extreme values of the dataset.
If length is provided and one of start/end is not, the other will be calculated based on the given value.
If length is provided and both of start/end are NULL, the length is counted from the first timestamp of the data (or from the last one, see length_from_start).
- length
Either a Period or Duration from lubridate. E.g., days(2) + hours(12) will give a period of 2.5 days, whereas ddays(2) + dhours(12) will give a duration. For the difference between periods and durations, see the lubridate documentation. Basically, periods model clock times, whereas durations model physical processes. This matters on several occasions, like leap years or daylight saving time. You can also provide a character scalar in the form of e.g. "1 day", which will be converted into a period.
- length_from_start
A logical indicating whether the length argument should be applied to the start (default, TRUE) or the end of the data (FALSE). Only relevant if neither the start nor the end arguments are provided.
- full.day
A logical indicating whether the start param should be rounded to a full day when only the length argument is provided (default is FALSE). This is useful, e.g., when the first observation in the dataset is slightly after midnight. If TRUE, the length is counted from midnight to avoid empty days when plotting with gg_day().
- tz
Timezone of the start/end times. If NULL (the default), the timezone is taken from the Datetime.colname column.
- only_Id
An expression of ids to which the filtering should be applied. If NULL (the default), the filtering will be applied to all ids. Based on this expression, the dataset is split in two, and the filtering only takes place where the expression evaluates to TRUE. Afterwards, both sets are recombined and sorted by Datetime.
- filter.expr
Advanced filtering conditions. If not NULL (the default is NULL) and given an expression, this is used to dplyr::filter() the results. This can be useful to filter, e.g., for group-specific conditions, like starting from the second day of measurement (see examples).
- ...
Parameters handed over to lubridate::round_date() and siblings.
See also
Other filter:
filter_Time()
Examples
library(lubridate)
library(dplyr)
#baseline
range.unfiltered <- sample.data.environment$Datetime %>% range()
range.unfiltered
#> [1] "2023-08-29 00:00:04 CEST" "2023-09-03 23:59:54 CEST"
#setting the start of a dataset
sample.data.environment %>%
  filter_Datetime(start = "2023-08-31 12:00:00") %>%
  pull(Datetime) %>%
  range()
#> [1] "2023-08-31 12:00:04 CEST" "2023-09-03 23:59:44 CEST"
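When start is a character string and tz is NULL, the timezone is taken from the datetime column. The following sketch of that parsing step uses lubridate on a hypothetical two-value column; the output above shows CEST, but naming the zone Europe/Berlin is an assumption. It also shows why the zone matters: parsing the same string in UTC would move the boundary by two hours.

```r
library(lubridate)

# Hypothetical datetime column stored in Europe/Berlin
datetimes <- ymd_hms(c("2023-08-31 11:59:54", "2023-08-31 12:00:04"), tz = "Europe/Berlin")

# Parse the character start in the column's own timezone
start <- ymd_hms("2023-08-31 12:00:00", tz = tz(datetimes))

# The same string parsed in UTC refers to a different instant
start_utc <- ymd_hms("2023-08-31 12:00:00", tz = "UTC")

kept <- datetimes[datetimes >= start]
kept                                                    # only 12:00:04 remains
as.numeric(difftime(start, start_utc, units = "hours")) # -2 (CEST is UTC+2)
```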
#setting the end of a dataset
sample.data.environment %>%
  filter_Datetime(end = "2023-08-31 12:00:00") %>%
  pull(Datetime) %>%
  range()
#> [1] "2023-08-29 00:00:04 CEST" "2023-08-31 11:59:54 CEST"
#setting a period of a dataset
sample.data.environment %>%
  filter_Datetime(end = "2023-08-31 12:00:00", length = days(2)) %>%
  pull(Datetime) %>%
  range()
#> [1] "2023-08-29 12:00:04 CEST" "2023-08-31 11:59:54 CEST"
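In the example above, only end and length are given, so the start is derived as 2023-08-31 12:00:00 minus two days, which matches the first kept timestamp of 12:00:04. The documented rules for combining start, end, and length can be sketched as a small helper. resolve_window() is hypothetical and not part of LightLogR; it ignores length when both bounds are given (the documentation does not cover that case), and the Europe/Berlin zone for the data range is an assumption.

```r
library(lubridate)

# Hypothetical helper (NOT part of LightLogR) mirroring the documented rules
resolve_window <- function(data_range, start = NULL, end = NULL,
                           length = NULL, length_from_start = TRUE) {
  if (!is.null(length) && is.null(start) && is.null(end)) {
    # neither bound given: anchor the length at the first (or last) timestamp
    if (length_from_start) start <- data_range[1] else end <- data_range[2]
  }
  if (!is.null(length)) {
    # exactly one bound given: derive the other one from the length
    if (is.null(start) && !is.null(end)) start <- end - length
    if (is.null(end) && !is.null(start)) end <- start + length
  }
  # bounds that are still missing fall back to the extremes of the data
  if (is.null(start)) start <- data_range[1]
  if (is.null(end)) end <- data_range[2]
  list(start = start, end = end)
}

# Extremes of the unfiltered sample data (see the baseline example)
rng <- ymd_hms(c("2023-08-29 00:00:04", "2023-09-03 23:59:54"), tz = "Europe/Berlin")

end_only <- resolve_window(rng, end = ymd_hms("2023-08-31 12:00:00", tz = "Europe/Berlin"),
                           length = days(2))
end_only$start     # 2023-08-29 12:00:00 CEST

length_only <- resolve_window(rng, length = days(2))
length_only$end    # 2023-08-31 00:00:04 CEST

from_end <- resolve_window(rng, length = days(2), length_from_start = FALSE)
from_end$start     # 2023-09-01 23:59:54 CEST
```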
#setting only the period of a dataset
sample.data.environment %>%
  filter_Datetime(length = days(2)) %>%
  pull(Datetime) %>%
  range()
#> [1] "2023-08-29 00:00:04 CEST" "2023-08-30 23:59:54 CEST"
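When only length is given, the window is counted from the first observation. The effect of full.day is easier to see with a first observation well after midnight. This sketch assumes a hypothetical first observation at 09:15 and uses lubridate's floor_date() to round down to midnight; floor_date() is one plausible reading of "rounded to a full day", not a confirmed detail of LightLogR.

```r
library(lubridate)

# Hypothetical first observation, well after midnight (Europe/Berlin assumed)
first_obs <- ymd_hms("2023-08-29 09:15:00", tz = "Europe/Berlin")
len <- days(2)

# full.day = FALSE: the window starts at the first observation itself and
# therefore reaches into a third calendar day
end_plain <- first_obs + len

# full.day = TRUE (sketch): round the start down to midnight first, so the
# window covers exactly two calendar days
start_day <- floor_date(first_obs, unit = "day")
end_full <- start_day + len

end_plain  # 2023-08-31 09:15:00 CEST
start_day  # 2023-08-29 00:00:00 CEST
end_full   # 2023-08-31 00:00:00 CEST
```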
#advanced filtering based on grouping (second day of each group)
sample.data.environment %>%
  #shift the "Environment" group by one day
  mutate(
    Datetime = ifelse(Id == "Environment", Datetime + ddays(1), Datetime) %>%
      as_datetime()) -> sample
sample %>% summarize(Daterange = paste(min(Datetime), max(Datetime), sep = " - "))
#> # A tibble: 2 × 2
#> Id Daterange
#> <fct> <chr>
#> 1 Environment 2023-08-29 22:00:08 - 2023-09-04 21:59:38
#> 2 Participant 2023-08-28 22:00:04 - 2023-09-03 21:59:54
#now we can use the `filter.expr` argument to filter from the second day of each group
sample %>%
  filter_Datetime(filter.expr = Datetime > Datetime[1] + days(1)) %>%
  summarize(Daterange = paste(min(Datetime), max(Datetime), sep = " - "))
#> # A tibble: 2 × 2
#> Id Daterange
#> <fct> <chr>
#> 1 Environment 2023-08-30 22:00:38 - 2023-09-04 21:59:08
#> 2 Participant 2023-08-29 22:00:14 - 2023-09-03 21:59:54
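The shifted timestamps in the two outputs above start at 22:00 because base::ifelse() drops the POSIXct class, and as_datetime() then interprets the resulting numbers in UTC. The Participant's first timestamp, 2023-08-29 00:00:04 CEST, therefore reappears as 2023-08-28 22:00:04, the same instant in UTC. A minimal sketch of this behaviour, assuming the sample data's zone is Europe/Berlin; dplyr::if_else() is shown as one way to keep the original timezone.

```r
library(lubridate)
library(dplyr)

# A timestamp in the (assumed) Europe/Berlin zone of the sample data
x <- ymd_hms("2023-08-29 00:00:04", tz = "Europe/Berlin")

# base::ifelse() drops the POSIXct class and returns seconds since the epoch
shifted <- ifelse(TRUE, x + ddays(1), x)
class(shifted)          # "numeric"

# as_datetime() on a number restores the class, but in UTC
restored <- as_datetime(shifted)
restored                # "2023-08-29 22:00:04 UTC"

# dplyr::if_else() keeps both the class and the original timezone
kept <- if_else(TRUE, x + ddays(1), x)
kept                    # "2023-08-30 00:00:04 CEST"
```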
sample.data.environment %>% filter_Date(end = "2023-08-31")
#> # A tibble: 34,560 × 3
#> # Groups: Id [2]
#> Id Datetime MEDI
#> <fct> <dttm> <dbl>
#> 1 Participant 2023-08-29 00:00:04 0
#> 2 Participant 2023-08-29 00:00:14 0
#> 3 Participant 2023-08-29 00:00:24 0
#> 4 Participant 2023-08-29 00:00:34 0
#> 5 Participant 2023-08-29 00:00:44 0
#> 6 Participant 2023-08-29 00:00:54 0
#> 7 Participant 2023-08-29 00:01:04 0
#> 8 Participant 2023-08-29 00:01:14 0
#> 9 Participant 2023-08-29 00:01:24 0
#> 10 Participant 2023-08-29 00:01:34 0
#> # ℹ 34,550 more rows
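The only_Id argument is described as splitting the data by an expression, filtering only the part where it is TRUE, and recombining both parts sorted by Datetime. A minimal dplyr sketch of those steps, on made-up data with a hard-coded condition (Id == "Participant") and a simple end bound, both assumptions for illustration rather than LightLogR's internal code:

```r
library(dplyr)
library(lubridate)

# Made-up data: two Ids with four observations each, 12 hours apart (UTC)
toy <- tibble(
  Id = rep(c("Environment", "Participant"), each = 4),
  Datetime = rep(seq(ymd_hms("2023-08-29 00:00:00"), by = "12 hours", length.out = 4), 2)
)

end <- ymd_hms("2023-08-29 12:00:00")

# 1. Split by the condition (the role played by only_Id)
selected <- toy %>% filter(Id == "Participant")
untouched <- toy %>% filter(!(Id == "Participant"))

# 2. Filter only the selected part, 3. recombine and sort by Datetime
result <- bind_rows(selected %>% filter(Datetime <= end), untouched) %>%
  arrange(Datetime)

count(result, Id)  # Environment keeps 4 rows, Participant keeps 2
```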
