Filtering a dataset based on Dates or Datetimes is often necessary prior to
calculation or visualization. The functions allow filtering based on simple
strings or Datetime scalars, or by specifying a length. They also support prior
dplyr grouping, which is useful, e.g., when you only want to keep the first two
days of measurement data for every participant, regardless of the actual date.
If you want to filter based on times of the day, look to filter_Time().
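The grouped use case mentioned above (the first two days of data for every participant, regardless of the actual date) can be sketched with plain dplyr and lubridate. This is only an illustration of the idea, not how filter_Datetime() is implemented; the toy data and the names toy and first_two_days are made up for this example.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: two participants whose recordings start on different days
toy <- tibble(
  Id = rep(c("P1", "P2"), each = 5),
  Datetime = c(
    ymd_hms("2023-08-15 08:00:00", tz = "UTC") + days(0:4),
    ymd_hms("2023-08-17 08:00:00", tz = "UTC") + days(0:4)
  )
)

# Within each group, keep observations that fall inside the first two days
# counted from that group's own first observation
first_two_days <- toy %>%
  group_by(Id) %>%
  filter(Datetime < min(Datetime) + days(2)) %>%
  ungroup()

first_two_days
```

Because the cutoff is computed per group, P1 keeps its observations from 15 and 16 August, while P2 keeps those from 17 and 18 August.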
Usage
filter_Datetime(
  dataset,
  Datetime.colname = Datetime,
  start = NULL,
  end = NULL,
  length = NULL,
  length_from_start = TRUE,
  full.day = FALSE,
  tz = NULL,
  only_Id = NULL,
  filter.expr = NULL
)
filter_Date(..., start = NULL, end = NULL)
Arguments
- dataset
A light logger dataset. Expects a dataframe. If not imported by LightLogR, take care to choose a sensible variable for the Datetime.colname.
- Datetime.colname
Column name that contains the datetime. Defaults to "Datetime", which is automatically correct for data imported with LightLogR. Expects a symbol. Needs to be part of the dataset.
- start, end
For filter_Datetime(), a POSIXct or character scalar in the form of "yyyy-mm-dd hh:mm:ss" giving the respective start and end time positions for the filtered dataframe. If you only want to provide dates in the form of "yyyy-mm-dd", use the wrapper function filter_Date().
If one or both of start/end are not provided, the times will be taken from the respective extreme values of the dataset.
If length is provided and one of start/end is not, the other will be calculated based on the given value.
If length is provided and both of start/end are NULL, the length is counted from the start of the dataset (or from its end, see length_from_start).
- length
Either a Period or Duration from lubridate. E.g., days(2) + hours(12) will give a period of 2.5 days, whereas ddays(2) + dhours(12) will give a duration. For the difference between periods and durations, look at the documentation from lubridate. Basically, periods model clock times, whereas durations model physical processes. This matters on several occasions, like leap years or daylight saving time. You can also provide a character scalar in the form of, e.g., "1 day", which will be converted into a period.
- length_from_start
A logical indicating whether the length argument should be applied to the start (default, TRUE) or the end of the data (FALSE). Only relevant if neither the start nor the end arguments are provided.
- full.day
A logical indicating whether the start param should be rounded to a full day when only the length argument is provided (default is FALSE). This is useful, e.g., when the first observation in the dataset is slightly after midnight. If TRUE, it will count the length from midnight on to avoid empty days in plotting with gg_day().
- tz
Timezone of the start/end times. If NULL (the default), it will take the timezone from the Datetime.colname column.
- only_Id
An expression of ids where the filtering should be applied to. If NULL (the default), the filtering will be applied to all ids. Based on this expression, the dataset will be split in two, and the filtering will take place only where the given expression evaluates to TRUE. Afterwards both sets are recombined and sorted by Datetime.
- filter.expr
Advanced filtering conditions. If not NULL (default) and given an expression, this is used to dplyr::filter() the results. This can be useful to filter, e.g., for group-specific conditions, like starting after the first two days of measurement (see examples).
- ...
Parameter handed over to lubridate::round_date() and siblings.
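The difference between periods and durations noted for the length argument shows up most clearly across a daylight saving change. The following sketch uses only lubridate; the date and the time zone "Europe/Berlin" are assumptions chosen for illustration (clocks there moved forward one hour on 2023-03-26).

```r
library(lubridate)

# Noon on the day before the switch to summer time (illustrative choice)
start <- ymd_hms("2023-03-25 12:00:00", tz = "Europe/Berlin")

# A period follows the clock: same wall-clock time on the next day
end_period <- start + days(1)

# A duration follows elapsed time: exactly 24 hours (86400 seconds) later
end_duration <- start + ddays(1)

end_period
end_duration

# The two end points differ by the hour that was skipped overnight
difftime(end_duration, end_period, units = "hours")
```

With a period, the filtered window ends at the same clock time on the following day; with a duration, it ends after a fixed amount of elapsed time, which lands one hour later on the clock in this case.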
See also
Other filter:
filter_Time()
Examples
library(LightLogR)
library(lubridate)
library(dplyr)
# baseline
range.unfiltered <- sample.data.environment$Datetime %>% range()
range.unfiltered
#> [1] "2023-08-15 00:00:01 UTC" "2023-08-20 23:59:51 UTC"

# setting the start of a dataset
sample.data.environment %>%
  filter_Datetime(start = "2023-08-18 12:00:00") %>%
  pull(Datetime) %>%
  range()
#> [1] "2023-08-18 12:00:01 UTC" "2023-08-20 23:59:41 UTC"

# setting the end of a dataset
sample.data.environment %>%
  filter_Datetime(end = "2023-08-18 12:00:00") %>%
  pull(Datetime) %>%
  range()
#> [1] "2023-08-15 00:00:01 UTC" "2023-08-18 11:59:51 UTC"

# setting a period of a dataset
sample.data.environment %>%
  filter_Datetime(end = "2023-08-18 12:00:00", length = days(2)) %>%
  pull(Datetime) %>%
  range()
#> [1] "2023-08-16 12:00:01 UTC" "2023-08-18 11:59:51 UTC"

# setting only the period of a dataset
sample.data.environment %>%
  filter_Datetime(length = days(2)) %>%
  pull(Datetime) %>%
  range()
#> [1] "2023-08-15 00:00:01 UTC" "2023-08-16 23:59:51 UTC"

# advanced filtering based on grouping (second day of each group)
sample.data.environment %>%
  # shift the "Environment" group by one day
  mutate(
    Datetime = ifelse(Id == "Environment", Datetime + ddays(1), Datetime) %>%
      as_datetime()) -> sample
sample %>% summarize(Daterange = paste(min(Datetime), max(Datetime), sep = " - "))
#> # A tibble: 2 × 2
#> Id Daterange
#> <chr> <chr>
#> 1 Environment 2023-08-16 00:00:02 - 2023-08-21 23:59:32
#> 2 Participant 2023-08-15 00:00:01 - 2023-08-20 23:59:51

# now we can use the `filter.expr` argument to filter from the second day of each group
sample %>%
  filter_Datetime(filter.expr = Datetime > Datetime[1] + days(1)) %>%
  summarize(Daterange = paste(min(Datetime), max(Datetime), sep = " - "))
#> # A tibble: 2 × 2
#> Id Daterange
#> <chr> <chr>
#> 1 Environment 2023-08-17 00:00:32 - 2023-08-21 23:59:02
#> 2 Participant 2023-08-16 00:00:11 - 2023-08-20 23:59:51

# filtering by date only with filter_Date(); here the end date is included in full
sample.data.environment %>% filter_Date(end = "2023-08-17")
#> # A tibble: 34,560 × 3
#> # Groups: Id [2]
#> Datetime MEDI Id
#> <dttm> <dbl> <chr>
#> 1 2023-08-15 00:00:01 0 Participant
#> 2 2023-08-15 00:00:11 0 Participant
#> 3 2023-08-15 00:00:21 0 Participant
#> 4 2023-08-15 00:00:31 0 Participant
#> 5 2023-08-15 00:00:41 0 Participant
#> 6 2023-08-15 00:00:51 0 Participant
#> 7 2023-08-15 00:01:01 0 Participant
#> 8 2023-08-15 00:01:11 0 Participant
#> 9 2023-08-15 00:01:21 0 Participant
#> 10 2023-08-15 00:01:31 0 Participant
#> # ℹ 34,550 more rows
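
The rules for start, end, and length described under Arguments can be condensed into a small helper. This is a sketch under the assumption that the rules apply exactly as worded there; resolve_window() is a hypothetical function written for this illustration and is not part of LightLogR. It only computes the two boundaries and makes no claim about whether the boundaries themselves are kept by the filter.

```r
library(lubridate)

# Hypothetical helper for illustration only; not part of LightLogR
resolve_window <- function(datetimes, start = NULL, end = NULL,
                           length = NULL, length_from_start = TRUE) {
  if (!is.null(length)) {
    if (is.null(start) && is.null(end)) {
      # Only a length is given: anchor it at the first or last observation
      if (length_from_start) {
        start <- min(datetimes)
      } else {
        end <- max(datetimes)
      }
    }
    # Derive the missing boundary from the known one
    if (is.null(end)) {
      end <- start + length
    } else if (is.null(start)) {
      start <- end - length
    }
  }
  # Anything still missing falls back to the extremes of the data
  if (is.null(start)) start <- min(datetimes)
  if (is.null(end)) end <- max(datetimes)
  list(start = start, end = end)
}

# First and last timestamp of the sample data shown in the examples
datetimes <- ymd_hms(c("2023-08-15 00:00:01", "2023-08-20 23:59:51"), tz = "UTC")

# end and length given: start is calculated as end minus length
window_end_length <- resolve_window(
  datetimes,
  end = ymd_hms("2023-08-18 12:00:00", tz = "UTC"),
  length = days(2)
)

# only length given: counted from the first observation
window_length_only <- resolve_window(datetimes, length = days(2))

window_end_length
window_length_only
```

The first call yields a window starting at 2023-08-16 12:00:00, which agrees with the range shown in the "setting a period of a dataset" example.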