This article focuses on two important aspects of light logger analysis: structuring data into relevant groups and calculating personal light exposure metrics for them. LightLogR contains a large set of over 60 metrics and sub-metrics across multiple functions, each of which constitutes a family of light exposure metrics. The following packages are needed for the analysis:
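#packages used in this article; the list is inferred from the code below
library(LightLogR)
library(tidyverse)
library(gt)
library(gtsummary)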

Please note that this article uses the base pipe operator |>. You need an R version equal to or greater than 4.1.0 to use it. If you are using an older version, you can replace it with the magrittr pipe operator %>%.
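For illustration, the following two calls are equivalent (using the built-in mtcars dataset):

#both lines return the first rows of mtcars
mtcars |> head()
mtcars %>% head()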

Importing Data

We will use data imported and cleaned already in the article Import & Cleaning.

#this assumes the data is in the cleaned_data folder in the working directory
data <- readRDS("cleaned_data/ll_data.rds")

As can be seen by using gg_overview(), the dataset contains 17 Ids with one week's worth of data each, and one to three participants per week.

data |> gg_overview()

Metric principles

There are a lot of metrics associated with personal light exposure. You can find the function reference to all of them in the appropriate reference section. A few distinctions between metrics are important to understand:

  • Some metrics require or work best with a specific time frame, usually one day, while others are calculated over an arbitrary length of time. For example, the function interdaily_stability() calculates a metric over multiple days, while a function like midpointCE() calculates the midpoint of the cumulative light exposure within the given time series. The latter is less useful across multiple days: for two similar light exposure patterns on consecutive days, the midpoint of the cumulative light exposure across both days will fall around midnight, which is not particularly informative. Much more sensible is the midpoint of the light exposure for each day. To enable this, data has to be grouped within days (or other relevant time frames, like the sleep/wake phase), as sketched below.

  • Some metrics are submetrics within a family and have to be actively chosen through the arguments of the function. An example is duration_above_threshold(), which, despite its name, also provides the metrics duration below threshold and duration within threshold. Depending on its comparison argument, and on whether one or two thresholds are provided, the function will calculate different metrics.

  • Some metric functions calculate multiple submetrics at once, like bright_dark_period(). As stated above, this type of function contains metrics accessible through a function argument, period in this case, which specifies whether the brightest or darkest periods of the day are required. Independent of this, the function will calculate multiple submetrics at once: the onset, midpoint, and offset of the respective period, as well as the mean light level during that period.

We will cover the practical considerations arising from these aspects in the following sections. Further, every function's documentation explicitly states whether different metrics are accessible through parameters, and which metrics are calculated by default.
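As a minimal sketch of the first point (assuming the dataset and column names introduced in the next section), the midpoint of cumulative light exposure can be calculated per day by adding the calendar date as a grouping variable:

#sketch: midpointCE() per participant and day
data |> 
  group_by(Day = date(Datetime), .add = TRUE) |> 
  summarize(midpoint = midpointCE(MEDI, Datetime), .groups = "drop_last")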

Note: Most metrics require complete and regular data for a sensible output. While some metrics can handle missing data, it is generally advisable to clean the data before calculating metrics. LightLogR helps to identify gaps and irregularities and can also aggregate data to larger intervals, which can be acceptable for small gaps. In cases of larger gaps, dates or participants might have to be removed from analysis.
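A minimal sketch of such a check and aggregation with the package's gap-handling helpers (the exact arguments are assumptions; see the respective function documentation):

#check the dataset for gaps in its regular interval structure
data |> gap_finder()

#aggregate to 5-minute intervals, which can absorb small gaps
data |> aggregate_Datetime(unit = "5 mins")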

To log or not to log (transform)

Light exposure data (e.g., Illuminance or melanopic EDI) are not normally distributed. By their nature, the values are often highly skewed and also overdispersed. Additionally, the data tend to show an excess of zero values (so-called zero-inflation). The paper How to deal with darkness: Modelling and visualization of zero-inflated personal light exposure data on a logarithmic scale by Zauner et al. (2025) explores ways to deal with this.

For simplicity, this article will just use the untransformed melanopic EDI values to teach the basics of how metric functions work in LightLogR. However, we generally recommend using log_zero_inflated() whenever means are calculated on light exposure data, as it is a simple way to deal with zero values. See the article Log transformation for more information on this. The function log_zero_inflated() log-transforms the data, while exp_zero_inflated() back-transforms it.
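As a quick sketch of the pair in action, a round trip returns the original values, zeros included:

#log-transform a vector containing zeros, then back-transform it
c(0, 1, 100, 10000) |> log_zero_inflated() |> exp_zero_inflated()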

Metric calculation: basics

All metric functions are by default agnostic to the type of data. They require vectors of numeric data (e.g., light data) and commonly also of datetimes. This means the functions can be used outside of the LightLogR framework, if applied correctly. Let us try this with a simple example of a day's worth of light data for one participant, across two functions.

data_Id201 <- 
  data |> 
    filter(Id == 201 & date(Datetime) == "2023-08-15")

data_Id201 |> 
  gg_day()

Time above threshold (TAT)

The first example metric we will calculate is the time above threshold (or TAT) for a threshold of 250 lx mel EDI. TAT is calculated by the function duration_above_threshold().

duration_above_threshold(
  Light.vector = data_Id201$MEDI,
  Time.vector = data_Id201$Datetime,
  threshold = 250
)
#> [1] "34500s (~9.58 hours)"

Specifying the argument comparison = "below" will calculate the time below the threshold.

duration_above_threshold(
  Light.vector = data_Id201$MEDI,
  Time.vector = data_Id201$Datetime,
  threshold = 250,
  comparison = "below"
)
#> [1] "51900s (~14.42 hours)"

And specifying two thresholds will calculate the time within the thresholds.

duration_above_threshold(
  Light.vector = data_Id201$MEDI,
  Time.vector = data_Id201$Datetime,
  threshold = c(10,250)
)
#> [1] "15320s (~4.26 hours)"

Brightest 10 hours of the day (M10)

The second example metric yields multiple submetrics at once. The function bright_dark_period() calculates the brightest and darkest periods of the day. By default, it calculates the brightest 10-hour period of the day. By setting as.df = TRUE, the function will return a data frame, which we can pipe to gt() for a better output.

bright_dark_period(
  Light.vector = data_Id201$MEDI,
  Time.vector = data_Id201$Datetime,
  as.df = TRUE
) |> 
  gt() |> tab_header("M10")
M10
brightest_10h_mean brightest_10h_midpoint brightest_10h_onset brightest_10h_offset
2506.202 2023-08-15 13:42:01 2023-08-15 08:42:11 2023-08-15 18:42:01

Looping

Calculating the darkest period of the day is tricky, as it likely traverses midnight. In the following code we can see that the darkest 10-hour period of the day begins at midnight and ends at 10 am, which would be a very coincidental result. (Note that commonly, the darkest 5-hour period is calculated. We deviate from this to make this point.)

M10_wrong <- 
bright_dark_period(
  Light.vector = data_Id201$MEDI,
  Time.vector = data_Id201$Datetime,
  as.df = TRUE,
  period = "darkest",
  timespan = "10 hours"
)

M10_wrong |> gt() |> tab_header("M10 without looping")
M10 without looping
darkest_10h_mean darkest_10h_midpoint darkest_10h_onset darkest_10h_offset
305.2523 2023-08-15 04:59:51 2023-08-15 00:00:01 2023-08-15 09:59:51

We also see that this makes little sense if we visualize this portion. The blue color indicates the darkest 10-hour period of the day.

data_Id201 |> 
  mutate(State = ifelse(
    Datetime >= M10_wrong$darkest_10h_onset & 
      Datetime <= M10_wrong$darkest_10h_offset, "M10", NA
  )) |>
  gg_day() |> 
  gg_state(State, aes_fill = State) +
  guides(fill = "none")

To solve this, bright_dark_period() and some other functions have the option to loop the day.

M10 <- 
bright_dark_period(
  Light.vector = data_Id201$MEDI,
  Time.vector = data_Id201$Datetime,
  as.df = TRUE,
  period = "darkest",
  timespan = "10 hours",
  loop = TRUE
)

M10 |> gt()
darkest_10h_mean darkest_10h_midpoint darkest_10h_onset darkest_10h_offset
1.423622 2023-08-15 01:36:51 2023-08-15 20:37:01 2023-08-15 06:36:51

This is more plausible, and can also be visualized easily.

data_Id201 |> 
  mutate(State = ifelse(
    Datetime >= M10$darkest_10h_onset | 
      Datetime <= M10$darkest_10h_offset, "M10", NA
  )) |>
  gg_day() |> 
  gg_state(State, aes_fill = State) +
  guides(fill = "none")

Metric calculation: advanced

More often than not, metrics are calculated for many participants over prolonged periods of time. In this case, the singular calculation shown above is inefficient. The dplyr functions dplyr::summarize() and dplyr::reframe() make this much easier.

Be sure to have the data prepared in a way that metric functions can be applied correctly. This is the responsibility of the user, as many functions will produce an output as long as the input vectors are of the correct type and length. In our case, we already prepared the data correctly in the Import & Cleaning article: the data is grouped by Id and contains no gaps or irregularities.

Summarize

The dplyr::summarize() function is used to calculate metrics for each group of data. In the following example, we will calculate interdaily stability (IS) for all participants in the data set, which quantifies how stable the 24-hour light exposure pattern is across the full 6 days of data compared to its average, ranging between 0 (Gaussian noise) and 1 (perfect stability). For brevity, only the first 6 Ids will be shown.

data |> 
  summarize(
    IS = interdaily_stability(
      Light.vector = MEDI,
      Datetime.vector = Datetime
    )
  ) |> 
  head() |> 
  gt() 
Id IS
201 0.5079676
202 0.1986105
204 0.2663354
205 0.3215539
206 0.2367288
208 0.1937092

Grouping

By default, data imported with LightLogR is grouped by Id, which represents individual participants. When using the dplyr family of functions, grouping is essential, as it specifies the subgroups of data for which the metrics are calculated. In the following example, we will calculate the TAT 250 lx MEDI for all participants in the data set. We only show the first 6 participants, as it becomes readily apparent that time above threshold for 6 days might not be the most informative parametrization of the metric.

data |> 
  summarize(
    TAT_250 = duration_above_threshold(
      Light.vector = MEDI,
      Time.vector = Datetime,
      threshold = 250
    )
  ) |> 
  head() |> 
  gt()
Id TAT_250
201 160180s (~1.85 days)
202 29970s (~8.32 hours)
204 147340s (~1.71 days)
205 98520s (~1.14 days)
206 6320s (~1.76 hours)
208 47140s (~13.09 hours)

Instead, we can calculate the TAT 250 lx MEDI for each participant and day of data. This is more informative, as it allows us to see how the metric changes over time. The output below shows the first two Ids.

#create a new column in the data set with the weekday
data$wDay <- wday(data$Datetime, label = TRUE, week_start = 1)

#group the data and calculate the metrics
TAT_250 <- 
data |> 
  group_by(wDay, .add = TRUE) |> 
  summarize(
    TAT_250 = duration_above_threshold(
      Light.vector = MEDI,
      Time.vector = Datetime,
      threshold = 250
    ), .groups = "drop_last"
  )

TAT_250 |> 
  head(12) |> 
  gt()
wDay TAT_250
201
Tue 34500s (~9.58 hours)
Wed 32780s (~9.11 hours)
Thu 21820s (~6.06 hours)
Fri 31670s (~8.8 hours)
Sat 15010s (~4.17 hours)
Sun 24400s (~6.78 hours)
202
Tue 18760s (~5.21 hours)
Wed 6930s (~1.93 hours)
Thu 200s (~3.33 minutes)
Fri 200s (~3.33 minutes)
Sat 3130s (~52.17 minutes)
Sun 750s (~12.5 minutes)

Photoperiod

Another useful grouping factor is photoperiod, to differentiate the day into day and night. LightLogR contains a family of functions to easily deal with photoperiod. Here is a minimal example.

#specifying coordinates (latitude/longitude)
coordinates <- c(48.521637, 9.057645)

#adding photoperiod information
data <- 
  data |> 
  add_photoperiod(coordinates)

#calculating the metric
mean_Exposure <- 
data |> 
  group_by(photoperiod.state, .add = TRUE) |> 
  summarize(
    mean_MEDI = mean(MEDI), .groups = "drop_last"
  )

#showing the first three participants
mean_Exposure |> 
  head(6) |> 
  gt() |> 
  fmt_number(mean_MEDI)
photoperiod.state mean_MEDI
201
day 962.52
night 0.70
202
day 317.06
night 2.28
204
day 2,194.02
night 6.94

Same as above, we can summarize the data further:

mean_Exposure |> 
  group_by(photoperiod.state) |> 
  summarize_numeric(prefix = "") |> 
  gt() |> 
  fmt_number(mean_MEDI)
photoperiod.state mean_MEDI episodes
day 727.29 17
night 6.61 17

This easily gives us metrics based on daily photoperiod. Metric calculation can utilize photoperiod information in other ways, too. More information on dealing with photoperiods can be found in the article Photoperiod.

Metric statistics

With the dataframe TAT_250, we can easily calculate statistics for each participant. This can be done manually, e.g., with another call to dplyr::summarize(), or semi-automatic, e.g., with packages like gtsummary. In the following example, we will calculate the mean and standard deviation of the TAT 250 lx MEDI for each participant, formatted as HH:MM through a styling function.
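The manual route is a single call to dplyr::summarize(). A minimal sketch, converting the duration values to hours for readability:

#mean and standard deviation of TAT per participant, in hours
TAT_250 |> 
  summarize(
    mean_TAT_h = mean(as.numeric(TAT_250)) / 3600,
    sd_TAT_h = sd(as.numeric(TAT_250)) / 3600
  )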

#styling formula for time
style_time <- function(x, format = "%H:%M"){
  x |> 
    as.numeric() |>  
    hms::as_hms() |> 
    as.POSIXlt() |> 
    format(format)
}

#Table output
TAT_250 |> 
  tbl_summary(by = Id, include = -wDay, 
              statistic = list(TAT_250 ~ "{mean} ({sd})"), 
              digits = list(TAT_250 ~ style_time),
              label = list(TAT_250 = "Time above 250 lx mel EDI")
              )
Time above 250 lx mel EDI, Mean (SD), N = 6 days per Id:

Id 201: 07:24 (02:06)
Id 202: 01:23 (02:00)
Id 204: 06:49 (01:17)
Id 205: 04:33 (02:20)
Id 206: 00:17 (00:24)
Id 208: 02:10 (01:06)
Id 209: 04:07 (02:30)
Id 210: 08:39 (01:16)
Id 212: 04:50 (01:27)
Id 213: 02:33 (00:52)
Id 214: 04:38 (01:51)
Id 215: 02:06 (01:17)
Id 216: 04:07 (00:57)
Id 218: 01:26 (01:17)
Id 219: 02:45 (00:51)
Id 221: 01:00 (00:57)
Id 222: 00:33 (00:43)

mean_daily()

The function mean_daily() is a helper to summarize daily data further. It takes summary results that contain either a date or a weekday column and calculates the mean of the metric for weekdays, weekends, and the mean day (based on (5 x weekdays + 2 x weekends) / 7).

#mean daily calculation
TAT_250_daily <-
mean_daily(
  TAT_250,
  Weekend.type = wDay
  )

TAT_250_daily |> 
  head(6) |> 
  gt()
wDay average_TAT_250
201
Mean daily 27196s (~7.55 hours)
Weekday 30192s (~8.39 hours)
Weekend 19705s (~5.47 hours)
202
Mean daily 5213s (~1.45 hours)
Weekday 6522s (~1.81 hours)
Weekend 1940s (~32.33 minutes)
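We can verify the weighting with the values for Id 201:

#mean day = (5 x weekday mean + 2 x weekend mean) / 7, in seconds
(5 * 30192 + 2 * 19705) / 7
#> [1] 27195.71

This rounds to the 27196s shown in the table above.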

There is a variant of mean_daily() called mean_daily_metric(), a convenience function that combines the calculation of a single-return-value, duration-based metric with the mean daily calculation. We can use it to calculate TAT250 from scratch:

data |> 
  mean_daily_metric(
    metric = "TAT250",
    Variable = MEDI,
    threshold = 250
  ) |> 
  head() |> 
  gt()
Day average_TAT250
201
Mean daily 27196s (~7.55 hours)
Weekday 30192s (~8.39 hours)
Weekend 19705s (~5.47 hours)
202
Mean daily 5213s (~1.45 hours)
Weekday 6522s (~1.81 hours)
Weekend 1940s (~32.33 minutes)

The function has limited options to change the metric via its metric_type argument. In this case, we switch to period_above_threshold(), which yields the longest continuous period above threshold.

data |> 
  mean_daily_metric(
    metric = "PAT250",
    Variable = MEDI,
    metric_type = period_above_threshold,
    threshold = 250
  ) |> 
  head() |> 
  gt()
Day average_PAT250
201
Mean daily 6313s (~1.75 hours)
Weekday 6898s (~1.92 hours)
Weekend 4850s (~1.35 hours)
202
Mean daily 1224s (~20.4 minutes)
Weekday 1440s (~24 minutes)
Weekend 685s (~11.42 minutes)

summarize_numeric()/summarise_numeric()

We can even summarize the data further with summarize_numeric(), which takes a dataset and calculates the average of numeric columns, as well as the number of episodes in the group. This makes no sense within participants, where it would just average the Weekday, Weekend, and Mean daily values. If we regroup the data, however, we can gain useful insights.

TAT_250_daily |> 
  group_by(wDay) |> 
  summarize_numeric() |> 
  gt() |> 
  fmt_duration(mean_average_TAT_250, 
               input_units = "seconds", duration_style = "colon-sep")
wDay mean_average_TAT_250 episodes
Mean daily 03:32:10 17
Weekday 03:45:37 17
Weekend 02:58:35 17

We can see that our participants have slightly more time above 250 lx on weekdays compared to weekends (03:45 vs. 02:58, respectively).

Metric calculation: batch

In the final section, we will add more metrics to the analysis, including ones with multiple sub-metrics. Further, we imagine we want to know how these metrics change from the first half of the experiment (August/September) to the second half (October/November). We will also include a column Time.data in the data set, which will be used to calculate the metrics. This column contains the time of day without the date information from the Datetime column, which avoids date-related issues when calculating the mean of the metrics. Finally, the unnest() call flattens the table from the data frame substructure created by MLIT250 and TAT250.

data <- data |> 
  mutate(
    Month = case_when(month(Datetime) %in% 8:9 ~ "Aug/Sep",
                      month(Datetime) %in% 10:11 ~ "Oct/Nov")
  ) |> 
  create_Timedata()

metrics <- 
  data |> 
  group_by(Month, Id, wDay) |> 
  summarize(
    MLIT250 = 
      timing_above_threshold(MEDI, Time.data, threshold = 250, as.df = TRUE),
    TAT250 = 
      duration_above_threshold(MEDI, Time.data, threshold = 250, as.df = TRUE),
    average_MEDI = 
      MEDI |> log_zero_inflated() |>  mean() |> exp_zero_inflated(), #calculate zero inflated log transformed mean
    light_exposure = 
      sum(MEDI)/360, # 10 second epochs means 360 epochs in one hour. dividing by 360 gives the light exposure in lx·h
    .groups = "drop_last"
    ) |> 
  unnest(-Id)

#first 6 rows
metrics |> 
  head() |> 
  gt()
wDay mean_timing_above_250 first_timing_above_250 last_timing_above_250 duration_above_250 average_MEDI light_exposure
Aug/Sep - 201
Tue 13:55:49 07:48:01 19:43:41 34500s (~9.58 hours) 23.910527 26273.415
Wed 12:53:04 07:03:01 19:46:41 32780s (~9.11 hours) 14.897098 18545.278
Thu 14:25:57 08:41:11 19:27:11 21820s (~6.06 hours) 6.804520 6315.771
Fri 13:12:42 07:14:41 18:51:21 31670s (~8.8 hours) 11.143937 19902.681
Sat 11:29:02 07:08:41 20:40:41 15010s (~4.17 hours) 6.613548 13428.829
Sun 12:46:28 07:23:01 19:12:21 24400s (~6.78 hours) 14.730145 4399.761

The operation above yields a data frame with six metrics across 102 participant days (6 days for 17 participants). The grouping for Month did not add additional groups, as each participant day is already solely in the "Aug/Sep" or "Oct/Nov" group.

Summarize metrics

We can summarize the data in different ways.

Within LightLogR, we can use the mean_daily() and summarize_numeric() functions:

#calculate weekday, weekend, and mean daily summaries for each participant,
#then average across participants
metrics |> 
  mean_daily(wDay, prefix = "", filter.empty = TRUE) |> #filter.empty removes empty rows
  group_by(Month, wDay) |> #regroup so that we can summarize across participants
  summarize_numeric(prefix = "") |> 
  gt() |> 
  fmt_number(c(average_MEDI, light_exposure))
wDay mean_timing_above_250 first_timing_above_250 last_timing_above_250 duration_above_250 average_MEDI light_exposure episodes
Aug/Sep
Mean daily 13:37:13 08:46:17 19:10:18 15681s (~4.36 hours) 8.67 12,678.75 11
Weekday 13:31:57 08:34:01 19:03:32 16798s (~4.67 hours) 9.04 12,158.59 11
Weekend 13:50:24 09:16:57 19:27:13 12890s (~3.58 hours) 7.75 13,979.15 11
Oct/Nov
Mean daily 14:05:01 10:14:29 18:24:06 7250s (~2.01 hours) 5.49 5,558.11 6
Weekday 13:50:26 10:02:25 18:32:38 7558s (~2.1 hours) 5.28 5,660.88 6
Weekend 14:13:56 10:30:34 17:57:48 7273s (~2.02 hours) 6.07 5,741.14 7

The number of episodes shows us that there were 11 values in Aug/Sep and 6 in Oct/Nov, except for the Weekend rows, which show 7. This is because one participant crosses from September into October. Checking this participant's data shows that only the last day is in October, which is a Sunday and thus part of the weekend.

data |> 
  filter(Id == 214) |> 
  pull(Datetime) |> 
  date() |> 
  unique()
#> [1] "2023-09-26" "2023-09-27" "2023-09-28" "2023-09-29" "2023-09-30"
#> [6] "2023-10-01"

We can wrangle the same data differently to get averages across all days:

#calculating overall summaries for each month
metrics |> 
  summarize_numeric(prefix = "") |> #summarize across days, within participants
  summarize_numeric(prefix = "") |> #summarize across participants, within months
  gt() |> 
  fmt_number(c(average_MEDI, light_exposure))
Month mean_timing_above_250 first_timing_above_250 last_timing_above_250 duration_above_250 average_MEDI light_exposure episodes
Aug/Sep 13:38:46 08:48:41 19:09:20 15625s (~4.34 hours) 8.66 12,781.74 11
Oct/Nov 13:44:46 10:06:26 18:14:53 7888s (~2.19 hours) 5.64 5,946.69 7

This shows us 11 participants for the Aug/Sep timeframe and 7 for the Oct/Nov timeframe. This is in line with the summary above, which showed that one participant crossed the timeframes. Of course, it makes sense to filter the data at some in-between step to ensure there is a minimum number of data points in each category, which we have left out here for brevity.
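Such a filter could look like the following sketch, where the threshold of four days per Id and month is an arbitrary choice for illustration:

#keep only Id-month combinations with at least 4 participant days
metrics |> 
  group_by(Month, Id) |> 
  filter(n() >= 4)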

Using gtsummary

This section repeats the summary, this time using the popular gtsummary package.

metrics <- 
  metrics |> 
  group_by(Month) |> 
  select(-Id, -wDay)

#Table output
metrics |> 
  tbl_summary(by = Month,
              statistic = list(all_continuous() ~ "{mean} (±{sd})"),
              digits = list(
                c(
                  mean_timing_above_250, first_timing_above_250, 
                  last_timing_above_250, duration_above_250
                  ) ~ style_time),
              label = list(
                mean_timing_above_250 = 
                  "mean timing above 250 lx mel EDI (HH:MM)",
                first_timing_above_250 = 
                  "first time above 250 lx mel EDI (HH:MM)",
                last_timing_above_250 = 
                  "last time above 250 lx mel EDI (HH:MM)",
                duration_above_250 = "duration above 250 lx mel EDI (HH:MM)",
                average_MEDI = "average mel EDI (lx)",
                light_exposure = "light exposure (lx·h)"
                )
              )
Characteristic Aug/Sep (N = 65) Oct/Nov (N = 37)
mean timing above 250 lx mel EDI (HH:MM) 13:39 (±01:43) 14:03 (±01:41)
first time above 250 lx mel EDI (HH:MM) 08:48 (±02:26) 10:14 (±02:20)
last time above 250 lx mel EDI (HH:MM) 19:09 (±02:20) 18:21 (±02:26)
duration above 250 lx mel EDI (HH:MM) 04:19 (±02:57) 02:02 (±01:32)
average mel EDI (lx) 8.7 (±6.6) 5.5 (±5.0)
light exposure (lx·h) 12,846 (±12,122) 5,618 (±6,612)
Values are Mean (±SD).

And that is all you need to work with metrics in LightLogR. Be sure to look at the documentation for each function to understand its parameters and outputs, and at the reference section to get an overview of all available metrics.