---
title: "Research Question 1"
subtitle: "Do sites differ in objectively measured personal light exposure? If so, does an individual’s behaviour and micro-environment have a stronger effect than site?"
author: "Johannes Zauner"
format:
html:
toc: true
code-tools: true
code-link: true
page-layout: full
code-fold: show
---
## Preface
This document contains the analysis for Research Question 1, as defined in the [preregistration]( https://aspredicted.org/te3zw2.pdf).
> **Research Question 1**: \
Do sites differ in objectively measured personal light exposure? If so, does an individual’s behaviour and micro-environment have a stronger effect than site?
>
> *H1*: There is a significant difference in personal light-exposure metrics across sites, even after accounting for latitude and photoperiod.
>
> *H2*: The variance of personal light exposure patterns (hourly geometric mean of melanopic EDI) by participants within sites is greater than the variance between sites.
## Setup
In this section, we load in the [preprocessed](data_preparation.qmd) and [prepared](metric_preparation.qmd) data, site data, and load required packages.
```{r}
#| message: false
#| warning: false
#| code-summary: Code cell - Setup
library(tidyverse)
library(melidosData)
library(LightLogR)
library(mgcv)
library(lme4)
library(performance)
library(cowplot)
library(ggridges)
library(broom)
library(itsadug)
library(gratia)
library(patchwork)
library(rlang)
library(broom.mixed)
library(gt)
library(performance)
library(glmmTMB)
library(gghighlight)
library(gtsummary)
source("scripts/fitting.R")
source("scripts/summaries.R")
source("scripts/H1_specific.R")
```
```{r}
#| message: false
#| code-summary: Code cell - Load data
load("data/metrics_glasses.RData")
metrics <- metrics_glasses
H1 <- "Hypothesis: There is a significant difference in personal light-exposure metrics across sites, even after accounting for latitude and photoperiod."
H2 <- "Hypothesis: The variance of personal light exposure patterns (hourly geometric mean of melanopic EDI) by participants within sites is greater than the variance between sites."
sig.level <- 0.05
#for chest:
# load("data/metrics_chest.RData")
# metrics <- metrics_chest
# melidos_cities <- melidos_cities[names(melidos_cities)!="MPI"]
```
## Hypothesis 1
### Details
Hypothesis 1 states:
> There is a significant difference in personal light-exposure metrics across sites, even after accounting for latitude and photoperiod.
*Type of model:* LMM
*Base formula:* $\text{Metric} \sim \text{Site} + \text{Photoperiod} + (1|\text{Site}:\text{Participant})$
*Additional notes:*
- For dynamics-based metrics LM without Participant random effect.
- Because latitude and site are each individual values per site (i.e., each site has its unique latitude), they have to be modelled separately and will be compared via AIC.
*Specific formulas:*
- $\text{Metric} \sim \text{Site} + \text{Photoperiod} $ (for dynamics-based metrics)
- $\text{Metric} \sim \text{Latitude} + \text{Photoperiod} + (1|\text{Site}:\text{Participant})$ (for the latitude version of the base formula)
- $\text{Metric} \sim \text{Photoperiod} + (1|\text{Site}) + (1|\text{Site}:\text{Participant})$ (to check whether site is better represented as a random effect)
*False-discovery-rate correction:*
17 (number of metrics that are compared)
### Preparation
```{r}
#| code-summary: Code cell - Model setup based on distribution
model_tbl <- tibble::tribble(
~name, ~engine, ~ response, ~family,
"interdaily_stability", "lm", "qlogis(metric)", gaussian(),
"intradaily_variability", "lm", "metric", gaussian(),
"Mean", "lmer", "log_zero_inflated(metric)", gaussian(),
"dose", "lmer", "log_zero_inflated(metric)", gaussian(),
"MDER", "lmer", "metric", gaussian(),
"darkest_10h_mean", "lmer", "log_zero_inflated(metric)", gaussian(),
"brightest_10h_mean", "lmer", "log_zero_inflated(metric)", gaussian(),
"duration_above_1000", "glmmTMB", "metric", tweedie(link = "log"),
"duration_above_250", "lmer", NA, gaussian(),
"duration_below_10", "lmer", NA, gaussian(),
"duration_below_1", "lmer", NA, gaussian(),
"duration_above_250_wake", "glmmTMB", "metric", tweedie(link = "log"),
"duration_below_10_pre-sleep", "glmmTMB", "metric", tweedie(link = "log"),
"duration_below_1_sleep", "glmmTMB", "metric", tweedie(link = "log"),
"period_above_250", "lmer", "log_zero_inflated(metric)", gaussian(),
"brightest_10h_midpoint", "lmer", "metric", gaussian(),
"brightest_10h_onset", "lmer", NA, gaussian(),
"brightest_10h_offset", "lmer", NA, gaussian(),
"darkest_10h_midpoint", "lmer", "metric", gaussian(),
"darkest_10h_onset", "lmer", NA, gaussian(),
"darkest_10h_offset", "lmer", NA, gaussian(),
"mean_timing_above_250", "lmer", "metric", gaussian(),
"first_timing_above_250", "lmer", "metric", gaussian(),
"last_timing_above_250", "lmer", "metric", gaussian()
)
```
```{r}
#| code-summary: Code cell - Formula setup for inference
metrics <-
metrics |>
mutate(H1_1 = case_when(type == "participant" ~
"~ site + photoperiod",
type == "participant-day" ~
"~ site + photoperiod + (1| site:Id)"),
H1_0 = case_when(type == "participant" ~
"~ photoperiod",
type == "participant-day" ~
"~ photoperiod + (1| site:Id)"),
H1_lat = case_when(type == "participant" ~
"~ lat + photoperiod",
type == "participant-day" ~
"~ lat + photoperiod + (1|site:Id)"),
H1_phot = case_when(type == "participant" ~
"~ site",
type == "participant-day" ~
"~ site + (1|site:Id)"),
H1_rand = case_when(type == "participant" ~
NA,
type == "participant-day" ~
"~ photoperiod + (1|site/Id)")
) |>
left_join(model_tbl, by = "name")
H1_formulas <- names(metrics) |> subset(grepl("H1_", names(metrics)))
metrics <-
metrics |> relocate(all_of(H1_formulas), .after = family)
H1_fdr_n <- metrics |> filter(!is.na(response)) |> nrow()
```
```{r}
#| echo: false
metrics |>
select(-data) |>
filter(!is.na(response)) |>
arrange(metric_type, name) |>
mutate(family = map(family, 1),
rowname = seq_along(1:length(name))) |>
group_by(metric_type) |>
gt() |>
cols_label_with(fn = \(x) x |> str_to_title() |> str_replace_all("_", " ")) |>
sub_missing(missing_text = "") |>
sub_values(values = "NULL", replacement = "") |>
fmt(name, fns = \(x) x |> str_to_title() |> str_replace_all("_", " ")) |>
sub_values(values = "MDER", replacement = "MDER") |>
tab_header("Hypothesis 1: metric model specifications",
subtitle = H1) |>
tab_footnote("Confirmatory analysis will use a p-value correction through the false-discovery rate for n=17 comparisons") |>
tab_footnote("These are the models used for the confirmatory analysis. Each H1 1 and H1 0 pair are compaired with likelihood-ratio tests to determine whether site is meaningful", locations = cells_column_labels(c(H1_1, H1_0))) |>
tab_footnote("These models will be checked via AIC against H1 1 to see whether there is indeed a site difference beyond the specific difference in the model. Also, this model is compared against the H1 0 model", locations = cells_column_labels(c("H1_lat", "H1_rand", "H1_phot")))
```
```{r}
#| code-summary: Code cell - Formula creation
metrics <-
metrics |>
mutate(across(all_of(H1_formulas),
\(x) map2(x, response, \(y,z) make_formula(z, y))
)
)
```
```{r}
#| code-summary: Code cell - Offset zero-values in mixed gamma models
metrics <-
metrics |>
mutate(data = map2(data, engine, \(data, engine) {
if(engine == "glmer") {
data |> mutate(metric = metric + 0.1)
} else data
}))
```
### Analysis
#### Fit models
```{r}
#| code-summary: Code cell - Model fit
#| message: false
metrics <-
metrics |>
mutate(
across(all_of(H1_formulas),
\(x) pmap(list(data, x, engine, family), fit_model),
.names = "{.col}_model")
)
```
#### Main model diagnostics
```{r}
#| code-summary: Code cell - Create diagnostic plots for main model
#| message: false
metrics <-
metrics |>
mutate(
across(all_of(paste0(H1_formulas[1], "_model")),
\(x) map(x, diagnostics),
.names = "{str_remove(.col, '_model')}_diagnostics")
)
```
Model diagnostics overall look good to OK. Anomalies are found but deemed non-critical. Alternative error families, like the `Tweedie` distribution have been used as well. For `Tweedie` models, the homogeneity of variance does not show in the plots (duration metrics), but was checked interactively.
::: {.panel-tabset}
```{r, results='asis', fig.width = 8, fig.height = 12}
walk2(metrics$name, metrics$H1_1_diagnostics, function(name, plot) {
# skip NULL plots
if (is.null(plot)) return()
# format tab title
tab_title <- name |>
str_to_title() |>
str_replace_all("_", " ")
# emit markdown for tab
cat("\n\n##### ", tab_title, "\n\n", sep = "")
# print the plot
print(plot)
})
```
:::
### Model comparisons
```{r}
#| code-summary: Code cell - Model comparisons
#| message: false
metrics <-
metrics |>
mutate(
H1_1_comp = map2(H1_0_model, H1_1_model, model_anova),
H1_1_p = map2(H1_1_comp, engine, model_comp_p, n = H1_fdr_n) |> list_c(),
H1_1_sig = H1_1_p <= sig.level,
H1_lat_comp = map2(H1_0_model, H1_lat_model, model_anova),
H1_lat_p = map2(H1_lat_comp, engine, model_comp_p, n = H1_fdr_n) |> list_c(),
H1_lat_sig = H1_lat_p <= sig.level,
H1_phot_comp = map2(H1_phot_model, H1_1_model, model_anova),
H1_phot_p = map2(H1_phot_comp, engine, model_comp_p, n = H1_fdr_n) |> list_c(),
H1_phot_sig = H1_phot_p <= sig.level,
H1_rand_comp = map2(H1_0_model, H1_rand_model, model_anova),
H1_rand_p = map2(H1_rand_comp, engine, model_comp_p, n = H1_fdr_n) |> list_c(),
H1_rand_sig = H1_rand_p <= sig.level,
AICs = pmap(list(H1_0_model, H1_1_model, H1_phot_model, H1_lat_model,
H1_rand_model), model_AIC),
summary = pmap(list(H1_1_sig, H1_0_model, H1_1_model), model_summary),
summary_lat = pmap(list(H1_lat_sig, H1_0_model, H1_lat_model), model_summary),
summary_rand = pmap(list(H1_rand_sig, H1_0_model, H1_rand_model),
model_summary),
table = pmap(list(name, H1_1_p, summary, metric_type,
response, family), model_table),
table_lat = pmap(list(name, H1_lat_p, summary_lat, metric_type,
response, family), model_table),
table_lat = map(table_lat, \(x) if(is.null(x)) return() else x |>
mutate(across(any_of("lat"), \(x) x*10))),
table = map2(table, H1_phot_sig, \(x,y) if(is.null(x)) return() else x |>
mutate(dAIC_photoperiod = ifelse(y, 0, -3))),
table_rand = pmap(list(name, H1_rand_p, summary_rand, metric_type,
response, family),model_table),
table = pmap(list(table, table_lat, AICs, "lat", "H1_1", "H1_lat"),
H1_combining_tables),
table = pmap(list(table, table_rand, AICs, "SD Site", "H1_1", "H1_rand"),
H1_combining_tables),
)
```
### R^2
```{r}
#| warning: false
#| code-summary: Code cell - calculate R^2
R2_table <-
metrics |>
filter_out(is.na(response)) |>
mutate(
name = name,
metric_type = metric_type,
H1_1_r2 = pmap(list(H1_1_model, data, family, response), r2_helper),
H1_0_r2 = pmap(list(H1_0_model, data, family, response), r2_helper),
H1_phot_r2 = pmap(list(H1_phot_model, data, family, response), r2_helper),
H1_lat_r2 = pmap(list(H1_lat_model, data, family, response), r2_helper),
H1_phot_sig = H1_phot_sig,
H1_lat_sig = H1_lat_sig,
H1_1_sig = H1_1_sig,
.keep = "none"
)
```
```{r}
#| code-summary: Code cell - choose relevant R^2
R2_table <-
R2_table |>
rowwise() |>
mutate(
name = name,
metric_type = metric_type,
marginal_r2 = ifelse(H1_1_sig, H1_1_r2$R2_marginal, H1_0_r2$R2_marginal),
conditional_r2 = ifelse(H1_1_sig, H1_0_r2$R2_conditional, H1_0_r2$R2_conditional),
site_r2 = ifelse(H1_1_sig, H1_1_r2$R2_marginal - H1_0_r2$R2_marginal, NA),
phot_r2 = ifelse(H1_phot_sig, H1_1_r2$R2_marginal - H1_phot_r2$R2_marginal, NA),
rand_r2 = ifelse(H1_1_sig, H1_1_r2$R2_conditional - H1_1_r2$R2_marginal, #
H1_0_r2$R2_conditional - H1_0_r2$R2_marginal),
rand_r2 = ifelse(rand_r2 == 0, NA, rand_r2),
lat_r2 = ifelse(H1_lat_sig, H1_lat_r2$R2_marginal - H1_0_r2$R2_marginal, NA),
.keep = "none"
) |>
mutate(across(c(marginal_r2, conditional_r2),
\(x) ifelse(metric_type == "dynamics", NA, x)
),
phot_r2 = ifelse(is.na(site_r2) & !is.na(conditional_r2), marginal_r2, phot_r2)
)
```
### Results
```{r}
#| message: false
#| warning: false
#| code-summary: Code cell - Table H1
#| label: tbl-H1
#| tab-name: Hypothesis 1 results
H1_tab <- H1_table(metrics, H1_fdr_n) |> cols_width(Plot ~ px(200))
H1_tab
gtsave(H1_tab, "tables/H1.png", vwidth = 1700)
save(metrics, file = "data/H1_results.RData")
```
### Interpretation
@tbl-H1 shows a comprehensive overview of the main inference results. When site is significant, site coefficients are with reference to the `BAUA` site.
As way of a high-level summary:
- Timing-based metrics show a significant site-specific effect that is neither dependent on photoperiod, nor latitude
- While level-based metrics show a significant site-specific effect, it can be drawn back to photoperiod and latitude for daytime metrics (but not for nighttime)
- Duration-based metrics are photoperiod-dependent, but not site specific, same as exposure history and spectrum - these metrics are rather more dependent on inter- and intraindividual differences of participants
- In general, interindividual differences highly relevant and trump site specific factors, although usually not as large as intraindividual differences
- None of the above means that individual sites can not be significantly different from the group or another site. But rather that the variance across all nine sites is or is not sufficiently large.
##### Timing and level-based metrics prove site specific
- Both timing and level-based metrics show site specificity, even when controlled for photoperiod
- The `First timing above 250 lx melanopic EDI` (timing) is the only exception, with neither site nor photoperiod-specific influence
- Neither duration, dynamics, exposure history, nor spectrum-based metrics show site specificity
###### Level-based
- For the overall daily `Mean`, slight increases over `BAUA` (100%) are `THUAS` (+5%), `TUM` (+6%), `FUSPCEU`(+13%), `RISE` (+14%), and `UCR`(+25%). Quite higher are `IZTECH` (+47%) and `MPI`(+62%). Quite darker is `KNUST` (-46%)
- For the `Brightest 10h mean`, slight increases over `BAUA` (100%) are `UCR`(+7%), `THUAS` (+14%) and `IZTECH`(+15%). Quite brighter are `FUSPCEU`(+50%), `MPI`(+54%), and `RISE`(+55%). Darker are `TUM` (-25%) and `KNUST`(-46%)
- For the `Darkest 10h mean`, slight increases over `BAUA` (100%) are `THUAS`(+3%). Quite brighter are `MPI`(+61%), `TUM`(+64%), `UCR`(+71%), and `IZTECH`(+78%). Darker are `FUSPCEU`(-10%), `RISE`(-12%), and `KNUST`()
###### Timing-based
- Values are rounded to full minutes
- For the `Brightest 10h midpoint`, we see shifts to later timing relative to `BAUA` from `IZTECH`(+20min), `FUSPCEU`(+26min), and `TUM`(+31min). Sites with earlier timing are `THUAS`(-2min), `MPI`(-2min), `RISE`(-45min), `KNUST`(-57min), and `UCR`(-1h 19min)
- For the `Darkest 10h midpoint`, we see shifts to later timing relative to `BAUA` from `MPI`(+4min), `IZTECH`(+25min), and `TUM`(+51min). Sites with earlier timing are `FUSPCEU`(-2min), `THUAS`(-19min), `RISE`(-33min), `UCR`(-45min), and `KNUST`(-1h 7min)
- For the `Last timing above 250lx melanopic EDI`, we see shifts to later timing relative to `BAUA` from `RISE`(+2min), `THUAS`(+7min), `TUM`(+26min), `IZTECH`(+44min), and `FUSPCEU`(+1h 5min). Sites with earlier timing are `MPI`(-17min), `KNUST`(-1h 56min), and `UCR`(-2h 12min)
- For the `Mean timing above 250lx melanopic EDI`, we see shifts to later timing relative to `BAUA` from `MPI`(+0min), `THUAS`(+17min), `TUM`(+18min), `IZTECH`(+41min), and `FUSPCEU`(+41min). Sites with earlier timing are `RISE`(-28min), `KNUST`(-1h 19min), and `UCR`(-2h 4min).
##### Site effects are not random
- With the exception of the `Mean` and the `Darkest 10h mean` (both based on logarithmic scaling), site coefficients are not randomly (normally) distributed, but rather show specific differences, even when controlled for photoperiod
- for `Mean`, a random site factor translates to ± 32% for 1 standard deviation, and for the `Darkest 10h mean` ± 35% for 1 standard deviation
##### Latitude explains daytime levels better than site specifics coefficients
- For the `Mean` and the `Brightest 10h mean`(both based on logarithmic scaling), `latitude` is a better predictor (based on ∆AIC) compared to the `site` (both controlled for photoperiod).
- Per 10 degrees of latitude, the `Mean` increases by 15%, and the `Brightest 10h mean` by 24%
- These coefficients are multiplicative, meaning that, e.g., a 30° latitude increase for `Mean` would not result in and $3*15\%=45\%$ increase, but rather an increase of $1.15^3=1.52=+52%$
##### Photoperiod is highly relevant for duration, exposure history, level, and spectrum based metrics, but not for dynamics and timing based metrics
- Photoperiod is significant for `Dose` (exposure history), `MDER` (spectrum), all level-based metrics, all duration-based metrics sans `Duration below 1 lx melanopic EDI during sleep`, and the `Last timing above 250 lx melanopic EDI` and `Mean timing above 250 lx melanopic EDI` (timing).
- Photoperiod is not relevant for the `Duration below 1 lx melanopic EDI during sleep` (duration), dynamics-based metrics, and 3 out of 5 timing-based metrics (`Brightest 10h midpoint`, `Darkest 10h midpoint`, `First timing above 250 lx melanopic EDI`)
- For duration-based metrics, every hour of photoperiod increases `Duration above 1000 lx melanopic EDI` (+17%), `Duration above 250 lx melanopic EDI` (+10%), and `Period above 250 lx melanopic EDI` (+10%). Every hour of photoperiod also decreases the `Duration below 10 lx melanopic EDI during pre-sleep` (-3%).
- `Dose` is increased by +19% per hour of photoperiod
- `MDER` is increased by 0.02 per hour of photoperiod
- For level-based metrics, every hour of photoperiod increases the `Mean` (+18%), the `Brightest 10h mean` (+25%), the `Darkest 10h mean` (+9%)
- For timing-based metrics, every hour of photoperiod shifts the `Last timing above 250 lx melanopic EDI` (+20 minutes) and the `Mean timing above 250 lx melanopic EDI` (+~8 minutes)
##### Interindividual and intraindividual differences outweigh site-specific differences
- For both level and timing-based metrics *interindividual* differences (`SD Participant`) and *intraindividual* differences (`SD Residual`) outweigh most site-specific differences (i.e., coefficients)
- However, the highest site-specific differences (hightest site coefficient minus lowest site coefficient) are in the range of intraindividual differences (1 standard deviation for `SD Residual`)
- In most cases, *intraindividual* differences (`SD Residual`) outweigh *interindividual* differences (`SD Participant`). A notable exception is the `Darkest 10h mean`, where both are similar (`SD Participant` ±63% vs. `SD Residual` ±60%)
### R2
```{r}
#| code-summary: Code cell - R^2 table
R2_tbl <-
R2_table |>
mutate(metric_type = str_to_title(metric_type)) |>
arrange(metric_type, name) |>
group_by(metric_type) |>
gt(rowname_col = "name") |>
cols_add(Residual = 1 - conditional_r2) |>
fmt_percent(decimals = 0) |>
sub_missing() |>
tab_header(md("**H1: Variance explained by the model parameters ($R^{2}$)**")) |>
fmt(name,
rows = name != "MDER",
fns = \(x) x |> str_to_sentence() |> str_replace_all("_", " ")) |>
cols_move(rand_r2, after = lat_r2) |>
cols_move(marginal_r2, after = conditional_r2) |>
cols_label(
conditional_r2 = "Model",
rand_r2 = "Participants",
marginal_r2 = "Fixed effects",
site_r2 = "Site",
phot_r2 = "Photoperiod",
lat_r2 = "Latitude"
) |>
tab_spanner("Unique contribution", columns = c(site_r2, phot_r2, lat_r2)) |>
cols_width(
name ~ px(230),
phot_r2 ~ px(100),
c(rand_r2, Residual) ~ px(90),
everything() ~ px(70)) |>
grand_summary_rows(
columns = where(is.numeric),
fns = list(
`Grand average` ~ mean(., na.rm = TRUE)
),
side = "top",
fmt = ~ fmt_percent(., decimals = 0)
) |>
tab_style(
cell_text(align = "center"),
list(cells_body(),
cells_grand_summary(),
cells_column_labels())
) |>
tab_style(
cell_text(weight = "bold"),
list(cells_column_labels(),
cells_row_groups(),
cells_grand_summary(),
cells_stub_grand_summary())
) |>
tab_footnote(
md("*Model*: conditional $R^2$, *Participants*: partial $R^2$ due to interindividual differences on a participant level, Fixed effects: marginal $R^2$ (all fixed effects combined, based on a model of $site + photoperiod$), *Site/Photoperiod/Latitude*: unique contribution of the given parameter relative to a model without it. *Residual*: proportion of residual variance (e.g., intraindividual differences)")
)
```
```{r}
R2_tbl
gtsave(R2_tbl, "tables/H1_R2_tbl.png")
```
- Overall model fit varies widely (≈25–62%), with highest explanatory power for level-based metrics (e.g., darkest/mean light levels) and lowest for timing metrics.
- Fixed effects explain a modest share (≈2–24%), indicating environmental factors (site, photoperiod, latitude) contribute but are not dominant drivers.
- Individual (participant-level) differences account for a large portion of variance (≈20–42%), often exceeding fixed-effect contributions.
- Site effects are generally stronger than photoperiod, particularly for level metrics, but both show overlapping (shared) explanatory power.
- Latitude contributes only marginally where included, suggesting limited additional explanatory value beyond photoperiod.
- Residual variance remains high across most metrics (≈38–75%), indicating substantial unexplained within-individual or day-to-day variability.
### Model summaries
The following tabs show the model summaries for either
- $\text{Metric} \sim \text{Site} + \text{Photoperiod} + (1|\text{Site}:\text{Participant})$
or
- $\text{Metric} \sim \text{Photoperiod} + (1|\text{Site}:\text{Participant})$
depending on whether site is significant
::: {.panel-tabset}
```{r, results='asis', fig.width = 8, fig.height = 12}
#| warning: false
#| message: false
pwalk(list(metrics$name, metrics$H1_1_model,
metrics$H1_0_model, metrics$H1_1_sig, metrics$H1_1_comp
),
function(name, model1, model0, sig, comp) {
# skip NULL plots
if (is.null(model1)) return()
if(sig) summary <- model1 else summary <- model0
# format tab title
tab_title <- name |>
str_to_sentence() |>
str_replace_all("_", " ")
# emit markdown for tab
cat("\n\n##### ", tab_title, "\n\n", sep = "")
cat("\n\n", "Model summary", "\n\n", sep = "")
tbl <- summary |>
tbl_regression(add_estimate_to_reference_rows = TRUE) |> as_gt()
cat(gt::as_raw_html(tbl))
cat("\n\n", "Model comparison", "\n\n", sep = "")
tbl <- comp |>
tbl_regression(tidy_fun = broom.helpers::tidy_parameters) |> as_gt()
cat(gt::as_raw_html(tbl))
cat("\n\n")
})
```
:::
## Hypothesis 2
### Details
Hypothesis 2 states:
> The variance of personal light exposure patterns (30 minute mean of melanopic EDI) by participants within sites is greater than the variance between sites.
*Type of model:* HGAM (Hierarchical GAM)
*Base formula:* $\text{melanopic EDI} \sim s(\text{Time}) + s(\text{Time}, \text{Site}) + s(\text{Time}, \text{Participant})$
*Additional notes:*
- Site effects are modelled with the `sz` basis, that allows for the most identifiable modelling of parametric differences
- Participant effects are modelled with the `fs` basis, which models the participant-effects as random smooths
- melanopic EDI is scaled with `log_zero_inflated()` before model fitting.
*Specific formulas:*
- $\text{melanopic EDI} \sim s(\text{Time}) + s(\text{Time}, \text{Site}) + s(\text{Time}, \text{Participant}) + s(\text{Time}, \text{Photoperiod})$ (to test photoperiod effects)
- $\text{melanopic EDI} \sim s(\text{Time}) + s(\text{Time}, \text{Participant})$ (comparison to the base formula)
*False-discovery-rate correction:*
1 (there is only one main comparison that tests the effect of site)
### Preparation
```{r}
#| filename: Load prepared data
load("data/metrics_separate_glasses.RData")
load("data/preprocessed_glasses_2.RData")
metric_participanthour <-
metric_glasses_participanthour |>
add_Time_col() |>
ungroup() |>
mutate(Time = as.numeric(Time)/3600 + 0.25,
across(c(site, Id, sleep, State.Brown, wear, photoperiod.state),
fct),
Date = factor(Date),
Id_date = interaction(Id, Date),
lzMEDI = log_zero_inflated(MEDI),
photoperiod = (dusk - dawn) |> as.numeric()
) |>
group_by(Id) |>
mutate(AR.start = ifelse(row_number() == 1, TRUE, FALSE))
```
```{r}
#| fig-height: 12
#| fig-width: 12
#| filename: Overview plot of the data (median with IQR)
metric_participanthour |>
group_by(site) |>
aggregate_Date(
numeric.handler = \(x) median(x, na.rm = TRUE),
p75 = quantile(MEDI, p=0.75, na.rm = TRUE),
p25 = quantile(MEDI, p=0.25, na.rm = TRUE)
) |>
site_conv_mutate() |>
gg_doubleplot(geom = "blank", jco_color = FALSE, facetting = FALSE,
x.axis.breaks.repeat = ~Datetime_breaks(.x, by = "12 hours", shift =
lubridate::duration(0, "hours"))) +
scale_fill_manual(values = melidos_colors) +
scale_color_manual(values = melidos_colors) +
geom_ribbon(aes(ymin = p25, ymax = p75, color = site, fill = site)) +
geom_line() +
geom_hline(yintercept = 250, linetype = "dashed") +
facet_wrap(~site, scales = "free_x") +
guides(fill = "none", col = "none") +
theme(panel.spacing = unit(2, "lines"))
```
```{r}
#| filename: H2 formulas
H2_form_1 <- lzMEDI ~ s(Time, k = 12) + s(Time, site, bs = "sz", k = 12) + s(Time, Id, bs = "fs") + s(Id_date, bs = "re")
H2_form_0 <- lzMEDI ~ s(Time, k = 12) + s(Time, Id, bs = "fs") + s(Id_date, bs = "re")
H2_form_phot <- lzMEDI ~ s(Time, k = 12) + s(Time, site, bs = "sz", k = 12) + s(Time, photoperiod.state, bs = "sz", k=12) + s(Time, Id, bs = "fs") + s(Id_date, bs = "re")
```
### Analysis
#### Base model (H2_1)
```{r}
#| filename: Main model
H2_model_1 <-
bam(H2_form_1,
data = metric_participanthour,
discrete = TRUE,
method = "fREML",
nthreads = 10
)
r1 <- start_value_rho(H2_model_1,
plot=TRUE)
H2_model_1 <-
bam(H2_form_1,
data = metric_participanthour,
discrete = TRUE,
method = "fREML",
nthreads = 10,
rho = r1,
AR.start = AR.start
)
acf_resid(H2_model_1, split_pred = "AR.start")
H2_summary_1 <- summary(H2_model_1)
H2_summary_1
H2_model_1 |> glance()
H2_model_1 |> appraise()
```
```{r}
#| filename: Variance explained
Xp <- predict(H2_model_1, type = "terms")
variance <- apply(Xp, 2, var)
cor(Xp)
H2_R2 <- (variance/sum(variance)) |> enframe(value = "R2", name = "parameter")
H2_R2
```
#### Null model (H2_0)
```{r}
#| filename: Null model
H2_model_0 <-
bam(H2_form_0,
data = metric_participanthour,
discrete = TRUE,
method = "fREML",
nthreads = 8
)
r1 <- start_value_rho(H2_model_0,
plot=TRUE)
H2_model_0 <-
bam(H2_form_0,
data = metric_participanthour,
discrete = TRUE,
method = "fREML",
nthreads = 10,
rho = r1,
AR.start = AR.start
)
acf_resid(H2_model_0, split_pred = "AR.start")
H2_summary_0 <- summary(H2_model_0)
H2_summary_0
H2_model_0 |> glance()
H2_model_0 |> appraise()
H2_model_0 |> draw()
```
#### Photoperiod model
```{r}
#| filename: Photoperiod model
H2_model_phot <-
bam(H2_form_phot,
data = metric_participanthour,
discrete = TRUE,
method = "fREML",
nthreads = 10
)
r1 <- start_value_rho(H2_model_phot,
plot=TRUE)
H2_model_phot <-
bam(H2_form_phot,
data = metric_participanthour,
discrete = TRUE,
method = "fREML",
nthreads = 10,
rho = r1,
AR.start = AR.start
)
acf_resid(H2_model_phot, split_pred = "AR.start")
H2_summary_phot <- summary(H2_model_phot)
H2_summary_phot
H2_model_phot |> glance()
H2_model_phot |> appraise()
H2_model_phot |> draw()
```
#### Model comparison
```{r}
H2_AICs <- AIC(H2_model_1, H2_model_0, H2_model_phot)
H2_AICs |> as_tibble(rownames = "model") |> gt()
```
>> Since the ∆AIC of the base model against the null model is negative and lower than two (< -2), the base model is considered significantly better to explain the variance. The photoperiod model is even better and will be used henceforth.
```{r}
#| filename: Variance explained
Xp <- predict(H2_model_phot, type = "terms")
variance <- apply(Xp, 2, var)
cor(Xp)
H2_R2 <- (variance/sum(variance)) |> enframe(value = "partial R2", name = "parameter")
H2_R2 |> gt() |> fmt_percent()
```
#### Model visualization
```{r}
#| filename: prepare photoperiod model H2 plot
H2_site_data <-
H2_model_phot |>
conditional_values(
condition = list(Time = seq(0.25, 23.75, by = 0.5), "site", "photoperiod.state"),
exclude = c("s(Time,Id)", "s(Id_date)")
) |>
mutate(across(c(.fitted, .lower_ci, .upper_ci),
exp_zero_inflated)
)
H2_site_data <-
metric_participanthour |>
Datetime2Time() |>
group_by(site, Time) |>
summarize(
.groups = "drop_last",
dawn = dawn |> as.numeric() |> mean(na.rm = TRUE)/3600,
dusk = dusk |> as.numeric() |> mean(na.rm = TRUE)/3600
) |>
mutate(
photoperiod.state = ifelse(between(Time, dawn, dusk), "day", "night")
) |>
select(-c(dawn, dusk)) |>
left_join(H2_site_data, by = c("site", "Time", "photoperiod.state"))
H2_site_data_phot <-
H2_site_data |>
number_states(photoperiod.state) |>
filter(photoperiod.state == "night") |>
group_by(site, photoperiod.state.count) |>
summarize(xmin = min(Time),
xmax = max(Time),
.groups = "drop_last") |>
site_conv_mutate(rev = FALSE)
```
```{r}
#| filename: Visualize photoperiod model H2
#| fig-height: 9
#| fig-width: 10
source("scripts/Brown_bracket.R")
H2_pred <-
H2_site_data |>
site_conv_mutate(rev = FALSE) |>
ggplot(aes()) +
map(c(1,10,250, 10^4),
\(x) geom_hline(aes(yintercept = x),
col = "grey", linetype = "dashed")
) +
geom_rect(data = H2_site_data_phot ,
aes(xmin = xmin, ymin = -Inf, ymax = Inf, xmax = xmax),
alpha = 0.25, inherit.aes = FALSE) +
geom_ribbon(aes(x = Time, ymin = .lower_ci, ymax = .upper_ci, fill = site), alpha = 0.5) +
geom_line(aes(x = Time, y = .fitted, col = site), linewidth = 0.1) +
geom_line(aes(x = Time, y = .fitted), linewidth = 1) +
scale_fill_manual(values = melidos_colors) +
scale_color_manual(values = melidos_colors) +
theme_cowplot() +
scale_y_continuous(trans = "symlog", breaks = c(0,1,10,100,250, 1000)) +
scale_x_continuous(breaks = seq(0, 24, by = 6)) +
coord_cartesian(xlim = c(0,24), ylim = c(0,10^4), expand = FALSE) +
labs(y = "Melanopic EDI (lx)", x = "Local time (hrs)", fill = "Site", col = "Site") +
facet_wrap(~site) +
# guides( y = guide_axis_stack(Brown_bracket, "axis")) +
gghighlight(
label_key = site,
label_params = list(x = 12, y = 4.5, force = 0, hjust = 0.25)
) +
theme_sub_strip(background = element_blank(),
text = element_blank()) +
theme(panel.spacing = unit(1, "lines"),
plot.caption = element_textbox_simple())
# labs(caption = Brown_caption)
```
```{r}
#| filename: Visualize Individual
#| fig-height: 5
#| fig-width: 8.5
P_individual <-
H2_model_phot |>
conditional_values(
condition = list("Time", "Id"),
exclude = c("s(Time,site)", "s(Id_date)", "s(Time,photoperiod.state)")
) |>
mutate(across(c(.fitted, .lower_ci, .upper_ci),
exp_zero_inflated)
) |>
group_by(Id) |>
select(-site) |>
separate_wider_delim(Id, delim = "_", names = c("site","Id")) |>
site_conv_mutate() |>
ggplot(aes(x = Time)) +
map(c(1,10,250, 10^4),
\(x) geom_hline(aes(yintercept = x), col = "grey", linetype = "dashed")
) +
# geom_ribbon(aes(ymin = .lower_ci, ymax = .upper_ci, group = Id), alpha = 0.025) +
geom_line(aes(y = .fitted, group = interaction(Id, site), color = site),
linewidth = 1, alpha = 0.35) +
theme_cowplot() +
scale_color_manual(values = melidos_colors) +
scale_y_continuous(trans = "symlog", breaks = c(0,1,10,100,250, 1000)) +
scale_x_continuous(breaks = seq(0, 24, by = 6)) +
coord_cartesian(xlim = c(0,24), ylim = c(0,1.5*10^4), expand = FALSE) +
labs(y = "Melanopic EDI (lx)", x = "Local time (hrs)") +
guides( y = guide_axis_stack(Brown_bracket, "axis"), color = "none") +
theme(plot.caption = element_textbox_simple())
# labs(caption = Brown_caption)
```
```{r}
#| filename: prepare time model
H2_data <-
H2_model_phot |>
conditional_values(
condition = list(Time = seq(0.25, 23.75, by = 0.5), "photoperiod.state"),
exclude = c("s(Time,site)", "s(Time,Id)", "s(Id_date)")
) |>
mutate(across(c(.fitted, .lower_ci, .upper_ci),
exp_zero_inflated)
)
H2_data <-
metric_participanthour |>
Datetime2Time() |>
group_by(Time) |>
summarize(
.groups = "drop_last",
dawn = dawn |> as.numeric() |> mean(na.rm = TRUE)/3600,
dusk = dusk |> as.numeric() |> mean(na.rm = TRUE)/3600
) |>
mutate(
photoperiod.state = ifelse(between(Time, dawn, dusk), "day", "night")
) |>
select(-c(dawn, dusk)) |>
left_join(H2_data, by = c("Time", "photoperiod.state"))
H2_data_phot <-
H2_data |>
number_states(photoperiod.state) |>
filter(photoperiod.state == "night") |>
group_by(photoperiod.state.count) |>
summarize(xmin = min(Time),
xmax = max(Time),
.groups = "drop_last")
```
```{r}
#| filename: Visualize Time
#| fig-height: 5
#| fig-width: 8.5
P_time <-
H2_data |>
ggplot(aes(x = Time)) +
map(c(1,10, 100, 250, 10^3),
\(x) geom_hline(aes(yintercept = x), col = "grey", linetype = "dashed")
) +
geom_rect(data = H2_data_phot ,
aes(xmin = xmin, ymin = -Inf, ymax = Inf, xmax = xmax),
alpha = 0.25, inherit.aes = FALSE) +
geom_ribbon(aes(ymin = .lower_ci, ymax = .upper_ci), alpha = 0.5) +
geom_line(aes(y = .fitted),
linewidth = 1, alpha = 1) +
theme_cowplot() +
scale_y_continuous(trans = "symlog", breaks = c(0,1,10,100,250, 1000)) +
scale_x_continuous(breaks = seq(0, 24, by = 6)) +
coord_cartesian(xlim = c(0,24), ylim = c(0,1.5*10^3), expand = FALSE) +
labs(y = "Melanopic EDI (lx)", x = "Local time (hrs)") +
guides( y = guide_axis_stack(Brown_bracket, "axis"), color = "none") +
theme(plot.caption = element_textbox_simple())
# labs(caption = Brown_caption)
```
```{r}
#| filename: Visualize model terms
P_terms <-
H2_model_phot |>
smooth_estimates(select = "s(Time,site)") |>
mutate(.lower_ci = .estimate -1.96*.se,
.upper_ci = .estimate +1.96*.se,
.fitted = .estimate,
signif = .lower_ci > 0 | .upper_ci < 0,
signif_id = consecutive_id(signif),
.by = site) |>
site_conv_mutate(rev = FALSE) |>
ggplot(aes()) +
map(c(log10(c(0.1, 0.2, 0.5, 2, 5, 10))),
\(x) geom_hline(yintercept = x, linewidth = 0.5, col = "grey", linetype = 2)) +
geom_rect(data = H2_site_data_phot ,
aes(xmin = xmin, ymin = -Inf, ymax = Inf, xmax = xmax),
alpha = 0.25, inherit.aes = FALSE) +
geom_ribbon(aes(x = Time, ymin = .lower_ci, ymax = .upper_ci, fill = site), alpha = 0.5) +
geom_hline(yintercept = 0, linewidth = 1, linetype = 2) +
geom_line(aes(x = Time, y = .fitted),
linewidth = 1) +
geom_line(data = \(x) filter(x, signif),
aes(x = Time, y = .fitted, group = signif_id),
col = "red", linewidth = 1) +
scale_fill_manual(values = melidos_colors) +
# scale_color_manual(values = c(melidos_colors)) +
theme_cowplot() +
scale_y_continuous(labels = expression(10^-1, 5^-1, 2^-1, "1 ", "2 ", "5 ", "10 "),
breaks = log10(c(0.1, 0.2, 0.5, 1, 2, 5, 10)))+
scale_x_continuous(breaks = seq(0, 24, by = 6)) +
coord_cartesian(xlim = c(0,24), expand = FALSE) +
guides(fill = "none") +
labs(y = "Factor", x = "Local time (hrs)", fill = "Site") +
facet_wrap(~site) +
theme_sub_strip(background = element_blank(),
text = element_blank()) +
theme(panel.spacing = unit(1, "lines"),
plot.caption = element_textbox_simple())
```
```{r}
#| fig-height: 17
#| fig-width: 13
#| filename: Combination of partial plots H2
((P_time / P_individual) | (H2_pred / P_terms)) +
plot_annotation(tag_levels = "A", caption = Brown_caption,
theme = theme(plot.caption = element_textbox_simple())) +
plot_layout()
ggsave("figures/Fig4.pdf", height = 17, width = 13)
```
### Interpretation
There is an overall trend of time and photoperiod (panel A). Daytime melanopic EDI light levels are < 100 lx in the morning and late afternoon, and between 100 and about 200 lx between late morning and early afternoon. During the evening, light levels are below 10 lx melanopic EDI, and generally below 1 lx at night.
Individual deviations from this trend vary greatly, especially during the day (panel B)
Model predictions (panel C) and deviations from the global trend (panel D) of each site shows specific differences:
- Delft (NL), Tübingen (DE), and San José (CR) do not show significant deviations from the global trend. In the case of San José (CR) this is likely due to the small number of participants, as can be seen by the wide confidence interval compared to the other sites
- Boras (SE), Dortmund (DE), and partially Munich (DE) show similar patterns of higher mel EDI in the hours after dawn and before dusk. The difference is about a factor 2-5 higher compared to the global trend.
- Madrid (ES), Kumasi (GH), and Izmir (TR) have a significantly reduced mel EDI around dawn (factor 2-5 reduction)
- Madrid (ES) and Kumasi (GH) have significantly reduced mel EDI around dusk (factor 2-5 reduction)
- Izmir (TR) show significantly hightened melanopic EDI in the late evening (factor 2-5 increase)
- Kumasi (GH) has significantly lower daytime illuminance both in the afternoon and after dusk (factor 2-5 reduction)
## Session info
```{r}
#| code-summary: Code cell - Session info
sessionInfo()
```