Abstract
This tutorial presents an analysis pipeline for visual experience datasets, with a focus on reproducible workflows for human chronobiology and myopia research. Light exposure and its retinal encoding affect human physiology and behavior across multiple time scales. Here we provide step-by-step instructions for importing, visualizing, and processing viewing distance and light exposure data using the open-source R package LightLogR. This includes time-series analyses for working distance, biologically relevant light metrics, and spectral characteristics. By leveraging a modular approach, the tutorial supports researchers in building flexible and robust pipelines that accommodate diverse experimental paradigms and measurement systems.
1 Introduction
Exposure to the optical environment — often referred to as visual experience — profoundly influences human physiology and behavior across multiple time scales. Two notable examples, from distinct research domains, can be understood through a common retinally-referenced framework.
The first example relates to the non-visual effects of light on human circadian and neuroendocrine physiology. The light–dark cycle entrains the circadian clock, and light exposure at night suppresses melatonin production (Brown et al. 2022; Blume, Garbazza, and Spitschan 2019). The second example concerns the influence of visual experience on ocular development, particularly myopia. Time spent outdoors — characterized by distinct optical environments — has been consistently associated with protective effects on ocular growth and health outcomes (Dahlmann-Noor et al. 2025).
In controlled laboratory settings, light exposure can be held constant or manipulated parametrically. However, such exposures rarely replicate real-world conditions, which are inherently complex and dynamic. As people move in and between spaces (indoors and outdoors) and move their body, head, and eyes, exposure to the optical environment varies significantly (Webler et al. 2019) and is modulated by behavior (Biller, Balakrishnan, and Spitschan 2024). Wearable devices for measuring light exposure have thus emerged as vital tools to capture the richness of ecological visual experience. These tools generate high-dimensional datasets that demand rigorous and flexible analysis strategies.
Starting in the 1980s (Okudaira, Kripke, and Webster 1983), technology to measure optical exposure has matured, with miniaturized illuminance sensors now (in 2025) very common in consumer wearables. In research, several devices are available that differ in functionality, ranging from small pins measuring ambient illuminance (Mohamed et al. 2021) to head-mounted multi-modal devices capturing nearly all relevant aspects of visual experience (Gibaldi et al. 2024). Increased capabilities in wearables bring complex, dense datasets. These go hand-in-hand with a proliferation of metrics, as highlighted by recent review papers in both circadian and myopia research.
At present, the analysis processes to derive metrics are often implemented on a per-laboratory or even per-researcher basis. This fragmentation is a potential source of errors and inconsistencies between studies, and it consumes considerable researcher time (Hartmeyer, Webler, and Andersen 2022). Too often, more time is spent preparing data than gaining insights through rigorous statistical analysis. These preparation tasks are best handled, or at least facilitated, by standardized, transparent, community-based analysis pipelines (Zauner, Udovicic, and Spitschan 2024).
In circadian research, the R package LightLogR was developed to address this need (Zauner, Hartmeyer, and Spitschan 2025). LightLogR is an open-source, MIT-licensed, community-driven package specifically designed for data from wearable light loggers and optical radiation dosimeters. It contains functions to calculate over sixty different metrics used in the field (Hartmeyer and Andersen 2023). In a recent update, the package was expanded to handle modalities beyond illuminance, such as viewing distance and light spectra—capabilities highly relevant for myopia research (Hönekopp and Weigelt 2023).
In this article, we demonstrate that LightLogR’s analysis pipelines and metric functions apply broadly across the field of visual experience research, not just to circadian rhythms and chronobiology. Our approach is modular and extensible, allowing researchers to adapt it to a variety of devices and research questions. Emphasis is placed on clarity, transparency, and reproducibility, aligning with best practices in scientific computing and open science. We use example data from two devices to showcase the LightLogR workflow with metrics relevant to myopia research, covering working distance, (day)light exposure, and spectral analysis. Readers are encouraged to recreate the analysis using the provided code. All necessary data and code are openly available in the GitHub repository.
2 Methods and materials
2.1 Software
This tutorial was built with Quarto, an open-source scientific and technical publishing system that integrates text, code, and code output into a single document. The source code to reproduce all results is included and accessible via the Quarto document’s code tools menu. All analyses were conducted in R (version 4.4.3, “Trophy Case”) using LightLogR (version 0.9.2 “Sunrise”). We also used the tidyverse suite (version 2.0.0) for data manipulation (LightLogR follows the tidyverse in its design), and the gt package (version 1.0.0) for generating summary tables. A comprehensive overview of the R computing environment is provided in the Session info section.
2.2 Metric selection and definitions
In March 2025, two workshops with myopia researchers — initiated by the Research Data Alliance (RDA) Working Group on Optical Radiation Exposure and Visual Experience Data — focused on current needs and future opportunities in data analysis, including the development and standardization of metrics. Based on expert input from these workshops, the authors of this tutorial compiled a list of visual experience metrics, shown in Table 1. These include many currently used metrics and definitions (Wen et al. 2020, 2019; Bhandari and Ostrin 2020; Williams et al. 2019), as well as new metrics enabled by spectrally resolved measurements.
| No. | Name | Implementation¹ |
|---|---|---|
| | **Distance** | |
| 1 | Total wear time daily | `durations()` |
| 2 | Duration per Distance range | filter for distance range + `durations()` (for single ranges), or grouping by distance range + `durations()` (for all ranges) |
| 3 | Frequency of Continuous near work | |
| 4 | Frequency, duration, and distances of Near work episodes | |
| 5 | Frequency and duration of Visual breaks | filter |
| | **Light** | |
| 6 | Light exposure (in lux) | `summarize_numeric()` |
| 7 | Duration per Outdoor range | grouping by Outdoor range + `durations()` |
| 8 | Number of times light level changes from indoor (<1000 lx) to outdoor (>1000 lx) | |
| 9 | Longest period above 1000 lx | `period_above_threshold()` |
| | **Spectrum** | |
| 10 | Ratio of short vs. long wavelength light | |
| 11 | Short-wavelength light at certain times of day | `filter_Time()` (for defined times) or grouping by time state + |
Table 2 provides definitions for the terms used in Table 1. Note that specific definitions may vary depending on the research question or device capabilities.
| Metric | Description / pseudo formula |
|---|---|
| Total wear time | \(\sum(t) \cdot dt, \textrm{ where } t\textrm{: valid observations}\) |
| Mean daily | \(\frac{5 \cdot \overline{\textrm{weekday}} + 2 \cdot \overline{\textrm{weekend}}}{7}\) |
| Near work | \(\textrm{working distance}, [10,60)\,\textrm{cm}\) |
| Intermediate work | \(\textrm{working distance}, [60,100)\,\textrm{cm}\) |
| Total work | \(\textrm{working distance}, [10,120)\,\textrm{cm}\) |
| Distance range | \(\textrm{working distance}, \begin{cases}[10,20)\,\textrm{cm}, & \textrm{Extremely near}\\ [20,30)\,\textrm{cm}, & \textrm{Very near}\\ [30,40)\,\textrm{cm}, & \textrm{Fairly near}\\ [40,50)\,\textrm{cm}, & \textrm{Near}\\ [50,60)\,\textrm{cm}, & \textrm{Moderately near}\\ [60,70)\,\textrm{cm}, & \textrm{Near intermediate}\\ [70,80)\,\textrm{cm}, & \textrm{Intermediate}\\ [80,90)\,\textrm{cm}, & \textrm{Moderately intermediate}\\ [90,100)\,\textrm{cm}, & \textrm{Far intermediate}\end{cases}\) |
| Continuous near work | \(\textrm{working distance}, [20,60)\,\textrm{cm},\; T_\textrm{duration} \geq 30\,\textrm{minutes},\; T_\textrm{interruptions} \leq 1\,\textrm{minute}\) |
| Near work episodes | \(\textrm{working distance}, [20,60)\,\textrm{cm},\; T_\textrm{interruptions} \leq 20\,\textrm{seconds}\) |
| Ratio of daily near work | \(\frac{T_\textrm{near work}}{T_\textrm{total wear}}\) |
| Visual break | \(\textrm{working distance} \geq 100\,\textrm{cm},\; T_\textrm{duration} \geq 20\,\textrm{seconds},\; T_\textrm{previous episode} \leq 20\,\textrm{minutes}\) |
| Outdoor range | \(\textrm{illuminance}, \begin{cases}[1000,2000)\,\textrm{lx}, & \textrm{Outdoor bright}\\ [2000,3000)\,\textrm{lx}, & \textrm{Outdoor very bright}\\ [3000,\infty)\,\textrm{lx}, & \textrm{Outdoor extremely bright}\end{cases}\) |
| Light exposure² | \(\overline{\textrm{illuminance}}\) |
| Spectral bands | \(\textrm{spectral irradiance}, \begin{cases}[380,500]\,\textrm{nm}, & \textrm{short wavelength light}\\ [600,780]\,\textrm{nm}, & \textrm{long wavelength light}\end{cases}\) |
| Ratio of short vs. long wavelength light | \(\frac{E_{e\textrm{,short wavelength}}}{E_{e\textrm{,long wavelength}}}\) |
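The “Mean daily” weighting in the pseudo formula above (five weekdays and two weekend days per week) can be verified in a few lines of base R, here using the wear-time values (in seconds) reported later in Table 3:

```r
# Weighted mean-daily average: a week has 5 weekdays and 2 weekend days.
# Values are the wear-time durations (in seconds) from Table 3.
mean_weekday <- 34460
mean_weekend <- 23918
mean_daily <- (5 * mean_weekday + 2 * mean_weekend) / 7
mean_daily  # 31448 seconds, ~8.74 hours
```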
2.3 Devices
Data from two wearable devices are used in this analysis:

- **Clouclip**: A wearable device that measures viewing distance and ambient light [Glasson Technology Co., Ltd, Hangzhou, China; Wen et al. (2021); Wen et al. (2020)]. The Clouclip provides a simple data output with only `Distance` (working distance, in centimeters) and `Illuminance` (ambient light, in lux). Data in our example were recorded at 5-second intervals; approximately one week of data (~120,960 observations) is about 1.6 MB in size.
- **Visual Environment Evaluation Tool (VEET)**: A head-mounted multi-modal device that logs multiple data streams [Meta Platforms, Inc., Menlo Park, CA, USA; Sah, Narra, and Ostrin (2025)]. The VEET dataset used here contains simultaneous measurements of distance (via a time-of-flight sensor), ambient light (illuminance), activity (accelerometer and gyroscope), and spectral irradiance (multi-channel light sensor). Data were recorded at 2-second intervals, yielding a very dense dataset (~270 MB per week).
2.4 Data processing summary
The Results section uses imported and pre-processed data from the two devices to calculate metrics. Supplement 1 contains the annotated code and a description of the steps involved. The following is a summary of those steps:
Data Import: We imported raw data from the Clouclip and VEET devices using LightLogR’s built-in import functions, which automatically handle device-specific formats and idiosyncrasies. The Clouclip export file (provided as a tab-delimited text file) contains timestamped records of distance (cm) and illuminance (lux). LightLogR’s `import$Clouclip` function reads this file, after the device’s recording timezone is specified, and converts device-specific sentinel codes into proper missing values. For instance, the Clouclip uses special numeric codes to indicate when it is in “sleep mode” or when a reading is out of the sensor’s range, rather than recording a normal value. LightLogR identifies `-1` (for both distance and lux) as indicating the device’s sleep mode and `204` (for distance) as indicating that the object was beyond the measurable range, replacing these with `NA` and logging their status in separate columns. The import routine also provides an initial summary of the dataset, including start and end times and any irregular sampling intervals or gaps.
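The sentinel-code logic described above can be illustrated with a small base-R sketch. This is not LightLogR’s implementation; the helper name `clean_clouclip` and the exact column layout are assumptions for illustration:

```r
# Sketch of the sentinel-code handling described above (illustrative only):
# -1 in both columns marks sleep mode; 204 in distance marks out-of-range.
clean_clouclip <- function(distance, lux) {
  sleep <- distance == -1 & lux == -1
  out_of_range <- distance == 204
  data.frame(
    Dis    = ifelse(sleep | out_of_range, NA, distance),
    Lux    = ifelse(sleep, NA, lux),
    status = ifelse(sleep, "sleep mode",
             ifelse(out_of_range, "out of range", "ok"))
  )
}

clean_clouclip(distance = c(35, -1, 204), lux = c(120, -1, 300))
```

The second row becomes fully missing (sleep mode), while the third keeps its valid illuminance reading but drops the out-of-range distance.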
For the VEET device, data were provided as CSV logs (zipped on GitHub, due to size). We focused on the ambient light sensor modality first. Using `import$VEET(..., modality = "ALS")`, we extracted the illuminance (`Lux`) data stream and its timestamps. The raw VEET data can similarly contain irregular intervals or missing periods (e.g., if the device stopped recording or was reset); the import summary flags these issues.
Irregular Intervals and Gaps: Both datasets showed irregular timing and missing data, i.e., gaps. Irregular data means that some observations did not align with the nominal sampling interval (e.g., slight timing drift or pauses in recording). For the Clouclip 5-second data, we detected irregular timestamps spanning all but the first and last day of the recording. Handling such irregularities is important because many downstream analyses assume a regular time series. We evaluated strategies to address this, including:
- Removing an initial portion of data if irregularities occur mainly during device start-up.
- Rounding all timestamps to the nearest regular interval (5 s in this case).
- Aggregating to a coarser time interval (with some loss of temporal resolution).
Based on the import summary and visual inspection of the time gaps, we chose to round the observation times to the nearest 5-second mark, as this addressed the minor offsets without significant data loss. After rounding timestamps, we added an explicit date column for convenient grouping by day.
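The rounding step can be sketched in base R; this is not LightLogR’s internal routine, and the helper name `round_to_interval` is an assumption for illustration:

```r
# Round POSIXct timestamps to the nearest regular interval (here 5 s),
# independent of LightLogR's own regularization functions.
round_to_interval <- function(t, interval = 5) {
  as.POSIXct(round(as.numeric(t) / interval) * interval,
             origin = "1970-01-01", tz = attr(t, "tzone"))
}

t <- as.POSIXct("2025-03-01 08:00:02", tz = "UTC")
format(round_to_interval(t), "%H:%M:%S")  # "08:00:00"
```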
We then generated a summary of missing data for each day. Implicit gaps (intervals where the device should have recorded data but did not) were converted into explicit missing entries using LightLogR’s gap-handling functions. We also removed days with very little data (in our Clouclip example, days with <1 hour of recordings were dropped) to focus on days with substantial wear time.

After these preprocessing steps, the Clouclip dataset had no irregular timestamps remaining and contained explicit markers for all periods of missing data (e.g., times when the device was off or not worn). The distance and illuminance values were then ready for metric calculations.
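The idea of making implicit gaps explicit can be sketched in base R by joining the observed data against the full regular time grid (a conceptual illustration; LightLogR provides dedicated gap-handling functions for this):

```r
# Observed data with an implicit gap between 5 s and 20 s
obs <- data.frame(
  time = as.POSIXct(c(0, 5, 20), origin = "1970-01-01", tz = "UTC"),
  Dis  = c(35, 40, 55)
)

# Full regular 5-s grid spanning the recording
grid <- data.frame(time = seq(min(obs$time), max(obs$time), by = 5))

# Left join: missing intervals become explicit NA rows
filled <- merge(grid, obs, all.x = TRUE)  # rows at 10 s and 15 s get Dis = NA
filled$Dis
```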
The VEET illuminance data underwent a similar cleaning procedure. To make the VEET’s 2-second illuminance data more comparable to the Clouclip’s and to reduce computational load, we aggregated the illuminance time series to 5-second intervals. We then inserted explicit missing entries for any gaps and removed days with more than one hour of missing illuminance data. After cleaning, six days of VEET illuminance data with good coverage remained for analysis (see Supplementary Material for details).
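The 2-s to 5-s aggregation can be sketched in base R by assigning each sample to its 5-second bin and averaging within bins (an illustration with made-up values, not the LightLogR aggregation call):

```r
# Aggregate a 2-s illuminance series to 5-s bins by averaging
secs <- seq(0, 18, by = 2)  # 2-s timestamps (seconds since start)
lux  <- c(100, 100, 100, 400, 400, 400, 400, 400, 1000, 1000)

bin <- floor(secs / 5) * 5  # assign each sample to the start of its 5-s bin
tapply(lux, bin, mean)      # mean illuminance per 5-s bin
```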
Finally, for spectral analysis, we imported the VEET’s spectral sensor modality. This required additional processing: the raw spectral data consists of counts from 10 wavelength-specific channels (approximately 415 nm through 940 nm, plus two broadband clear channels and a dark channel), along with a sensor gain setting. We aggregated the spectral data to 5-minute intervals to focus on broader trends and reduce data volume. Each channel’s counts were normalized by the appropriate gain at each moment, and the two clear channels were averaged. Using a calibration matrix provided by the manufacturer (specific to the spectral sensor model), we reconstructed full spectral power distributions for each 5-minute interval. The end result is a list-column in the dataset where each entry is the estimated spectral irradiance across wavelengths for that time interval. (Detailed spectral preprocessing steps, including the calibration and normalization, are provided in the Supplement.) After spectral reconstruction, the dataset was ready for calculating example spectrum-based metrics.
This tutorial will start by importing a Clouclip dataset and providing an overview of the data. The Clouclip export is considerably simpler than the VEET export, containing only `Distance` and `Illuminance` measurements. The VEET dataset will be imported later for the spectrum-related metrics.
3 Results
3.1 Distance
We first examine metrics related to viewing distance, using the processed Clouclip
dataset. Many distance-based metrics are computed for each day and then averaged over weekdays, weekends, or across all days. To facilitate this, we define a helper function that will take daily metric values and calculate the mean values for weekdays, weekends, and the overall daily average:
to_mean_daily <- function(data, prefix = "average_") {
data |>
ungroup(Date) |> # ungroup by days
mean_daily(prefix = prefix) |> # calculate the averages per grouping
    rename_with(.fn = \(x) str_replace_all(x, "_", " ")) |> # replace underscores with spaces in names
gt() # format as a gt table for display
}
3.1.1 Total wear time daily
Total wear time daily refers to the amount of time the device was actively collecting distance data each day (i.e. the time the device was worn and operational). We compute this by summing all intervals where a valid distance measurement is present, ignoring periods where data are missing or the device was off. The results are shown in Table 3.
dataCC |>
durations(Dis) |> # calculate total duration of data per day
to_mean_daily("Total wear ")
Date | Total wear duration |
---|---|
Clouclip | |
Mean daily | 31448s (~8.74 hours) |
Weekday | 34460s (~9.57 hours) |
Weekend | 23918s (~6.64 hours) |
3.1.2 Duration within distance ranges
Many myopia-relevant metrics concern the time spent at certain viewing distances (e.g., “near work” vs. intermediate or far distances). We calculate the duration of time spent in specific distance ranges. Table 4 shows the average daily duration of near work, defined here as time viewing at 10–60 cm (a commonly used definition for near-work distance). Table 5 provides a more detailed breakdown across multiple distance bands.
dataCC |>
filter(Dis >= 10, Dis < 60) |> # consider only distances in [10, 60) cm
durations(Dis) |> # total duration in that range per day
to_mean_daily("Near work ")
Date | Near work duration |
---|---|
Clouclip | |
Mean daily | 22586s (~6.27 hours) |
Weekday | 26343s (~7.32 hours) |
Weekend | 13192s (~3.66 hours) |
First, we define a set of distance breakpoints and descriptive labels for each range:
# defining distance ranges (in cm)
dist_breaks <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, Inf)
dist_labels <- c(
"Extremely near", # [10, 20)
"Very near", # [20, 30)
"Fairly near", # [30, 40)
"Near", # [40, 50)
"Moderately near", # [50, 60)
"Near intermediate", # [60, 70)
"Intermediate", # [70, 80)
"Moderately intermediate", # [80, 90)
"Far intermediate", # [90, 100)
"Far" # [100, Inf)
)
Now we cut the distance data into these ranges and compute the daily duration spent in each range:
dataCC |>
mutate(Dis_range = cut(Dis, breaks = dist_breaks, labels = dist_labels)) |> # categorize distances
drop_na(Dis_range) |> # remove intervals with no data
group_by(Dis_range, .add = TRUE) |> # group by distance range (and by day)
durations(Dis) |> # duration per range per day
pivot_wider(names_from = Dis_range, values_from = duration) |> # wide format (ranges as columns)
to_mean_daily("") |>
fmt_duration(input_units = "seconds", output_units = "minutes") # convert seconds to minutes
Date | Extremely near | Very near | Fairly near | Near | Moderately near | Near intermediate | Intermediate | Moderately intermediate | Far intermediate | Far |
---|---|---|---|---|---|---|---|---|---|---|
Clouclip | ||||||||||
Mean daily | 169m | 102m | 46m | 27m | 13m | 7m | 4m | 5m | 11m | 16m |
Weekday | 180m | 128m | 60m | 36m | 16m | 7m | 6m | 6m | 14m | 20m |
Weekend | 141m | 38m | 12m | 5m | 5m | 8m | 1m | 3m | 2m | 5m |
To visualize this, Figure 1 illustrates the relative proportion of time spent in each distance range:
dataCC |>
mutate(Dis_range = cut(Dis, breaks = dist_breaks, labels = dist_labels)) |>
drop_na(Dis_range) |>
group_by(Dis_range, .add = TRUE) |>
durations(Dis) |>
group_by(Dis_range) |>
mean_daily(prefix = "") |>
ungroup() |>
mutate(Dis_range = forcats::fct_relabel(Dis_range, \(x) str_replace(x, " ", "\n"))) |>
mutate(duration = duration/sum(duration), .by = Date) |> # convert to percentage of daily total
ggplot(aes(x = Dis_range, y = duration, fill = Date)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::label_percent()) +
ggsci::scale_fill_jco() +
theme_minimal() +
labs(y = "Relative duration (%)", x = NULL, fill = "Day type") +
coord_flip()
3.1.3 Frequency of Continuous near work
Continuous near-work is typically defined as sustained viewing within a near distance for some minimum duration, allowing only brief interruptions. We use LightLogR’s cluster functions to identify episodes of continuous near work. Here we define a near-work episode as viewing distance between 20 cm and 60 cm that lasts at least 30 minutes, with interruptions of up to 1 minute allowed (meaning short breaks ≤1 min do not end the episode). Using `extract_clusters()` with those parameters, we count how many such episodes occur per day.
Table 6 summarizes the average frequency of continuous near-work episodes per day, and Figure 2 provides an example visualization of these episodes on the distance time series.
dataCC |>
extract_clusters(
Dis >= 20 & Dis < 60, # condition: near-work distance
cluster.duration = "30 mins", # minimum duration of a continuous episode
interruption.duration = "1 min", # maximum gap allowed within an episode
drop.empty.groups = FALSE # keep days with zero episodes in output
) |>
summarize_numeric(remove = c("start", "end", "epoch", "duration"),
add.total.duration = FALSE) |> # count number of episodes per day
mean_daily(prefix = "Frequency of ") |> # compute daily mean frequency
gt() |> fmt_number()
Date | Frequency of episodes |
---|---|
Clouclip | |
Mean daily | 0.86 |
Weekday | 1.20 |
Weekend | 0.00 |
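Conceptually, the cluster logic (minimum episode duration, tolerated interruptions) can be sketched with a run-length encoding in base R. This is a simplified illustration, not LightLogR’s implementation; the helper name `count_episodes` is hypothetical, and edge effects at the start and end of the series are ignored:

```r
# Simplified episode counting on a TRUE/FALSE near-work indicator,
# sampled at a fixed interval (5 s here)
count_episodes <- function(near, dt = 5,
                           min_duration = 30 * 60, max_interruption = 60) {
  r <- rle(near)
  # 1) absorb short interruptions into the surrounding near-work state
  short_gap <- !r$values & r$lengths * dt <= max_interruption
  r$values[short_gap] <- TRUE
  merged <- rle(inverse.rle(r))
  # 2) count merged near-work runs that meet the minimum duration
  sum(merged$values & merged$lengths * dt >= min_duration)
}

# 20 min near work + 30-s break + 20 min near work -> one 40.5-min episode
near <- c(rep(TRUE, 240), rep(FALSE, 6), rep(TRUE, 240))
count_episodes(near)  # 1
```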
dataCC |>
add_clusters(
Dis >= 20 & Dis < 60,
cluster.duration = "30 mins",
interruption.duration = "1 min"
) |>
gg_day(y.axis = Dis, y.axis.label = "Distance (cm)", geom = "line") |>
gg_state(state, fill = "red") +
geom_hline(yintercept = c(20, 60), col = "red", linetype = "dashed")
3.1.4 Near Work episodes
Beyond frequency, we can characterize near-work episodes by their duration and typical viewing distance. This section extracts all near-work episodes (using a shorter minimum duration to capture more routine near-work bouts) and summarizes three aspects: (1) frequency (count of episodes per day), (2) average duration of episodes, and (3) average distance during those episodes. These results are combined in Table 7.
dataCC |>
extract_clusters(
Dis >= 20 & Dis < 60,
cluster.duration = "5 secs", # minimal duration to count as an episode (very short to capture all)
interruption.duration = "20 secs",
drop.empty.groups = FALSE
) |>
extract_metric(dataCC, distance = mean(Dis, na.rm = TRUE)) |> # calculate mean distance during each episode
summarize_numeric(remove = c("start", "end", "epoch"), prefix = "",
add.total.duration = FALSE) |>
mean_daily(prefix = "") |> # daily averages for each metric
gt() |> fmt_number(c(distance, episodes), decimals = 0) |> #table
cols_units(distance = "cm")
Date | duration | distance, cm | episodes |
---|---|---|---|
Clouclip | |||
Mean daily | 233s (~3.88 minutes) | 32 | 57 |
Weekday | 284s (~4.73 minutes) | 32 | 64 |
Weekend | 104s (~1.73 minutes) | 32 | 40 |
In the above, `extract_metric(..., distance = mean(Dis, ...))` computes the mean viewing distance during each episode, and the subsequent `summarize_numeric()` and `mean_daily()` steps derive daily averages of episode count, duration, and distance.
3.1.5 Visual breaks
Visual breaks differ slightly from the previous metrics: here, both the minimum break duration and the length of the preceding episode matter. This leads to a two-step process, where we first extract instances of `Distance` of at least 100 cm lasting at least 20 seconds, and then filter for a preceding episode duration of at most 20 minutes. Table 8 provides the daily frequency of visual breaks.
dataCC |>
  extract_clusters(Dis >= 100, # define the condition: at least 100 cm away
cluster.duration = "20 secs", #define the minimum duration
return.only.clusters = FALSE, #return non-clusters as well
drop.empty.groups = FALSE #keep all days, even without clusters
) |>
# return only clusters with previous episode lengths of maximum 20 minutes:
filter((start - lag(end) <= duration("20 mins")), is.cluster) |>
summarize_numeric(remove = c("start", "end", "epoch", "is.cluster", "duration"),
prefix = "",
add.total.duration = FALSE) |> #count the number of episodes
mean_daily(prefix = "Daily ") |> #daily means
gt() |> fmt_number(decimals = 1) #table
Date | Daily episodes |
---|---|
Clouclip | |
Mean daily | 5.9 |
Weekday | 6.2 |
Weekend | 5.0 |
dataCC |>
  extract_clusters(Dis >= 100, # define the condition: at least 100 cm away
cluster.duration = "20 secs", #define the minimum duration
return.only.clusters = FALSE, #return non-clusters as well
drop.empty.groups = FALSE #keep all days, even without clusters
) |>
# return only clusters with previous episode lengths of maximum 20 minutes:
filter((start - lag(end) <= duration("20 mins")), is.cluster) %>%
  add_states(dataCC, .) |>
gg_day(y.axis = Dis, y.axis.label = "Distance (cm)", geom = "line") |>
gg_photoperiod(coordinates) +
geom_point(data = \(x) filter(x, is.cluster), col = "red")
3.2 Light
The Clouclip illuminance data in our example are extremely low (the device was mostly used in dim conditions), which would make certain light exposure summaries trivial or not meaningful. To better illustrate light exposure metrics, we turn to the VEET device’s illuminance data, which capture a broader range of lighting conditions. We import the VEET ambient light data (already preprocessed to regular 5-second intervals, as described above) and briefly examine its distribution.
Illuminance Distribution: The illuminance values from the Clouclip were almost always near zero, while the VEET data include outdoor exposures up to several thousand lux. The contrast is evident by comparing histograms of the two datasets’ lux values (Clouclip vs. VEET). The VEET illuminance histogram (see Figure 5) shows a heavily skewed distribution with a spike at zero (indicating many intervals of complete darkness or the sensor covered) and a long tail extending to very high lux values. Such zero-inflated and skewed data are common in wearable light measurements (Zauner, Guidolin, and Spitschan).
dataCC |>
ggplot(aes(x = Lux)) +
geom_histogram(bins = 100) +
theme_minimal() +
scale_x_continuous(trans = "symlog", breaks = c(0, 1, 10, 100, 1000))
dataVEET |>
ggplot(aes(x = Lux)) +
geom_histogram(bins = 100) +
theme_minimal() +
scale_x_continuous(trans = "symlog", breaks = c(0, 1, 10, 100, 1000, 10000))
After confirming that the VEET data cover a broad dynamic range of lighting, we proceed with calculating light exposure metrics. (The VEET data had been cleaned for gaps and irregularities as described earlier; see Supplement 1 for the gap summary table.)
3.2.1 Average light exposure
A basic metric is the average illuminance over the day. Table 9 shows the mean illuminance (in lux) for weekdays, weekends, and overall daily mean, calculated directly from the raw lux values.
dataVEET |>
select(Id, Date, Datetime, Lux) |>
summarize_numeric(prefix = "mean ", remove = c("Datetime")) |>
to_mean_daily() |> # compute mean for weekday, weekend, all days
fmt_number(decimals = 1) |>
cols_hide(`average episodes`) # hide an irrelevant column (episodes count)
Date | average mean Lux |
---|---|
VEET | |
Mean daily | 304.1 |
Weekday | 357.8 |
Weekend | 169.8 |
However, because illuminance data tend to be extremely skewed and contain many zero values (periods of darkness), the arithmetic mean can be misleading. A common approach is to apply a logarithmic transform to illuminance before averaging, which down-weights extreme values and accounts for the multiplicative nature of light intensity effects. LightLogR provides the helper function `log_zero_inflated()` and its inverse `exp_zero_inflated()` to handle log transformation when zeros are present (by adding a small offset before taking the logarithm, and back-transforming after averaging). Using this approach, we recompute the daily mean illuminance. The results in Table 10 show that the log-transformed mean (back-transformed to lux) is much lower, reflecting the fact that illuminance was near zero for much of the time. This transformed mean is often more representative of typical exposure for skewed data.
dataVEET |>
select(Id, Date, Datetime, Lux) |>
mutate(Lux = Lux |> log_zero_inflated()) |> # log-transform with zero handling
summarize_numeric(prefix = "mean ", remove = c("Datetime")) |>
mean_daily(prefix = "") |> # get daily mean of log-lux
mutate(`mean Lux` = `mean Lux` |> exp_zero_inflated()) |> # back-transform to lux
gt() |> fmt_number(decimals = 1) |> cols_hide(episodes)
Date | mean Lux |
---|---|
VEET | |
Mean daily | 6.3 |
Weekday | 7.9 |
Weekend | 3.5 |
3.2.2 Duration in high-light (outdoor) conditions
Another important metric is the amount of time spent under bright light, often used as a proxy for outdoor exposure. We define thresholds corresponding to outdoor light levels (e.g., 1000 lx and above). Here, we categorize each 5-second interval of illuminance into bands: Outdoor bright ([1000, 2000) lx), Outdoor very bright ([2000, 3000) lx), and Outdoor extremely bright (≥3000 lx). We then sum the duration in each category per day. We first create a categorical variable for illuminance range:
# Define outdoor illuminance thresholds (in lux)
out_breaks <- c(1e3, 2e3, 3e3, Inf)
out_labels <- c(
"Outdoor bright", # [1000, 2000) lx
"Outdoor very bright", # [2000, 3000) lx
"Outdoor extremely bright" # [3000, ∞) lx
)
dataVEET <- dataVEET |>
mutate(Lux_range = cut(Lux, breaks = out_breaks, labels = out_labels))
Now we compute the mean daily duration spent in each of these outdoor light ranges (Table 11):
dataVEET |>
drop_na(Lux_range) |>
group_by(Lux_range, .add = TRUE) |>
durations(Lux) |> # total duration per range per day
pivot_wider(names_from = Lux_range, values_from = duration) |>
to_mean_daily("") |>
fmt_duration(input_units = "seconds", output_units = "minutes")
Date | Outdoor bright | Outdoor very bright | Outdoor extremely bright |
---|---|---|---|
VEET | |||
Mean daily | 24m | 32m | 55m |
Weekday | 29m | 41m | 65m |
Weekend | 10m | 10m | 30m |
It is also informative to visualize when these high-light conditions occurred. Figure 6 shows a timeline plot with periods of outdoor-level illuminance highlighted in color. In this example, violet denotes the [1000, 2000) lx range, green [2000, 3000) lx, and yellow ≥3000 lx. Grey shading indicates nighttime (from civil dusk to dawn) for context.
dataVEET |>
gg_day(y.axis = Lux, y.axis.label = "Illuminance (lx)", geom = "line", jco_color = FALSE) |>
gg_state(Lux_range, aes_fill = Lux_range, alpha = 0.75) |>
gg_photoperiod(coordinates) +
scale_fill_viridis_d() +
labs(fill = "Illuminance range") +
theme(legend.position = "bottom")
3.2.3 Frequency of transitions from indoor to outdoor light
We next consider how often the subject moved from an indoor light environment to an outdoor-equivalent environment. We operationally define an “outdoor transition” as a change from <1000 lx to ≥1000 lx. Using the cleaned VEET data, we extract all instances where illuminance crosses that threshold from below to above.
Table 12 shows the average number of such transitions per day. Note that if data are recorded at a fine temporal resolution (5 s here), very brief excursions above 1000 lx could count as transitions and inflate this number. Indeed, the initial count is fairly high, reflecting fleeting spikes above 1000 lx that might not represent meaningful outdoor exposures.
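The underlying counting logic for threshold crossings can be sketched in base R with illustrative values (this is a conceptual sketch, not the LightLogR extraction used in this section):

```r
# Count indoor-to-outdoor transitions: crossings from <1000 lx to >=1000 lx
lux <- c(200, 500, 1200, 1500, 300, 80, 2000, 2500, 900)
outdoor <- lux >= 1000
transitions <- sum(diff(outdoor) == 1)  # FALSE -> TRUE crossings only
transitions  # 2
```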
dataVEET |>
extract_states(Outdoor, Lux >= 1000, group.by.state = FALSE) |> # label each interval as outdoor (Lux >= 1000) or indoor
filter(Outdoor & !lag(Outdoor)) |> # keep outdoor intervals whose preceding interval was indoor (an indoor-to-outdoor transition)
summarize_numeric(prefix = "mean ",
remove = c("Datetime", "Outdoor", "start", "end", "duration"),
add.total.duration = FALSE) |>
mean_daily(prefix = "") |>
gt() |> fmt_number(episodes, decimals = 0) |>
fmt_duration(`mean epoch`, input_units = "seconds", output_units = "seconds")
| Date | mean epoch | episodes |
|---|---|---|
| VEET | | |
| Mean daily | 5s | 64 |
| Weekday | 5s | 72 |
| Weekend | 5s | 46 |
To obtain a more meaningful measure, we can require that the outdoor state persists for some minimum duration to count as a true transition (filtering out momentary fluctuations around the 1000 lx mark). For example, we can require that once ≥1000 lx is reached, it continues for at least 5 minutes (allowing short interruptions up to 20 s). Table 13 applies this criterion, resulting in a lower, more plausible transition count.
dataVEET |>
extract_clusters(Lux >= 1000,
cluster.duration = "5 min",
interruption.duration = "20 secs",
return.only.clusters = FALSE,
drop.empty.groups = FALSE) |>
filter(is.cluster & !lag(is.cluster)) |> # keep clusters whose preceding interval was not a cluster
summarize_numeric(prefix = "mean ",
remove = c("Datetime", "start", "end", "duration"),
add.total.duration = FALSE) |>
mean_daily(prefix = "") |>
gt() |> fmt_number(episodes, decimals = 0)
| Date | mean epoch | episodes |
|---|---|---|
| VEET | | |
| Mean daily | 5s | 5 |
| Weekday | 5s | 6 |
| Weekend | 5s | 4 |
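The debouncing logic behind this criterion can be illustrated in base R: counting every upward threshold crossing inflates the transition count, whereas requiring a minimum sustained duration yields a more plausible figure. The following is a minimal sketch with made-up data; the interruption allowance of extract_clusters() (short gaps tolerated within a cluster) is deliberately not modeled here.

```r
# Toy illustration, not part of the pipeline: "outdoor" samples at a 5 s epoch.
epoch <- 5                                              # seconds per sample
outdoor <- c(FALSE, FALSE, TRUE, FALSE, rep(TRUE, 13), FALSE, TRUE, FALSE)

# Naive count: every FALSE -> TRUE change counts as a transition
naive <- sum(diff(outdoor) == 1)

# Debounced count: only runs of TRUE lasting at least min_dur seconds qualify
min_dur <- 60
r <- rle(outdoor)
debounced <- sum(r$values & r$lengths * epoch >= min_dur)

naive     # 3 (includes two momentary spikes above threshold)
debounced # 1 (only the sustained outdoor bout)
```

The run-length encoding makes the effect explicit: two of the three upward crossings belong to single-sample spikes that the duration criterion discards.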
3.2.4 Longest sustained bright-light period
The final light exposure metric we illustrate is the longest continuous period above a certain illuminance threshold (often termed Longest Period Above Threshold, e.g. PAT1000 for 1000 lx). This gives a sense of the longest outdoor exposure in a day. Along with it, one might report the total duration above that threshold in the day (TAT1000). While we could derive these from the earlier analyses, LightLogR provides dedicated metric functions for such calculations, which can compute multiple related metrics at once.
Using the function period_above_threshold() for PAT and duration_above_threshold() for TAT, we calculate both metrics for the 1000 lx threshold. Table 14 shows the mean of these metrics across days (i.e., average longest bright period and average total bright time per day).
dataVEET |>
summarize(
period_above_threshold(Lux, Datetime, threshold = 1000, na.rm = TRUE, as.df = TRUE),
duration_above_threshold(Lux, Datetime, threshold = 1000, na.rm = TRUE, as.df = TRUE),
.groups = "keep"
) |>
to_mean_daily("")
| Date | period above 1000 | duration above 1000 |
|---|---|---|
| VEET | | |
| Mean daily | 1987s (~33.12 minutes) | 6709s (~1.86 hours) |
| Weekday | 2501s (~41.68 minutes) | 8164s (~2.27 hours) |
| Weekend | 702s (~11.7 minutes) | 3070s (~51.17 minutes) |
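Conceptually, both metrics reduce to run-length operations on a thresholded time series: TAT is the number of samples above threshold multiplied by the sampling interval, and PAT is the longest uninterrupted run above threshold. A minimal base-R sketch with made-up numbers:

```r
# Toy illustration of PAT1000 and TAT1000 from a short illuminance series.
epoch <- 5                                          # seconds per sample
lux <- c(200, 1500, 1500, 300, 1200, 1200, 1200, 1200, 800)
above <- lux >= 1000

r <- rle(above)
runs <- r$lengths[r$values]                         # lengths of "above" runs
pat <- if (length(runs)) max(runs) * epoch else 0   # longest period above
tat <- sum(above) * epoch                           # total time above

pat # 20 seconds (the four-sample run)
tat # 30 seconds (six samples above 1000 lx)
```

The LightLogR functions wrap this logic with proper handling of timestamps, gaps, and missing values, so the sketch is only meant to convey the underlying idea.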
3.3 Spectrum
The VEET device’s spectral sensor provides rich data beyond simple lux values, but it requires reconstruction of the actual light spectrum from raw sensor counts. We processed the spectral sensor data to compute two example spectrum-based metrics. Detailed data import, normalization, and spectral reconstruction steps are given in Supplement 1; here we present the resulting metrics. Briefly, the VEET’s spectral sensor recorded counts in ten wavelength bands (roughly 415 nm to 940 nm), plus two clear (broadband) channels. After normalizing by sensor gain and applying the calibration matrix, we obtained an estimated spectral irradiance distribution for each 5-minute interval in the recording. With these reconstructed spectra, we can derive novel metrics that take the spectral content of light into account.
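Conceptually, the reconstruction maps a vector of gain-normalized channel counts to a spectral irradiance vector through a device-specific calibration matrix. The following is a minimal sketch with entirely made-up numbers; the actual gain handling and calibration matrix are described in Supplement 1.

```r
# Hypothetical sketch of the reconstruction step; all values are made up.
n_channels <- 10                      # spectral channels (clear channels omitted)
wavelengths <- seq(400, 700, by = 5)  # wavelength grid for the reconstruction
n_wl <- length(wavelengths)           # 61 bins

set.seed(1)
counts <- runif(n_channels, 0, 4095)  # raw sensor counts for one interval
gain <- 64                            # sensor gain setting
normalized <- counts / gain           # gain-normalized counts

# device-specific calibration matrix (wavelength bins x channels), made up here
calib <- matrix(abs(rnorm(n_wl * n_channels)), nrow = n_wl)

spectrum <- as.vector(calib %*% normalized)  # estimated spectral irradiance
length(spectrum)                             # one value per wavelength bin
```

In the actual pipeline, this operation is applied per 5-minute interval and the resulting spectra are stored in a list-column, which is what the spectral metrics below operate on.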
Spectrum-based metrics for wearable data are relatively new and less established than distance or broadband light metrics. The following examples therefore illustrate potential uses of spectral data and can be adapted as needed for specific research questions.
3.3.1 Ratio of short- vs. long-wavelength light
Our first spectral metric is the ratio of short-wavelength to long-wavelength light, which is relevant, for example, for assessing the blue-light content of exposure. We define “short” wavelengths as 400–500 nm and “long” as 600–700 nm. Using the list-column of spectra in our dataset, we integrate each spectrum over these ranges (using spectral_integration()), and then compute the short/long ratio for each time interval. We then summarize these ratios per day.
dataVEET <- dataVEET2 |>
select(Id, Date, Datetime, Spectrum) |> # focus on ID, date, time, and spectrum
mutate(
short = Spectrum |> map_dbl(spectral_integration, wavelength.range = c(400, 500)),
long = Spectrum |> map_dbl(spectral_integration, wavelength.range = c(600, 700)),
`sl ratio` = short / long # compute short-to-long ratio
)
Table 15 shows the short/long wavelength ratio averaged over each day, and then as weekday/weekend means. In this dataset, the values indicate the spectral balance of the light the individual was exposed to (higher values mean relatively more short-wavelength content).
dataVEET |>
summarize_numeric(prefix = "", remove = c("Datetime", "Spectrum")) |>
mean_daily(prefix = "") |>
gt() |> fmt_number(-`sl ratio`, decimals = 0) |> cols_hide(episodes)
| Date | short | long | sl ratio |
|---|---|---|---|
| VEET | | | |
| Mean daily | 135 | 133 | 0.7307360 |
| Weekday | 184 | 171 | 0.9613714 |
| Weekend | 15 | 38 | 0.1541475 |
3.3.2 Short-wavelength light at specific times of day
The second spectral example examines short-wavelength light exposure as a function of time of day. Studies might be interested in, for instance, blue-light exposure during midday versus morning or night. We demonstrate three approaches: (a) filtering the data to a specific local time window, (b) aggregating by hour of day to obtain a daily profile of short-wavelength exposure, and (c) comparing day and night periods.
Table 16 isolates the time window between 11:00 and 14:00 each day and computes the average short-wavelength irradiance in that interval. This represents a straightforward query: “How much blue light does the subject get around midday on average?”
dataVEET |>
filter_Time(start = "11:00:00", end = "14:00:00") |> # filter data to local 11am–2pm
select(-c(Spectrum, long, `sl ratio`, Time, Datetime)) |>
summarize_numeric(prefix = "") |>
mean_daily(prefix = "") |>
gt() |> fmt_number(short) |>
cols_label(short = "Short-wavelength irradiance (a.u.)")
| Date | Short-wavelength irradiance (a.u.) | episodes |
|---|---|---|
| VEET | | |
| Mean daily | 133.68 | 37 |
| Weekday | 161.12 | 37 |
| Weekend | 65.08 | 37 |
To visualize short-wavelength exposure over the course of a day, we aggregate the data into hourly bins. We cut the timeline into 1-hour segments (using local time), compute the mean short-wavelength irradiance in each hour for each day, and then average across days. Figure 7 shows the resulting diurnal profile, with short-wavelength exposure expressed as a fraction of the daily maximum for easier comparison.
# Prepare hourly binned data
dataVEETtime <- dataVEET |>
cut_Datetime(unit = "1 hour", type = "floor", group_by = TRUE) |> # bin timestamps by hour
select(-c(Spectrum, long, `sl ratio`, Datetime)) |>
summarize_numeric(prefix = "") |>
group_by(Datetime.rounded, .drop = FALSE) |> # group by the hour-of-day time bin
mean_daily(prefix = "", sub.zero = TRUE) |> # average across days (sub.zero ensures missing hours are treated as zero)
add_Time_col(Datetime.rounded) # add a Time column (hour of day)
# Create the plot
dataVEETtime |>
ggplot(aes(x = Time, y = short / max(short))) +
geom_col(aes(fill = Date), position = "dodge") +
ggsci::scale_fill_jco() +
theme_minimal() +
labs(y = "Normalized short-wavelength irradiance",
x = "Local time (HH:MM)") +
scale_y_continuous(labels = scales::label_percent()) +
scale_x_time(labels = scales::label_time(format = "%H:%M"))
Finally, we compare short-wavelength exposure during daytime vs. nighttime. Using civil dawn and dusk information (based on geographic coordinates, here set for Houston, TX, USA), we label each measurement as day or night and then compute the total short-wavelength exposure in each period. Table 17 summarizes the daily short-wavelength dose received during the day vs. during the night.
dataVEET |>
select(-c(Spectrum, long, `sl ratio`)) |>
add_photoperiod(coordinates) |>
group_by(photoperiod.state, .add = TRUE) |>
summarize_numeric(prefix = "",
remove = c("dawn", "dusk", "photoperiod", "Datetime")) |>
group_by(photoperiod.state) |>
mean_daily(prefix = "") |>
select(-episodes) |>
pivot_wider(names_from = photoperiod.state, values_from = short) |>
gt() |> fmt_number()
| Date | day | night |
|---|---|---|
| Mean daily | 238.55 | 2.91 |
| Weekday | 323.85 | 4.20 |
| Weekend | 25.30 | −0.32 |
In the above, add_photoperiod(coordinates) adds columns to the data frame indicating, for each timestamp, whether it fell during day or night, given the latitude/longitude. Note that the small negative night value on weekends likely reflects noise in the spectral reconstruction at very low light levels.
4 Discussion and conclusion
This tutorial demonstrates a standardized, step-by-step pipeline for calculating a variety of visual experience metrics. We illustrated how a combination of LightLogR functions and tidyverse workflows can yield clear and reproducible analyses of wearable device data. Although the full pipeline is extensive, each metric is computed through a dedicated sequence of well-documented steps. By leveraging LightLogR’s framework alongside common data analysis approaches, the process remains transparent and relatively easy to follow. The overall goal is to make analysis transparent (with open-source functions), accessible (through thorough documentation, tutorials, and human-readable function naming, all under an MIT license), robust (the package includes ~900 unit tests and continuous integration with bug tracking on GitHub), and community-driven (open feature requests and contributions via GitHub).
Even with standardized pipelines, researchers must still make and document many decisions during data cleaning, time-series handling, and metric calculations — especially for complex metrics that involve grouping data in multiple ways (for example, grouping by distance range as well as by duration for cluster metrics). We have highlighted these decision points in the tutorial (such as how to handle irregular intervals, choosing thresholds for “near” distances or “outdoor” light, and deciding on minimum durations for sustained events). Explicitly considering and reporting these choices is important for reproducibility and for comparing results across studies.
The broad set of features in LightLogR — ranging from data import and cleaning tools (for handling time gaps and irregularities) to visualization functions and metric calculators — makes it a powerful toolkit for visual experience research. Our examples spanned circadian-light metrics and myopia-related metrics, demonstrating the versatility of a unified analysis approach. By using community-supported tools and workflows, researchers in vision science, chronobiology, myopia, and related fields can reduce time spent on low-level data wrangling and focus more on interpreting results and advancing scientific understanding.
5 Session info
R version 4.5.0 (2025-04-11)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] gt_1.0.0 lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
[5] dplyr_1.1.4 purrr_1.0.4 readr_2.1.5 tidyr_1.3.1
[9] tibble_3.3.0 ggplot2_3.5.2 tidyverse_2.0.0 LightLogR_0.9.2
loaded via a namespace (and not attached):
[1] sass_0.4.10 generics_0.1.4 renv_1.1.4 class_7.3-23
[5] xml2_1.3.8 KernSmooth_2.23-26 stringi_1.8.7 hms_1.1.3
[9] digest_0.6.37 magrittr_2.0.3 evaluate_1.0.4 grid_4.5.0
[13] timechange_0.3.0 RColorBrewer_1.1-3 fastmap_1.2.0 jsonlite_2.0.0
[17] e1071_1.7-16 DBI_1.2.3 viridisLite_0.4.2 scales_1.4.0
[21] cli_3.6.5 rlang_1.1.6 units_0.8-7 cowplot_1.1.3
[25] withr_3.0.2 yaml_2.3.10 tools_4.5.0 tzdb_0.5.0
[29] vctrs_0.6.5 R6_2.6.1 proxy_0.4-27 lifecycle_1.0.4
[33] classInt_0.4-11 htmlwidgets_1.6.4 pkgconfig_2.0.3 pillar_1.10.2
[37] gtable_0.3.6 Rcpp_1.0.14 glue_1.8.0 sf_1.0-21
[41] xfun_0.52 tidyselect_1.2.1 rstudioapi_0.17.1 knitr_1.50
[45] farver_2.1.2 htmltools_0.5.8.1 rmarkdown_2.29 ggsci_3.2.0
[49] labeling_0.4.3 suntools_1.0.1 compiler_4.5.0
6 References
Footnotes
1. Functions from LightLogR are presented as links to the function documentation. General analysis functions (from the package dplyr) are presented as normal text.↩︎
2. This deviates from the common definition of luminous exposure, which is the sum of illuminance measurements scaled to hourly observation intervals.↩︎