Use case #04: Visual experience

Open and reproducible analysis of light exposure and visual experience data (Advanced)

Author
Affiliation

Johannes Zauner

Technical University of Munich & Max Planck Institute for Biological Cybernetics, Germany

Last modified:

December 9, 2025

1 Preface

Wearable devices can capture more than illuminance or melanopic EDI. They can capture many aspects of the visual environment that, from an observer’s perspective, form the visual experience. In this use case we focus on a multimodal dataset captured by the VEET device 1. The device collects data from several sensors simultaneously: light (ambient light sensor, ALS), viewing distance (time of flight, TOF), spectral information (spectral light channels, PHO), and motion and orientation (inertial measurement unit, IMU). The data we are looking at do not come from a specific experiment, but are pilot data captured by an individual participant. The example is adapted from the online tutorial by Zauner et al. (2025).

Visual experience includes and goes beyond light exposure. It is about all aspects of the visual environment. In our case we also focus on viewing distance and spectral distribution.


The tutorial focuses on

  • importing multimodal data

  • reconstructing a spectral power distribution from sensor data

  • calculating spectrum-based metrics

  • simplifying a spatial grid of distance measurements

  • detecting clusters of conditions (for distance)

2 How this page works

This document contains the script for the online course series as a Quarto document, which can be executed with a local installation of R. Please ensure that all libraries are installed prior to running the script.

If you want to test LightLogR without installing R or the package, try the script version running webR, which provides an autonomous but slightly reduced version. These versions are indicated as (live), whereas the current version is considered (static), as it is pre-rendered.

To run this script, we recommend cloning or downloading the GitHub repository (link to Zip-file) and running the respective script. You will need to install the required packages. A quick way is to run:

renv::restore()
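If you prefer not to use {renv}, the packages used in this tutorial can also be installed directly (a minimal sketch; the list mirrors the Setup section below, plus ggsci, which is used for one plot further down):

# Alternative to renv::restore(): install the required packages directly
install.packages(c("LightLogR", "tidyverse", "gt", "ggridges", "slider", "ggsci"))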
Note: If this is your first course tutorial

This tutorial is considered advanced. Basic functions of the LightLogR package as well as general tidy workflows are used without dedicated explanation. We recommend working through the beginner example if you are new to LightLogR (note that there is also a live variant).

3 Setup

We start by loading the necessary packages.

library(LightLogR) # main package
library(tidyverse) # for tidy data science
library(gt) # for great tables
library(ggridges) # for stacked plots within a panel
library(slider) # for sliding windows

4 Spectrum

4.1 Import

The VEET device records multiple data modalities in one combined file. Its raw data file contains interleaved records for different sensor types, distinguished by a “modality” field. Here we focus first on the spectral sensor modality (PHO) for spectral irradiance data, and, in Section 5.1 below, on the time-of-flight modality (TOF) for distance data.

In the VEET’s export file, each line includes a timestamp and a modality code, followed by fields specific to that modality. Importantly, this means that the VEET export is not rectangular, i.e., tabular. The non-rectangular structure of the VEET files can be seen in Figure 1 to the side. This is challenging for many import functions, which expect an equal number of columns in every row. For the PHO (spectral) modality, the columns include a timestamp, an integration time, a general Gain factor, and nine spectral channel readings covering different wavelengths (with names like s415, s445, ..., s680, s910), as well as a Dark channel and a broadband Clear channel. In essence, the VEET’s spectral sensor captures light in several wavelength bands (from ~415 nm up to 680 nm, plus a near-infrared channel around 910 nm and a broadband “clear” channel) rather than outputting a single lux value like the ambient light sensor (ALS) does. Unlike illuminance, the spectral data are not given as directly interpretable radiometric quantities but rather as raw sensor counts across multiple wavelength channels, which require conversion to reconstruct a spectral power distribution. In our analysis, spectral data allow us to compute metrics like the relative contribution of short-wavelength (blue) light versus long-wavelength light in the participant’s environment. Processing these spectral data involves several necessary steps.

Figure 1: Structure of the VEET file format
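If you want to verify this structure yourself, a few raw lines can be read straight from the zipped export (a quick sketch, using the file introduced below; the exact line contents depend on which modalities were logged first):

# Peek at the raw, non-rectangular structure of the VEET export (sketch)
raw_lines <- readLines(unz("data/visual_experience/02_VEET_L.csv.zip",
                           "02_VEET_L.csv"), n = 10)
lengths(strsplit(raw_lines, ","))  # field counts differ between modalities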

To import the VEET spectral data, we use the LightLogR import function, specifying the PHO modality. The raw VEET data in our example is provided as a zip file (02_VEET_L.csv.zip) containing the logged data for about four days. We do the following:

# Import VEET Spectral Sensor (PHO) data
tz   <- "US/Central"    # assuming device clock was set to US Central
path <- "data/visual_experience/02_VEET_L.csv.zip"
data <- import$VEET(path, 
                    tz = tz, 
1                    modality = "PHO",
2                    manual.id = "VEET",
3                    version = "2.1.7"
                    )
1
modality is a parameter only the VEET device requires. If you are uncertain which devices require special parameters, have a look at the import help page (?import) under the VEET device. Setting it to PHO gives us the spectral modality.
2
As we are only dealing with one individual here, we set a manual Id
3
In LightLogR 0.10.0 High noon, which was the current version when this use case was written, this argument would not be necessary, as 2.1.7 is the default version. Things might change in the future, however. Setting the version argument explicitly ensures that the file will be read in correctly, even if the default file format evolves over time.
Multiple files in zip: reading '02_VEET_L.csv'

Successfully read in 173'013 observations across 1 Ids from 1 VEET-file(s).
Timezone set is US/Central.
The system timezone is Europe/Berlin. Please correct if necessary!

First Observation: 2025-06-17 12:25:13
Last Observation: 2025-06-21 22:47:01
Timespan: 4.4 days

Observation intervals: 
   Id    interval.time      n pct      
 1 VEET  1s               417 0.24102% 
 2 VEET  2s            171837 99.32086%
 3 VEET  3s               738 0.42656% 
 4 VEET  4s                 7 0.00405% 
 5 VEET  9s                 1 0.00058% 
 6 VEET  11s                1 0.00058% 
 7 VEET  12s                2 0.00116% 
 8 VEET  13s                4 0.00231% 
 9 VEET  15s                1 0.00058% 
10 VEET  16s                1 0.00058% 
# ℹ 3 more rows
Figure 2: Overview plot of imported VEET spectral data
data |> gg_days(s480, y.axis.label = "s480 (raw counts)")

To check which versions are available for a device (and which default version is used when nothing is provided), use the following function.

supported_versions("VEET") |> gt()
Device Version Default Description
VEET initial FALSE Device format as it was initially implemented in LightLogR.
VEET 2.1.7 TRUE In firmware version 2.1.7 a change was introduced to the `PHO` modality, where only one `CLEAR` channel is exported instead of two.

After import, data contains columns for the timestamp (Datetime), Gain (the sensor gain setting), and the nine spectral sensor channels plus a Dark and a Clear channel. These appear as numeric columns named s415, s445, ..., s910, Dark, Clear. Other columns are also present but not needed for now. The spectral sensor was logging at a 2-second rate. It is informative to look at a snippet of the imported spectral data before further processing. Table 1 shows four rows of data after import (before calibration), with some technical columns omitted for brevity:

data |> 
  slice(6000:6003) |> 
  select(-c(modality, file.name, time_stamp)) |> 
  gt() |> 
  fmt_number(s415:Clear) 
Table 1: Overview of the spectral sensor import from the VEET device (4 observations). Each row corresponds to a 2-second timestamp (Datetime) and shows the raw sensor readings for the spectral channels (s415–s910, Dark, Clear). All values are in arbitrary sensor units (counts). Gain and integration_time are also relevant for each interval, depending on the downstream computation.
Datetime integration_time Gain s415 s445 s480 s515 s555 s590 s630 s680 s910 Dark Clear
VEET
2025-06-18 00:11:22 100 512 20.00 27.00 31.00 42.00 47.00 63.00 90.00 139.00 756.00 0.00 534.00
2025-06-18 00:11:24 100 512 21.00 28.00 31.00 41.00 47.00 63.00 90.00 140.00 756.00 0.00 534.00
2025-06-18 00:11:26 100 512 21.00 27.00 31.00 41.00 47.00 63.00 90.00 139.00 756.00 0.00 534.00
2025-06-18 00:11:28 100 512 21.00 27.00 31.00 42.00 47.00 63.00 90.00 140.00 755.00 0.00 534.00

4.2 Spectral calibration

Now we proceed with spectral calibration. The VEET’s spectral sensor counts need to be converted to physical units (spectral irradiance) via a calibration matrix provided by the manufacturer. For this example, we assume we have a calibration matrix that maps all the channel readings to an estimated spectral power distribution (SPD). The LightLogR package provides a function spectral_reconstruction() to perform this conversion. However, before applying it, we must ensure the sensor counts are in a normalized form. This procedure is laid out by the manufacturer; here we follow the VEET SPD Reconstruction Guide.pdf, version 06/05/2025. Note that each manufacturer has to specify the method of count normalization (if any) and spectral reconstruction. In our raw data, each observation comes with a Gain setting that indicates how the sensor’s sensitivity was adjusted; we need to divide the raw counts by the gain to get normalized counts. LightLogR offers normalize_counts() for this purpose. We further need to scale by the integration time (in milliseconds) and subtract the counts of the Dark sensor channel.
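Taken together, each channel count \(c\) is thus transformed as \(c_{\mathrm{norm}} = \big((c - c_{\mathrm{Dark}})/t_{\mathrm{int}}\big) / r_{\mathrm{gain}}\), where \(t_{\mathrm{int}}\) is the integration time and \(r_{\mathrm{gain}}\) is the gain ratio looked up from the manufacturer’s table; this is what the following code implements.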

1count.columns <- c("s415", "s445", "s480", "s515", "s555", "s590", "s630",
                      "s680", "s910", "Dark", "Clear")

2gain.ratios <-
  tibble(
    gain = c(0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512),
    gain.ratio =
      c(0.008, 0.016, 0.032, 0.065, 0.125, 0.25, 0.5, 1, 2, 3.95, 7.75)
  )

#normalize data:
data <-
  data |> 
3  mutate(across(c(s415:Clear), \(x) (x - Dark)/integration_time)) |>
4  normalize_counts(
5    gain.columns = rep("Gain", 11),
6    count.columns = count.columns,
7    gain.ratios
  ) |> 
8  select(-c(s415:Clear)) |>
  rename_with(~ str_remove(.x, ".normalized"))
1
Column names of variables that need to be normalized
2
Gain ratios as specified by the manufacturer’s reconstruction guide
3
Remove dark counts & scale by integration time
4
Function to normalize counts
5
All sensor channels share the gain value
6
Sensor channels to normalize (see 1.)
7
Gain ratios (see 2.)
8
Drop original raw count columns

In this call, we specified gain.columns = rep("Gain", 11) because we have 11 sensor columns that all use the same gain factor column (Gain). This step will add new columns (with a suffix, e.g. .normalized) for each spectral channel representing the count normalized by the gain. We then dropped the raw count columns and renamed the normalized ones by dropping .normalized from the names. After this, data contains the normalized sensor readings for s415, s445, ..., s910, Dark, Clear for each observation.

Because we do not need this high a resolution, we will aggregate it to a 5-minute interval for computational efficiency. The assumption is that spectral composition does not need to be examined at every 2-second instant for our purposes, and 5-minute averages can capture the general trends while drastically reducing data size and downstream computational costs.

# Aggregate spectral data to 5-minute intervals and mark gaps
data <- data |>
  aggregate_Datetime(unit = "5 mins") |>  # aggregate to 5-minute bins
  gap_handler(full.days = TRUE) |>        # explicit NA for any gaps in between
1  add_Date_col(group.by = TRUE) |>
2  remove_partial_data(Clear, threshold.missing = "-23 hour")
1
Group the data by calendar day
2
Remove days (i.e., groups) with less than 23 hours of available data. By setting threshold.missing to a negative value, we enforce a minimum duration of available data, which is useful when the total duration of a group is unknown or varies.

We use one of the channels (here Clear) as the reference variable for remove_partial_data to drop incomplete days (the choice of channel is arbitrary as all channels share the same level of completeness). The dataset is now ready for spectral reconstruction.

Warning

Please be aware that normalize_counts() requires Gain values that match the gain table. If we had aggregated the data before normalizing it, Gain values would have been averaged within each bin (5 minutes in this case). If the Gain did not change within that time, this is not an issue, but any mix of Gain values leads to an averaged Gain that is not represented in the gain table. normalize_counts() will not return wrong values in these cases; it returns NA whenever a Gain value is not found in the table. We therefore recommend always normalizing counts on the raw dataset.
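As a concrete (hypothetical) illustration: averaging an interval recorded at Gain 256 with one recorded at Gain 512 yields a mean Gain of 384, which has no entry in the gain table, so normalize_counts() would return NA for that bin.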

4.3 Spectral reconstruction

For spectral reconstruction, we require a calibration matrix that corresponds to the VEET’s sensor channels. This matrix would typically be obtained from the device manufacturer or a calibration procedure. It defines how each channel’s normalized count relates to intensity at various wavelengths. For demonstration, the calibration matrix was provided by the manufacturer and is specific to the make and model (see Figure 3). It should not be used for research purposes without confirming its accuracy with the manufacturer.
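Conceptually, the reconstruction is a weighted sum of the calibration-matrix columns: for each wavelength \(\lambda\), the estimated spectral irradiance is \(\hat{E}(\lambda) = \sum_i M(\lambda, i)\, c_i\), with \(c_i\) the normalized channel counts and \(M\) the calibration matrix (the exact computation is handled internally by spectral_reconstruction()).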

Figure 3: Calibration matrix
#import calibration matrix
1calib_mtx <-
  read_csv("data/visual_experience/VEET_calibration_matrix.csv",
           show_col_types = FALSE) |>
  column_to_rownames("wavelength")

# Reconstruct spectral power distribution (SPD) for each observation
data <- 
  data |> 
  mutate(
  Spectrum = 
2    spectral_reconstruction(
3      pick(s415:s910),
4      calibration_matrix = calib_mtx,
5      format = "long"
  )
)
1
Import the calibration matrix and make certain that wavelength is set as the rownames
2
The function spectral_reconstruction() does not operate on the whole dataset, but has to be called within mutate() (or be provided the data directly)
3
Pick the normalized sensor columns
4
Provide the calibration matrix
5
Return a long-form list column (wavelength, irradiance)

Here, we use format = "long" so that the result for each observation is a list-column Spectrum, where each entry is a tibble2 containing two columns: wavelength and irradiance (one row per wavelength in the calibration matrix). In other words, each row of data now holds a full reconstructed spectrum in the Spectrum column. The long format is convenient for further calculations and plotting. (An alternative format = "wide" would add each wavelength as a separate column, but that is less practical when there are many wavelengths.)
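To get a feel for this structure, one reconstructed spectrum can be pulled out directly (a quick sketch; gap-filled rows are skipped via the is.implicit column):

# Inspect a single reconstructed spectrum from the list column (sketch)
data |>
  ungroup() |>
  filter(!is.implicit) |>   # skip gap-filled observations
  slice(1) |>
  pull(Spectrum) |>
  pluck(1)                  # a tibble with columns wavelength and irradiance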

To visualize the data, we calculate the photopic illuminance based on the spectra and plot each spectrum color-scaled by its illuminance. For clarity, we reduce the data to observations from the day with the most non-NA observations.

data_spectra <- 
data |> 
1  sample_groups(order.by = sum(!is.implicit)) |>
  mutate( 
2    Illuminance = Spectrum |>
3      map_dbl(spectral_integration,
4              action.spectrum = "photopic",
5              general.weight = "auto")
  ) |> 
6  unnest(Spectrum)
data_spectra |> select(Id, Date, Datetime, Illuminance) |> distinct()
1
Keep only observations for one day (the one with the fewest missing intervals)
2
Use the spectrum,…
3
… call the function spectral_integration() for each,…
4
… use the brightness sensitivity function,…
5
… and apply the appropriate efficacy weight.
6
Create a long format of the data where the spectrum is unnested
# A tibble: 288 × 4
# Groups:   Id, Date [1]
   Id    Date       Datetime            Illuminance
   <fct> <date>     <dttm>                    <dbl>
 1 VEET  2025-06-18 2025-06-18 00:00:00        3.70
 2 VEET  2025-06-18 2025-06-18 00:05:00        3.79
 3 VEET  2025-06-18 2025-06-18 00:10:00        3.72
 4 VEET  2025-06-18 2025-06-18 00:15:00        6.90
 5 VEET  2025-06-18 2025-06-18 00:20:00        3.53
 6 VEET  2025-06-18 2025-06-18 00:25:00        3.26
 7 VEET  2025-06-18 2025-06-18 00:30:00        3.59
 8 VEET  2025-06-18 2025-06-18 00:35:00        3.49
 9 VEET  2025-06-18 2025-06-18 00:40:00        3.54
10 VEET  2025-06-18 2025-06-18 00:45:00        3.62
# ℹ 278 more rows

The following plot visualizes the spectra:

data_spectra |> 
  ggplot(aes(x = wavelength,group = Datetime)) +
  geom_line(aes(y = irradiance*1000, col = Illuminance)) +
  labs(y = "Irradiance (mW/m²/nm)", 
       x = "Wavelength (nm)", 
       col = "Photopic illuminance (lx)") +
  scale_color_viridis_b(breaks = c(0, 10^(0:3))) +
  scale_y_continuous(trans = "symlog", breaks = c(0, 1, 10, 50)) +
  coord_cartesian(ylim = c(0,NA), expand = FALSE) +
  theme_minimal()
Figure 4: Overview of the reconstructed spectra, color-scaled by photopic illuminance (lx)

The following ridgeline plot can be used to assess when in the day certain spectral wavelengths are dominant:

data_spectra |> 
  ggplot(aes(x = wavelength, group = Datetime)) +
  geom_ridgeline(aes(height = irradiance*1000, 
                     y = Datetime, 
                     fill = Illuminance), 
                 scale = 400, lwd = 0.1, alpha = 0.7) +
  labs(y = "Local time & Irradiance (mW/m²/nm)", 
       x = "Wavelength (nm)", 
       fill = "Photopic illuminance (lx)")+
  scale_fill_viridis_b(breaks = c(0, 10^(0:3))) +
  theme_minimal()
Figure 5: Overview of the reconstructed spectra by time of day, color-scaled by photopic illuminance (lx)

At this stage, the data dataset has been processed to yield a time series of spectral power distributions. We can use these to compute biologically relevant light metrics. For instance, one possible metric is the proportion of power in short wavelengths versus long wavelengths.

With the VEET spectral preprocessing complete, we emphasize that these steps (normalizing by gain, applying calibration, and perhaps simplifying channels) are device-specific requirements. They ensure that the raw sensor counts are translated into meaningful physical measures (like spectral irradiance). Researchers using other spectral devices would follow a similar procedure, adjusting for their device’s particulars (some may output spectra directly, whereas others, like the VEET, require reconstruction).

Note

Some devices may output normalized counts instead of raw counts. For example, the ActLumus device outputs normalized counts, while the VEET device records raw counts and the gain. Manufacturers will be able to specify exact outputs for a given model and software version.

4.4 Metrics

Note

Spectrum-based metrics in wearable data are relatively new and less established compared to distance or broadband light metrics. The following examples illustrate potential uses of spectral data in a theoretical sense, which can be adapted as needed for specific research questions.

4.4.1 Ratio of short- vs. long-wavelength light

Our first spectral metric is the ratio of short-wavelength light to long-wavelength light, which is relevant, for example, in assessing the blue-light content of exposure. For this example, we will arbitrarily define “short” wavelengths as 400–500 nm and “long” as 600–700 nm. Using the list-column of spectra in our dataset, we integrate each spectrum over these ranges (using spectral_integration()), and then compute the ratio short/long for each time interval. We then summarize these ratios per day.

data <- 
  data |> 
1  select(Id, Date, Datetime, Spectrum) |>
  mutate(
2    short =
      Spectrum |> map_dbl(spectral_integration, wavelength.range = c(400, 500)),
3    long  =
      Spectrum |> map_dbl(spectral_integration, wavelength.range = c(600, 700))
  )
1
Focus on ID, date, time, and spectrum
2
Integrate over the spectrum from 400 to 500 nm
3
Integrate over the spectrum from 600 to 700 nm

Table 2 shows, for each day, the average short- and long-wavelength irradiance and the resulting short/long ratio. In this dataset, the values give an indication of the spectral balance of the light the individual was exposed to (higher values mean relatively more short-wavelength content).

data |> 
  summarize_numeric(prefix = "", remove = c("Datetime", "Spectrum")) |> 
  mutate(`sl ratio` = short / long) |> 
  gt() |> 
  fmt_number(decimals = 1, scale_by = 1000) |>
  fmt_number(`sl ratio`, decimals = 3) |>
  cols_hide(episodes)
Table 2: Average irradiance (mW/m²) of short-wavelength (400–500 nm) and long-wavelength (600–700 nm) light, and their ratio
Date short long sl ratio
VEET
2025-06-18 83.7 71.7 1.168
2025-06-20 114.0 81.6 1.396

4.4.2 Melanopic daylight efficacy ratio (MDER)

The same idea is behind calculating the melanopic daylight efficacy ratio (or MDER), which is defined as the melanopic EDI divided by the photopic illuminance. Results are shown in Table 3. In this case, instead of a simple integration over a wavelength band, we apply an action spectrum to the spectral power distribution (SPD), integrate over the weighted SPD, and apply a correction factor. All of this is implemented in spectral_integration().
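Written out: \(\mathrm{MDER} = E^{\mathrm{D65}}_{v,\mathrm{mel}} / E_v\), i.e., the melanopic EDI of a spectrum divided by its photopic illuminance.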

data <- 
  data |> 
  select(Id, Date, Datetime, Spectrum, short, long) |>
  mutate(
1    melEDI =
      Spectrum |> map_dbl(spectral_integration, action.spectrum = "melanopic"),
2    illuminance  =
      Spectrum |> map_dbl(spectral_integration, action.spectrum = "photopic")
  )
1
Calculate the melanopic EDI by applying the \(s_{\mathrm{mel}}(\lambda)\) action spectrum, integrating, and weighting
2
Calculate the photopic illuminance by applying the \(V(\lambda)\) action spectrum, integrating, and weighting
data |> 
  summarize_numeric(prefix = "", remove = c("Datetime", "Spectrum")) |> 
  mutate(MDER = melEDI / illuminance) |>
  gt() |> 
  fmt_number(decimals = 1, scale_by = 1000) |>
  fmt_number(MDER, decimals = 3) |>
  cols_hide(episodes)
Table 3: Average melanopic daylight efficacy ratio (MDER)
Date short long melEDI illuminance MDER
VEET
2025-06-18 83.7 71.7 80.6 103.7 0.777
2025-06-20 114.0 81.6 108.0 123.9 0.871

4.4.3 Short-wavelength light at specific times of day

The second spectral example examines short-wavelength light exposure as a function of time of day. Certain studies might be interested, for instance, in blue-light exposure during midday versus morning or night. We demonstrate three approaches: (a) filtering the data to a specific local time window, (b) aggregating by hour of day to see a daily profile of short-wavelength exposure, and (c) comparing day and night periods.

Table 4 isolates the time window between 7:00 and 11:00 each day and computes the average short-wavelength irradiance in that interval. This represents a straightforward query: “How much blue light does the subject get in the morning on average?”

data |> 
1  filter_Time(start = "7:00:00", end = "11:00:00") |>
  select(c(Id, Date, short)) |>
  summarize_numeric(prefix = "") |> 
  gt() |> 
  fmt_number(short, scale_by = 1000) |> 
  cols_label(short = "Short-wavelength irradiance (mW/m²)") |> 
  cols_hide(episodes)
1
Filter data to local 7am–11am
Table 4: Average short-wavelength light (400–500 nm) exposure between 7:00 and 11:00 each day
Date Short-wavelength irradiance (mW/m²)
VEET
2025-06-18 5.44
2025-06-20 0.95

To visualize short-wavelength exposure over the course of a day, we aggregate the data into hourly bins. We cut the timeline into 1-hour segments (using local time) and compute the mean short-wavelength irradiance in each hour for each day. Figure 6 shows the resulting diurnal profile, with short-wavelength exposure expressed as a fraction of the daily maximum for easier comparison.

# Prepare hourly binned data
data_time <- data |>
1  cut_Datetime(unit = "1 hour", type = "floor", group_by = TRUE) |>
  select(c(Id, Date, short)) |>
  summarize_numeric(prefix = "") |> 
2  add_Time_col(Datetime.rounded)  |>
  mutate(rel_short = short / max(short))
1
Bin timestamps by hour
2
Add a Time column (hour of day, based on the cut/rounded data)
Adding missing grouping variables: `Datetime.rounded`
#creating the plot
data_time |> 
  ggplot(aes(x=Time, y = rel_short)) +
  geom_col(aes(fill = factor(Date)), position = "dodge") +
  ggsci::scale_fill_jco() +
  theme_minimal() +
  labs(y = "Normalized short-wavelength irradiance", 
       x = "Local time (HH:MM)",
       fill = "Date") + 
  scale_y_continuous(labels = scales::label_percent()) +
  scale_x_time(labels = scales::label_time(format = "%H:%M"))
Figure 6: Diurnal profile of short-wavelength light exposure. Each bar represents the average short-wavelength irradiance at that hour of the day (0–23 h), normalized to the daily maximum.

Finally, we compare short-wavelength exposure during daytime vs. nighttime. Using civil dawn and dusk information (based on geographic coordinates, here set for Houston, TX, USA), we label each measurement as day or night and then compute the total short-wavelength exposure in each period. Table 5 summarizes the daily short-wavelength dose received during the day vs. during the night.

1coordinates <- c(29.75, -95.36)
data |>
  select(c(Id, Date, Datetime, short)) |>
2  add_photoperiod(coordinates) |>
  group_by(photoperiod.state, .add = TRUE) |>
  summarize_numeric(prefix = "", 
                    remove = c("dawn", "dusk", "photoperiod", "Datetime")) |> 
  group_by(Id, photoperiod.state) |> 
  select(-episodes) |> 
  pivot_wider(names_from =photoperiod.state, values_from = short) |> 
  gt() |> 
  fmt_number(scale_by = 1000, decimals = 1) |> 
  tab_header("Average short-wavelength irradiance (mW/m²)")
1
Coordinates for Houston, Texas, USA
2
Adding and grouping by photoperiod
Table 5: Short wavelength light exposure (mW/m²) during the day and at night
Average short-wavelength irradiance (mW/m²)
Date day night
VEET
2025-06-18 126.5 12.3
2025-06-20 181.8 1.0

5 Distance

5.1 Import

In this second section, the distance data of the VEET device will be imported, analogous to the other modality. The TOF modality contains information for up to two objects in an 8×8 grid of measurements, spanning a total of about 52° vertically and 41° horizontally. Because the VEET device can detect up to two objects at each grid point, and a confidence value is assigned to every measurement, each observation contains \(2 \times 2 \times 8 \times 8 = 256\) values (distance and confidence, for two objects, across 64 grid points).

# Import VEET Time-of-Flight sensor (TOF) data
path <- "data/visual_experience/01_VEET_L.csv.zip"
dataDist <- import$VEET(path, 
                        tz = tz, 
1                        modality = "TOF",
2                        manual.id = "VEET",
3                        version = "initial")
1
modality is a parameter only the VEET device requires. If you are uncertain which devices require special parameters, have a look at the import help page (?import) under the VEET device. Setting it to TOF gives us the distance modality.
2
As we are only dealing with one individual here, we set a manual Id
3
This file was captured with an older VEET firmware. We therefore set version = "initial", so that it is read in correctly.
Multiple files in zip: reading '01_VEET_L.csv'

Successfully read in 304'195 observations across 1 Ids from 1 VEET-file(s).
Timezone set is US/Central.
The system timezone is Europe/Berlin. Please correct if necessary!

First Observation: 2024-06-04 15:00:36
Last Observation: 2024-06-12 08:29:43
Timespan: 7.7 days

Observation intervals: 
  Id    interval.time              n pct      
1 VEET  0s                         3 0.00099% 
2 VEET  1s                      2089 0.68673% 
3 VEET  2s                    299876 98.58051%
4 VEET  3s                      2213 0.72750% 
5 VEET  4s                         3 0.00099% 
6 VEET  6s                         1 0.00033% 
7 VEET  9s                         7 0.00230% 
8 VEET  109s (~1.82 minutes)       1 0.00033% 
9 VEET  59077s (~16.41 hours)      1 0.00033% 
Figure 7: Overview plot of imported VEET data

In a first step, we condition the data similarly to the other VEET modality. For computational reasons, we remove the second object and aggregate to a 10-second interval. Note that the next step still takes considerable computation time.

# Aggregate distance data to 10-second intervals and mark gaps
dataDist <- 
  dataDist |>
1  select(-contains(c("conf2_", "dist2_"))) |>
2  aggregate_Datetime(unit = "10 secs") |>
3  gap_handler(full.days = TRUE) |>
  add_Date_col(group.by = TRUE) |> 
  remove_partial_data(dist1_0, threshold.missing = "1 hour")
dataDist |> summary_overview(dist1_0)
1
Remove the second object (for computational reasons)
2
Aggregate to 10-second bins
3
Explicit NA for any gaps
# A tibble: 4 × 4
  name                mean   min   max
  <chr>              <dbl> <dbl> <dbl>
1 Participants           1    NA    NA
2 Participant-days       6     6     6
3 Days ≥80% complete     6     6     6
4 Missing/Irregular      0     0     0

In the next step, we need to transform the wide format of the imported dataset into a long format, where each row contains exactly one observation for one grid-point.

dataDist <- 
  dataDist |> 
  pivot_longer(
    cols = -c(Datetime, file.name, Id, is.implicit, time_stamp, modality, Date),
    names_to = c(".value", "position"),
    names_pattern = "(conf1|conf2|dist1|dist2)_(\\d+)"
  )

In a final step before we can use the data in the analysis, we need to assign x and y coordinates based on the position column that was created when pivoting longer. Positions are counted from 0 to 63, starting at the top right and increasing towards the left, before continuing on the right of the next row below. The y position thus depends on the row count, i.e., how often a full row of 8 values fits into the position value, while the x position depends on the position within each 8-value row. We also add an observation variable that increases by 1 every time the position column hits 0. We then center both x and y coordinates to obtain meaningful values, i.e., 0° indicates the center of the overall measurement cone. Lastly, we convert the confidence values, which are scaled from 0 to 255, into fractions by dividing them by 255. Empirical data from the manufacturer points to a threshold of about 10%, below which the respective distance data are not reliable.
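A small worked example of this index arithmetic (illustration only, before centering and scaling): grid position 10 sits in row 2 from the top and column 6.

# Worked example for the position -> grid mapping used below (illustration)
position <- 10
(position %/% 8) + 1   # row index from the top: 2
8 - (position %% 8)    # column index: 6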

dataDist <- 
  dataDist |> 
  mutate(position = as.numeric(position),
1         y.pos = (position %/% 8)+1,
2         y.pos = scale(y.pos, scale = FALSE)*52/8,
3         x.pos = 8 - (position %% 8),
4         x.pos = scale(x.pos, scale = FALSE)*41/8,
5         observation = cumsum(position == 0),
6         across(starts_with("conf"), \(x) x/255)
         )
1
Increment the y position for every 8 steps in position
2
Center y.pos and rescale it to cover 52° across 8 steps
3
Increment the x position for every step in position, resetting every 8 steps
4
Center x.pos and rescale it to cover 41° across 8 steps
5
Increase an observation counter every time we restart with position at 0
6
Scale the confidence columns so that 255 = 100%

Now this dataset is ready for further analysis. We finish by visualizing the same observation time on different days. Note that we replace zero distance values with Infinity, as these indicate measurements outside the 5m measurement radius of the device.

1extras <- list(
  geom_tile(),
  scale_fill_viridis_c(direction = -1, limits = c(0, 200),
                       oob = scales::oob_squish_any),
  scale_color_manual(values = c("black", "white")),
  theme_minimal(),
  guides(colour = "none"),
  geom_text(aes(label = (dist1/10) |> round(0), colour = dist1>1000),
            size = 2.5),
  coord_fixed(),
  labs(x = "X position (°)", y = "Y position (°)",
       fill = "Distance (cm)"))

2slicer <- function(x){seq((x-1)*64+1, x*64, by = 1)}

dataDist |> 
3  slice(slicer(4765)) |>
4  mutate(dist1 = ifelse(dist1 == 0, Inf, dist1)) |>
5  filter(conf1 >= 0.1 | dist1 == Inf) |>
6  ggplot(aes(x=x.pos, y=y.pos, fill = dist1/10))+ extras +
7  facet_grid(~Datetime)
1
Set visualization parameters
2
Helper to select the 64 grid rows of a given observation index (applied within each day group)
3
Choose a particular observation
4
Replace 0 distances with Infinity
5
Remove data that has less than 10% confidence
6
Plot the data
7
Show one plot per day
Figure 8: Example observations of the measurement grid at 1:14 p.m. for each measurement day. Text values show distance in cm. Empty grid points show values with low confidence. Zero-distance values were replaced with infinite distance and plotted despite low confidence.

As we can see from the figure, different days show, at a given time, vastly different distributions of distance data and measurement confidence (values with confidence < 10% are removed).

5.2 Distance with spatial distribution

Some devices output a single measure for distance (e.g., the Clouclip). The visual environment in natural conditions contains many distances, depending on the solid angle and direction of the measurement. A device like the VEET increases the spatial resolution of these measurements, allowing for more in-depth analyses of the size and position of an object within the field of view. In the case of the VEET, data are collected from an 8×8 measurement grid, spanning 52° vertically and 41° horizontally.

There are many ways in which a spatially resolved distance measure can be used for analysis:

  • Where in the field of view are objects in close range?
  • How large are near objects in the field of view?
  • How varied are distances within the field of view?
  • How close are objects / is the viewing distance in a region of interest within the field of view?

Given the scope of these use cases, we will focus on the last point, but all of these questions could be tackled with the available data. As these are still time-series data, how these measures change over time is also relevant.
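As a brief sketch of how one of the other points could be approached (the variation of distances within the field of view), the spread of reliable grid values could be summarized per time point, using the long-format data prepared above:

# Sketch: within-view variability of distance at each time point
dataDist |>
  filter(conf1 >= 0.1, dist1 > 0) |>             # reliable, in-range grid points
  group_by(Id, Date, Datetime) |>
  summarize(dist_iqr = IQR(dist1, na.rm = TRUE), # spread across the grid (mm)
            .groups = "drop")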

5.3 Distance in a region of interest

We will reduce the dataset to (high confidence) values at or around a given grid position, e.g., ±10 degrees around the central view (0°).

dataDist <- 
dataDist |> 
1  filter(conf1 >= 0.1,
         between(x.pos, -10,10),
         between(y.pos, -10,10)) |>
2  group_by(Datetime, .add = TRUE) |>
  summarize(
3    distance = mean(dist1),
4    n = n(),
    .groups = "drop_last"
  )

dataDist |> 
5  filter_Date(start = "2024-06-10", length = "1 day") |>
6  aggregate_Datetime("15 mins", numeric.handler = \(x) mean(x, na.rm = TRUE)) |>
7  remove_partial_data(by.date = TRUE) |>
8  gg_day(y.axis = distance/10,
         geom = "line",
         linewidth = 1,
         alpha = 0.75,
         y.scale = "identity",
         y.axis.breaks = seq(0,150, by = 20),
         y.axis.label = "Distance (cm)"
         )
1
Remove data with low confidence, and outside ±10°
2
Group by each time point. While this grouping is dropped in the next step, it is crucial to arrive at one value per time point (and Id).
3
Calculate central distance
4
Number of (valid) grid points
5
Filter one day
6
Create 15 minute data
7
Remove midnight data points
8
Set up the plot for distance. It is a nice example showcasing how many of gg_day()’s sensible defaults are tailored to light variables and need adjusting for other modalities.
Figure 9

5.4 Distance-based metrics

The following distance-based metrics use LightLogR functions to derive the results. They map broadly 1:1 onto light metrics, provided that the thresholds are adjusted accordingly.

5.4.1 Daily duration of near work

dataDist |> 
1  filter(distance >= 100, distance < 600) |>
2  durations(distance) |>
4  # mutate(Daytype = wday(Date, week_start = 1, label = TRUE))
3  mean_daily() |>
  gt()
1
Consider only distances in [10, 60) cm
2
Total duration in that range per day
3
Summary across weekday and weekend
4
Can be used to check the weekday types
Table 6: Daily duration of near work (10–60 cm viewing distance)
Date average_duration
VEET
Mean daily 38595s (~10.72 hours)
Weekday 37135s (~10.32 hours)
Weekend 42245s (~11.73 hours)

5.4.2 Frequency of continuous near work

Continuous near-work is typically defined as sustained viewing within a near distance for some minimum duration, allowing only brief interruptions. We use LightLogR’s cluster function to identify episodes of continuous near work. Here we define a near-work episode as viewing distance between 20 cm and 60 cm that lasts at least 30 minutes, with interruptions of up to 1 minute allowed (meaning short breaks ≤1 min do not end the episode). Using extract_clusters() with those parameters, we count how many such episodes occur per day.

Table 7 summarizes the average frequency of continuous near-work episodes per day, and Figure 10 provides an example visualization of these episodes on the distance time series.

dataDist |> 
  extract_clusters(
1    distance >= 200 & distance < 600,
2    cluster.duration = "30 mins",
3    interruption.duration = "1 min",
4    drop.empty.groups = FALSE
  ) |> 
5  summarize_numeric(remove = c("start", "end", "epoch", "duration"),
                    add.total.duration = FALSE) |>
6  mean_daily(prefix = "Frequency of ") |>
  gt() |> 
  fmt_number() 
1
Condition: near-work distance
2
Minimum duration of a continuous episode
3
Maximum gap allowed within an episode
4
Keep days with zero episodes in output
5
Count number of episodes per day
6
Compute daily mean frequency
Table 7: Frequency of continuous near-work episodes per day
Date Frequency of episodes
VEET
Mean daily 3.86
Weekday 4.00
Weekend 3.50
dataDist |> 
1  add_clusters(
    distance >= 200 & distance < 600,
    cluster.duration = "30 mins",
    interruption.duration = "1 min"
  ) |>
  gg_day(y.axis = distance/10, 
         y.axis.label = "Distance (cm)", 
         geom = "line",
         y.scale = "identity", 
         y.axis.breaks = seq(0,100, by = 20)
         ) |> 
2  gg_states(state, fill = "red") +
  geom_hline(yintercept = c(20, 60), col = "red", linetype = "dashed") +
  coord_cartesian(ylim = c(0,100))
1
This function does not extract the clusters from the dataset, but rather annotates the dataset.
2
Add state bands
Figure 10: Example of continuous near-work episodes. Red shaded areas indicate periods of continuous near work (20–60 cm for ≥30 min, allowing ≤1 min interruptions). Black trace is viewing distance over time; red dashed lines mark the 20 cm and 60 cm boundaries.

The last figure reveals a serious issue: during the night, when the device is not worn, it still records distance values in the range of interest. To get rid of these values, we remove times with low variation, based on the standard deviation within a 10-minute sliding window. Let us first determine what a sensible threshold for this value would be.

dataDist <- 
dataDist |> 
  mutate(dist_sd =
1           slide_dbl(distance,
                     \(x) sd(x, na.rm = TRUE),
                     .before = 30, .after = 30),
2         dist_sd_low = dist_sd <= 5
         )

dataDist |> 
  gg_day(dist_sd, geom = "line") |> 
  gg_states(dist_sd_low, fill = "red") 
1
This function from the {slider} package applies a function over a sliding window (the standard deviation in our case). As we are working with 10-second data, we set the window to 30 observations before and after each point, i.e., roughly 10-minute windows.
2
Try different values to check how the threshold affects the outcome.
Figure 11

Looking at the data, a threshold value of 5 for dist_sd looks sensible. We will not consider these low-variation periods moving forward.

dataDist |> 
  add_clusters( 
1    distance >= 200 & distance < 600 & !dist_sd_low,
    cluster.duration = "30 mins", 
    interruption.duration = "1 min"
  ) |> 
  gg_day(y.axis = distance/10, 
         y.axis.label = "Distance (cm)", 
         geom = "line",
         y.scale = "identity", 
         y.axis.breaks = seq(0,100, by = 20)
         ) |> 
  gg_states(state, fill = "red",
2            ymin = 20, ymax = 60, alpha = 1) +
  geom_hline(yintercept = c(20, 60), col = "red", linetype = "dashed") +
  coord_cartesian(ylim = c(0,100))
1
Additionally check that it is not a time point with suspiciously low variation
2
Setting the height and opacity of the state-bar.
Figure 12: Example of continuous near-work episodes. Red shaded areas indicate periods of continuous near work (20–60 cm for ≥30 min, allowing ≤1 min interruptions and excluding times with low variation). Black trace is viewing distance over time; red dashed lines mark the 20 cm and 60 cm boundaries.

5.4.3 Near-work episodes

Beyond frequency, we can characterize near-work episodes by their duration and typical viewing distance. This section extracts all near-work episodes (using a shorter minimum duration to capture more routine near-work bouts) and summarizes three aspects: (1) frequency (count of episodes per day), (2) average duration of episodes, and (3) average distance during those episodes. These results are combined in Table 8.

dataDist |> 
  extract_clusters(
    distance >= 200 & distance < 600 & !dist_sd_low,
1    cluster.duration = "10 secs",
    interruption.duration = "20 secs",
    drop.empty.groups = FALSE
  ) |> 
2  extract_metric(dataDist, distance = mean(distance, na.rm = TRUE)) |>
  summarize_numeric(remove = c("start", "end", "epoch"), prefix = "",
                    add.total.duration = FALSE) |>  
3  mean_daily(prefix = "") |>
  gt() |> 
  fmt_number(c(distance, episodes), decimals = 0) |> 
  cols_units(distance = "cm")
1
Minimal duration to count as an episode (very short to capture all)
2
Calculate mean distance during each episode
3
Daily averages for each metric
Table 8: Near-work episodes: frequency, mean duration, and mean viewing distance
Date duration distance, cm episodes
VEET
Mean daily 148s (~2.47 minutes) 437 194
Weekday 151s (~2.52 minutes) 441 205
Weekend 139s (~2.32 minutes) 425 168

6 Conclusion

Congratulations! You have finished this section of the advanced course. If you go back to the homepage, you can select one of the other use cases.

Footnotes

  1. https://projectveet.com↩︎

  2. Tibbles are data frames with tweaked behavior, ideal for a tidy analysis workflow. For more information, visit the documentation page for tibbles↩︎