Use case #04: Visual experience
Open and reproducible analysis of light exposure and visual experience data (Advanced)
1 Preface
Wearable devices can capture more than just illuminance or melanopic EDI. They can capture many aspects of the visual environment that, from an observer’s perspective, form the visual experience. In this use case we focus on a multimodal dataset captured by the VEET device. It simultaneously collects data from several sensors: light (ambient light sensor, ALS), viewing distance (time of flight, TOF), spectral information (spectral light channels, PHO), and motion and orientation (inertial measurement unit, IMU). The data we are looking at do not come from a specific experiment, but are pilot data captured by an individual participant. The example is adapted from the online tutorial by Zauner et al. (2025).
The tutorial focuses on
- importing multimodal data
- reconstructing a spectral power distribution from sensor data
- calculating spectrum-based metrics
- simplifying a spatial grid of distance measurements
- detecting clusters of conditions (for distance)
2 How this page works
This document contains the script for the online course series as a Quarto script, which can be executed on a local installation of R. Please ensure that all libraries are installed prior to running the script.
If you want to test LightLogR without installing R or the package, try the script version running in webR, which provides an autonomous but slightly reduced version. These versions are indicated as (live), whereas the current version is considered (static), as it is pre-rendered.
To run this script, we recommend cloning or downloading the GitHub repository (link to Zip-file) and running the respective script. You will need to install the required packages. A quick way is to run:
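A minimal sketch of such an installation call; the package list is an assumption inferred from the code used throughout this use case:

```r
# Install the packages used in this use case
# (package list inferred from the code below; adjust as needed)
install.packages(
  c("LightLogR", "tidyverse", "gt", "ggridges", "ggsci", "slider")
)
```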
3 Setup
We start by loading the necessary packages.
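A possible setup block, assuming the packages referenced later in this script (ggsci, slider, and scales are only called via `::` and merely need to be installed):

```r
# Packages used throughout this use case
library(LightLogR)   # import, processing, and metrics for wearable light logger data
library(tidyverse)   # data wrangling (dplyr, tidyr, purrr) and plotting (ggplot2)
library(gt)          # formatted tables
library(ggridges)    # ridgeline plots (geom_ridgeline)
```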
4 Spectrum
4.1 Import
The VEET device records multiple data modalities in one combined file. Its raw data file contains interleaved records for different sensor types, distinguished by a “modality” field. Here we focus first on the spectral sensor modality (PHO) for spectral irradiance data, and below under Section 5.1 the time-of-flight modality (TOF) for distance data.
In the VEET’s export file, each line includes a timestamp and a modality code, followed by fields specific to that modality. Importantly, this means that the VEET export is not rectangular, i.e., tabular. The non-rectangular structure of the VEET files can be seen in Figure 1 to the side. This makes it challenging for many import functions, which expect the same number of columns in every row.

For the PHO (spectral) modality, the columns include a timestamp, integration time, a general Gain factor, and nine sensor channel readings covering different wavelengths (with names like s415, s445, ..., s680, s910), as well as a Dark channel and a broadband Clear channel. In essence, the VEET’s spectral sensor captures light in several wavelength bands (from ~415 nm up to a near-infrared channel at 910 nm, plus the Dark and Clear channels) rather than outputting a single lux value like the ambient light sensor (ALS) does. Unlike illuminance, the spectral data are not given directly as interpretable radiometric quantities, but rather as raw sensor counts across multiple wavelength channels, which require conversion to reconstruct a spectral power distribution. In our analysis, spectral data allow us to compute metrics like the relative contribution of short-wavelength (blue) light versus long-wavelength light in the participant’s environment. Processing these spectral data involves several necessary steps.
To import the VEET ambient light data, we use the LightLogR import function, specifying the PHO modality. The raw VEET data in our example is provided as a zip file (02_VEET_L.csv.zip) containing the logged data for about four days. We do the following:
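A minimal sketch of the import call, matching the annotations below; the file path and the exact argument spelling are assumptions (see `?import` for the documented interface):

```r
# Import the spectral (PHO) modality from the zipped VEET export
# (file path is an assumption based on the data folder used elsewhere)
data <- import$VEET(
  "data/visual_experience/02_VEET_L.csv.zip",
  modality = "PHO",        # (1) spectral modality
  manual.id = "VEET",      # (2) manual Id for the single participant
  tz = "US/Central",       # timezone of data collection
  version = "2.1.7"        # (3) file-format version
)
```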
1. modality is a parameter only the VEET device requires. If you are uncertain which devices require special parameters, have a look at the import help page (?import) under the VEET device. Setting it to PHO gives us the spectral modality.
2. As we are only dealing with one individual here, we set a manual Id.
3. In LightLogR 0.10.0 "High noon", which is the current version as this use case was written, this argument would not be necessary, as it is the default version. Things might change in the future, however. Setting the version argument ensures that the file will be read in correctly, even if the default file format evolves over time.
Multiple files in zip: reading '02_VEET_L.csv'
Successfully read in 173'013 observations across 1 Ids from 1 VEET-file(s).
Timezone set is US/Central.
The system timezone is Europe/Berlin. Please correct if necessary!
First Observation: 2025-06-17 12:25:13
Last Observation: 2025-06-21 22:47:01
Timespan: 4.4 days
Observation intervals:
Id interval.time n pct
1 VEET 1s 417 0.24102%
2 VEET 2s 171837 99.32086%
3 VEET 3s 738 0.42656%
4 VEET 4s 7 0.00405%
5 VEET 9s 1 0.00058%
6 VEET 11s 1 0.00058%
7 VEET 12s 2 0.00116%
8 VEET 13s 4 0.00231%
9 VEET 15s 1 0.00058%
10 VEET 16s 1 0.00058%
# ℹ 3 more rows
To check which versions are available for a device (and which default version is used when nothing is provided), check the output of the following function.
| Device | Version | Default | Description |
|---|---|---|---|
| VEET | initial | FALSE | Device format as it was initially implemented in LightLogR. |
| VEET | 2.1.7 | TRUE | In firmware version 2.1.7 a change was introduced to the `PHO` modality, where only one `CLEAR` channel is exported instead of two. |
After import, data contains columns for the timestamp (Datetime), Gain (the sensor gain setting), and the nine spectral sensor channels plus the Dark and Clear channels. These appear as numeric columns named s415, s445, ..., s910, Dark, Clear. Other columns are also present but not needed for now. The spectral sensor was logging at a 2-second rate. It is informative to look at a snippet of the imported spectral data before further processing. Table 1 shows four rows of data after import (before calibration), with some technical columns omitted for brevity:
| Datetime | integration_time | Gain | s415 | s445 | s480 | s515 | s555 | s590 | s630 | s680 | s910 | Dark | Clear |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VEET | |||||||||||||
| 2025-06-18 00:11:22 | 100 | 512 | 20.00 | 27.00 | 31.00 | 42.00 | 47.00 | 63.00 | 90.00 | 139.00 | 756.00 | 0.00 | 534.00 |
| 2025-06-18 00:11:24 | 100 | 512 | 21.00 | 28.00 | 31.00 | 41.00 | 47.00 | 63.00 | 90.00 | 140.00 | 756.00 | 0.00 | 534.00 |
| 2025-06-18 00:11:26 | 100 | 512 | 21.00 | 27.00 | 31.00 | 41.00 | 47.00 | 63.00 | 90.00 | 139.00 | 756.00 | 0.00 | 534.00 |
| 2025-06-18 00:11:28 | 100 | 512 | 21.00 | 27.00 | 31.00 | 42.00 | 47.00 | 63.00 | 90.00 | 140.00 | 755.00 | 0.00 | 534.00 |
4.2 Spectral calibration
Now we proceed with spectral calibration. The VEET’s spectral sensor counts need to be converted to physical units (spectral irradiance) via a calibration matrix provided by the manufacturer. For this example, we assume we have a calibration matrix that maps all the channel readings to an estimated spectral power distribution (SPD). The LightLogR package provides a function spectral_reconstruction() to perform this conversion. However, before applying it, we must ensure the sensor counts are in a normalized form. This procedure is laid out by the manufacturer. In our version, we refer to the VEET SPD Reconstruction Guide.pdf, version 06/05/2025. Note that each manufacturer has to specify the method of count normalization (if any) and spectral reconstruction. In our raw data, each observation comes with a Gain setting that indicates how the sensor’s sensitivity was adjusted; we need to divide the raw counts by the gain to get normalized counts. LightLogR offers normalize_counts() for this purpose. We further need to scale by integration time (in milliseconds) and adjust depending on counts in the Dark sensor channel.
count.columns <- c("s415", "s445", "s480", "s515", "s555", "s590", "s630",    # (1)
                   "s680", "s910", "Dark", "Clear")

gain.ratios <-                                                                 # (2)
  tibble(
    gain = c(0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512),
    gain.ratio =
      c(0.008, 0.016, 0.032, 0.065, 0.125, 0.25, 0.5, 1, 2, 3.95, 7.75)
  )

#normalize data:
data <-
  data |>
  mutate(across(c(s415:Clear), \(x) (x - Dark)/integration_time)) |>           # (3)
  normalize_counts(                                                            # (4)
    gain.columns = rep("Gain", 11),                                            # (5)
    count.columns = count.columns,                                             # (6)
    gain.ratios                                                                # (7)
  ) |>
  select(-c(s415:Clear)) |>                                                    # (8)
  rename_with(~ str_remove(.x, ".normalized"))

1. Column names of variables that need to be normalized
2. Gain ratios as specified by the manufacturer’s reconstruction guide
3. Remove dark counts & scale by integration time
4. Function to normalize counts
5. All sensor channels share the gain value
6. Sensor channels to normalize (see 1.)
7. Gain ratios (see 2.)
8. Drop original raw count columns
In this call, we specified gain.columns = rep("Gain", 11) because we have 11 sensor columns that all use the same gain factor column (Gain). This step adds new columns (with a suffix, e.g. .normalized) for each spectral channel, representing the count normalized by the gain. We then dropped the raw count columns and renamed the normalized ones by dropping .normalized from the names. After this, data contains the normalized sensor readings for s415, s445, ..., s910, Dark, Clear for each time point.
Because we do not need this high a resolution, we will aggregate it to a 5-minute interval for computational efficiency. The assumption is that spectral composition does not need to be examined at every 2-second instant for our purposes, and 5-minute averages can capture the general trends while drastically reducing data size and downstream computational costs.
# Aggregate spectral data to 5-minute intervals and mark gaps
data <- data |>
  aggregate_Datetime(unit = "5 mins") |>    # aggregate to 5-minute bins
  gap_handler(full.days = TRUE) |>          # explicit NA for any gaps in between
  add_Date_col(group.by = TRUE) |>                              # (1)
  remove_partial_data(Clear, threshold.missing = "-23 hour")    # (2)

1. Group the data by calendar day
2. Remove days (i.e., groups) with less than 23 hours of available data. By setting threshold.missing to a negative number, we enforce a minimum duration, which is useful when the total duration of a group is not known or varies.
We use one of the channels (here Clear) as the reference variable for remove_partial_data to drop incomplete days (the choice of channel is arbitrary as all channels share the same level of completeness). The dataset is now ready for spectral reconstruction.
Please be aware that normalize_counts() requires Gain values that appear in the gain table. If we had aggregated the data before normalizing it, Gain values would have been averaged within each bin (5 minutes in this case). If the Gain did not change within a bin, this is not an issue, but any mix of Gain values leads to an averaged Gain value that is not represented in the gain table. In these cases normalize_counts() does not produce wrong values, but it returns NA whenever a Gain value is not found in the table. We therefore recommend always normalizing counts on the raw dataset.
4.3 Spectral reconstruction
For spectral reconstruction, we require a calibration matrix that corresponds to the VEET’s sensor channels. This matrix would typically be obtained from the device manufacturer or a calibration procedure. It defines how each channel’s normalized count relates to intensity at various wavelengths. For demonstration, the calibration matrix was provided by the manufacturer and is specific to the make and model (see Figure 3). It should not be used for research purposes without confirming its accuracy with the manufacturer.
#import calibration matrix
calib_mtx <-                                                     # (1)
  read_csv("data/visual_experience/VEET_calibration_matrix.csv",
           show_col_types = FALSE) |>
  column_to_rownames("wavelength")

# Reconstruct spectral power distribution (SPD) for each observation
data <-
  data |>
  mutate(
    Spectrum =
      spectral_reconstruction(                                   # (2)
        pick(s415:s910),                                         # (3)
        calibration_matrix = calib_mtx,                          # (4)
        format = "long"                                          # (5)
      )
  )

1. Import the calibration matrix and make certain wavelength is set as rownames
2. The function spectral_reconstruction() does not work on the level of the dataset, but has to be called within mutate() (or be provided the data directly)
3. Pick the normalized sensor columns
4. Provide the calibration matrix
5. Return a long-form list column (wavelength, irradiance)
Here, we use format = "long" so that the result for each observation is a list-column Spectrum, where each entry is a tibble containing two columns: wavelength and irradiance (one row per wavelength in the calibration matrix). In other words, each row of data now holds a full reconstructed spectrum in the Spectrum column. The long format is convenient for further calculations and plotting. (An alternative format = "wide" would add each wavelength as a separate column, but that is less practical when there are many wavelengths.)
To visualize the data, we calculate the photopic illuminance based on the spectra and plot each spectrum color-scaled by its illuminance. For clarity, we reduce the data to observations within the day that has the most (non-NA) observations.
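The following is a minimal sketch of this step, matching the annotations below. The filtered date, the use of purrr::map_dbl() inside mutate(), and the spectral_integration() argument names (action.spectrum, general.weight) are assumptions based on the surrounding text; see ?spectral_integration for the documented interface:

```r
# Compute photopic illuminance per spectrum, then unnest for plotting
# (chosen day and argument names are assumptions)
data_spectra <-
  data |>
  filter(Date == "2025-06-18") |>                               # (1)
  mutate(
    Illuminance =
      map_dbl(Spectrum,                                         # (2)
              \(x) spectral_integration(x,                      # (3)
                     action.spectrum = "photopic",              # (4)
                     general.weight = "auto")                   # (5)
      )
  ) |>
  unnest(Spectrum)                                              # (6)
```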
1. Keep only observations for one day (the one with the lowest missing intervals)
2. Use the spectrum, …
3. … call the function spectral_integration() for each, …
4. … use the brightness sensitivity function, …
5. … and apply the appropriate efficacy weight.
6. Create a long format of the data where the spectrum is unnested
# A tibble: 288 × 4
# Groups: Id, Date [1]
Id Date Datetime Illuminance
<fct> <date> <dttm> <dbl>
1 VEET 2025-06-18 2025-06-18 00:00:00 3.70
2 VEET 2025-06-18 2025-06-18 00:05:00 3.79
3 VEET 2025-06-18 2025-06-18 00:10:00 3.72
4 VEET 2025-06-18 2025-06-18 00:15:00 6.90
5 VEET 2025-06-18 2025-06-18 00:20:00 3.53
6 VEET 2025-06-18 2025-06-18 00:25:00 3.26
7 VEET 2025-06-18 2025-06-18 00:30:00 3.59
8 VEET 2025-06-18 2025-06-18 00:35:00 3.49
9 VEET 2025-06-18 2025-06-18 00:40:00 3.54
10 VEET 2025-06-18 2025-06-18 00:45:00 3.62
# ℹ 278 more rows
The following plot visualizes the spectra:
data_spectra |>
ggplot(aes(x = wavelength,group = Datetime)) +
geom_line(aes(y = irradiance*1000, col = Illuminance)) +
labs(y = "Irradiance (mW/m²/nm)",
x = "Wavelength (nm)",
col = "Photopic illuminance (lx)") +
scale_color_viridis_b(breaks = c(0, 10^(0:3))) +
scale_y_continuous(trans = "symlog", breaks = c(0, 1, 10, 50)) +
coord_cartesian(ylim = c(0,NA), expand = FALSE) +
  theme_minimal()

The following ridgeline plot can be used to assess when in the day certain spectral wavelengths are dominant:
data_spectra |>
ggplot(aes(x = wavelength, group = Datetime)) +
geom_ridgeline(aes(height = irradiance*1000,
y = Datetime,
fill = Illuminance),
scale = 400, lwd = 0.1, alpha = 0.7) +
labs(y = "Local time & Irradiance (mW/m²/nm)",
x = "Wavelength (nm)",
fill = "Photopic illuminance (lx)")+
scale_fill_viridis_b(breaks = c(0, 10^(0:3))) +
  theme_minimal()

At this stage, the data dataset has been processed to yield a time series of spectral power distributions. We can use these to compute biologically relevant light metrics. For instance, one possible metric is the proportion of power in short wavelengths versus long wavelengths.
With the VEET spectral preprocessing complete, we emphasize that these steps (normalizing by gain, applying calibration, and simplifying channels) are device-specific requirements. They ensure that the raw sensor counts are translated into meaningful physical measures (like spectral irradiance). Researchers using other spectral devices would follow a similar procedure, adjusting for their device’s particulars (some may output spectra directly, whereas others, like the VEET, require reconstruction).
Some devices may output normalized counts instead of raw counts. For example, the ActLumus device outputs normalized counts, while the VEET device records raw counts and the gain. Manufacturers will be able to specify exact outputs for a given model and software version.
4.4 Metrics
Spectrum-based metrics in wearable data are relatively new and less established compared to distance or broadband light metrics. The following examples illustrate potential uses of spectral data in a theoretical sense, which can be adapted as needed for specific research questions.
4.4.1 Ratio of short- vs. long-wavelength light
Our first spectral metric is the ratio of short-wavelength light to long-wavelength light, which is relevant, for example, in assessing the blue-light content of exposure. For this example, we will arbitrarily define “short” wavelengths as 400–500 nm and “long” as 600–700 nm. Using the list-column of spectra in our dataset, we integrate each spectrum over these ranges (using spectral_integration()), and then compute the ratio short/long for each time interval. We then summarize these ratios per day.
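A rough sketch of this step, matching the annotations below; the wavelength.range argument name and the use of map_dbl() are assumptions based on the text:

```r
# Add short- and long-wavelength irradiance columns to the data
# (wavelength.range argument name is an assumption)
data <-
  data |>
  select(Id, Date, Datetime, Spectrum) |>                            # (1)
  mutate(
    short = map_dbl(Spectrum,
                    \(x) spectral_integration(x,
                           wavelength.range = c(400, 500))),         # (2)
    long  = map_dbl(Spectrum,
                    \(x) spectral_integration(x,
                           wavelength.range = c(600, 700)))          # (3)
  )
```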
1. Focus on Id, date, time, and spectrum
2. Integrate over the spectrum from 400 to 500 nm
3. Integrate over the spectrum from 600 to 700 nm
Table 2 shows the average short/long wavelength ratio, averaged over each day (and then as weekday/weekend means if applicable). In this dataset, the values give an indication of the spectral balance of the light the individual was exposed to (higher values mean relatively more short-wavelength content).
| Date | short | long | sl ratio |
|---|---|---|---|
| VEET | |||
| 2025-06-18 | 83.7 | 71.7 | 1.168 |
| 2025-06-20 | 114.0 | 81.6 | 1.396 |
4.4.2 Melanopic daylight efficacy ratio (MDER)
The same idea is behind calculating the melanopic daylight efficacy ratio (or MDER), which is defined as the melanopic EDI divided by the photopic illuminance. Results are shown in Table 3. In this case, instead of a simple integration over a wavelength band, we apply an action spectrum to the spectral power distribution (SPD), integrate over the weighted SPD, and apply a correction factor. All of this is implemented in spectral_integration().
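A sketch of how this could look, matching the annotations below; the action.spectrum and general.weight argument names and values are assumptions based on the text:

```r
# Melanopic EDI, photopic illuminance, and their ratio (MDER) per observation
# (argument names are assumptions; see ?spectral_integration)
data <-
  data |>
  mutate(
    melEDI = map_dbl(Spectrum,
                     \(x) spectral_integration(x,
                            action.spectrum = "melanopic",
                            general.weight = "auto")),               # (1)
    illuminance = map_dbl(Spectrum,
                          \(x) spectral_integration(x,
                                 action.spectrum = "photopic",
                                 general.weight = "auto")),          # (2)
    MDER = melEDI / illuminance
  )
```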
1. Calculate melanopic EDI by applying the \(s_{\mathrm{mel}}(\lambda)\) action spectrum, integrating, and weighing
2. Calculate photopic illuminance by applying the \(V(\lambda)\) action spectrum, integrating, and weighing
| Date | short | long | melEDI | illuminance | MDER |
|---|---|---|---|---|---|
| VEET | |||||
| 2025-06-18 | 83.7 | 71.7 | 80.6 | 103.7 | 0.777 |
| 2025-06-20 | 114.0 | 81.6 | 108.0 | 123.9 | 0.871 |
4.4.3 Short-wavelength light at specific times of day
The second spectral example examines short-wavelength light exposure as a function of time of day. Certain studies might be interested in, for instance, blue-light exposure during midday versus morning or night. We demonstrate three approaches: (a) filtering the data to a specific local time window, (b) aggregating by hour of day to see a daily profile of short-wavelength exposure, and (c) comparing exposure between day and night periods.
Table 4 isolates the time window between 7:00 and 11:00 each day and computes the average short-wavelength irradiance in that interval. This represents a straightforward query: “How much blue light does the subject get in the morning on average?”
data |>
  filter_Time(start = "7:00:00", end = "11:00:00") |>          # (1)
  select(c(Id, Date, short)) |>
  summarize_numeric(prefix = "") |>
  gt() |>
  fmt_number(short, scale_by = 1000) |>
  cols_label(short = "Short-wavelength irradiance (mW/m²)") |>
  cols_hide(episodes)

1. Filter data to local 7am–11am
| Date | Short-wavelength irradiance (mW/m²) |
|---|---|
| VEET | |
| 2025-06-18 | 5.44 |
| 2025-06-20 | 0.95 |
To visualize short-wavelength exposure over the course of a day, we aggregate the data into hourly bins. We cut the timeline into 1-hour segments (using local time), compute the mean short-wavelength irradiance in each hour for each day. Figure 6 shows the resulting diurnal profile, with short-wavelength exposure expressed as a fraction of the daily maximum for easier comparison.
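A minimal sketch of this aggregation, matching the annotations below; cut_Datetime() creates a Datetime.rounded column, while the unit argument, the Time conversion, and the normalization to the daily maximum (rel_short) are assumptions based on the figure description:

```r
# Hourly profile of short-wavelength irradiance, normalized per day
# (rel_short normalization is an assumption based on the figure caption)
data_time <-
  data |>
  cut_Datetime(unit = "1 hour") |>                         # (1)
  group_by(Datetime.rounded, .add = TRUE) |>
  summarize(short = mean(short, na.rm = TRUE),
            .groups = "drop_last") |>
  mutate(Time = hms::as_hms(Datetime.rounded),             # (2)
         rel_short = short / max(short, na.rm = TRUE))
```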
1. Bin timestamps by hour
2. Add a Time column (hour of day, based on the cut/rounded data)
Adding missing grouping variables: `Datetime.rounded`
#creating the plot
data_time |>
ggplot(aes(x=Time, y = rel_short)) +
geom_col(aes(fill = factor(Date)), position = "dodge") +
ggsci::scale_fill_jco() +
theme_minimal() +
labs(y = "Normalized short-wavelength irradiance",
x = "Local time (HH:MM)",
fill = "Date") +
scale_y_continuous(labels = scales::label_percent()) +
  scale_x_time(labels = scales::label_time(format = "%H:%M"))

Finally, we compare short-wavelength exposure during daytime vs. nighttime. Using civil dawn and dusk information (based on geographic coordinates, here set for Houston, TX, USA), we label each measurement as day or night and then compute the average short-wavelength irradiance in each period. Table 5 summarizes the short-wavelength exposure received during the day vs. during the night.
coordinates <- c(29.75, -95.36)                                   # (1)

data |>
  select(c(Id, Date, Datetime, short)) |>
  add_photoperiod(coordinates) |>                                 # (2)
  group_by(photoperiod.state, .add = TRUE) |>
  summarize_numeric(prefix = "",
                    remove = c("dawn", "dusk", "photoperiod", "Datetime")) |>
  group_by(Id, photoperiod.state) |>
  select(-episodes) |>
  pivot_wider(names_from = photoperiod.state, values_from = short) |>
  gt() |>
  fmt_number(scale_by = 1000, decimals = 1) |>
  tab_header("Average short-wavelength irradiance (mW/m²)")

1. Coordinates for Houston, Texas, USA
2. Adding and grouping by photoperiod
Average short-wavelength irradiance (mW/m²)

| Date | day | night |
|---|---|---|
| VEET | ||
| 2025-06-18 | 126.5 | 12.3 |
| 2025-06-20 | 181.8 | 1.0 |
5 Distance
5.1 Import
In this second section, the distance data of the VEET device will be imported, analogously to the other modality. The TOF modality contains information for up to two objects in an 8×8 grid of measurements, spanning a total of about 52° vertically and 41° horizontally. Because the VEET device can detect up to two objects at every grid point, and a confidence value is assigned to every measurement, each observation contains \(2 \times 2 \times 8 \times 8 = 256\) values.
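A sketch of the corresponding import call, matching the annotations below; the file path is an assumption based on the data folder used above:

```r
# Import the time-of-flight (TOF) modality from the zipped VEET export
# (file path is an assumption)
dataDist <- import$VEET(
  "data/visual_experience/01_VEET_L.csv.zip",
  modality = "TOF",        # (1) distance modality
  manual.id = "VEET",      # (2) manual Id for the single participant
  tz = "US/Central",       # timezone of data collection
  version = "initial"      # (3) older firmware: use the initial file format
)
```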
1. modality is a parameter only the VEET device requires. If you are uncertain which devices require special parameters, have a look at the import help page (?import) under the VEET device. Setting it to TOF gives us the distance modality.
2. As we are only dealing with one individual here, we set a manual Id.
3. This file was captured with an older VEET firmware. Thus we set version = "initial", so it is read in correctly.
Multiple files in zip: reading '01_VEET_L.csv'
Successfully read in 304'195 observations across 1 Ids from 1 VEET-file(s).
Timezone set is US/Central.
The system timezone is Europe/Berlin. Please correct if necessary!
First Observation: 2024-06-04 15:00:36
Last Observation: 2024-06-12 08:29:43
Timespan: 7.7 days
Observation intervals:
Id interval.time n pct
1 VEET 0s 3 0.00099%
2 VEET 1s 2089 0.68673%
3 VEET 2s 299876 98.58051%
4 VEET 3s 2213 0.72750%
5 VEET 4s 3 0.00099%
6 VEET 6s 1 0.00033%
7 VEET 9s 7 0.00230%
8 VEET 109s (~1.82 minutes) 1 0.00033%
9 VEET 59077s (~16.41 hours) 1 0.00033%
In a first step, we condition the data similarly to the other VEET modality. For computational reasons, we remove the second object and set the interval to 10 seconds. Note that the next step still takes considerable computation time.
# Aggregate distance data to 10-second intervals and mark gaps
dataDist <-
  dataDist |>
  select(-contains(c("conf2_", "dist2_"))) |>                # (1)
  aggregate_Datetime(unit = "10 secs") |>                    # (2)
  gap_handler(full.days = TRUE) |>                           # (3)
  add_Date_col(group.by = TRUE) |>
  remove_partial_data(dist1_0, threshold.missing = "1 hour")

dataDist |> summary_overview(dist1_0)

1. Remove the second object (for computational reasons)
2. Aggregate to 10-second bins
3. Explicit NA for any gaps
# A tibble: 4 × 4
name mean min max
<chr> <dbl> <dbl> <dbl>
1 Participants 1 NA NA
2 Participant-days 6 6 6
3 Days ≥80% complete 6 6 6
4 Missing/Irregular 0 0 0
In the next step, we need to transform the wide format of the imported dataset into a long format, where each row contains exactly one observation for one grid-point.
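A possible sketch of this reshaping step, assuming the wide columns follow a dist1_<position> / conf1_<position> naming pattern; the result has one row per grid point with columns dist1, conf1, and position:

```r
# Pivot the 64 grid-point columns into long format: one row per grid point
# (the column-name pattern is an assumption)
dataDist <-
  dataDist |>
  pivot_longer(
    cols = matches("^(dist|conf)1_"),
    names_to = c(".value", "position"),
    names_sep = "_",
    names_transform = list(position = as.integer)
  )
```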
In a final step before we can use the data in the analysis, we need to assign x and y coordinates based on the position column that was created when pivoting longer. Positions are counted from 0 to 63, starting at the top right and increasing towards the left, before continuing on the right in the next row below. The y position thus depends on the row count, i.e., how often a full row of 8 values fits into the position value. The x position consequently depends on the position within each 8-value row. We also add an observation variable that increases by 1 every time the position column hits 0. We then center both x and y coordinates to obtain meaningful values, i.e., 0° indicates the center of the overall measurement cone. Lastly, we convert the confidence columns, which are scaled from 0 to 255, into fractions (255 = 100%) by dividing them by 255. Empirical data from the manufacturer point to a threshold of about 10%, below which the respective distance data are not reliable.
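A minimal sketch of the coordinate assignment described above, matching the annotations below; the exact sign conventions and centering are assumptions:

```r
# Assign grid coordinates, an observation counter, and scaled confidence values
# (sign conventions and centering are assumptions based on the description above)
dataDist <-
  dataDist |>
  mutate(
    y.pos = position %/% 8,                    # (1)
    y.pos = (y.pos - 3.5) * (52 / 8),          # (2)
    x.pos = position %% 8,                     # (3)
    x.pos = (x.pos - 3.5) * (41 / 8),          # (4)
    observation = cumsum(position == 0),       # (5)
    conf1 = conf1 / 255                        # (6)
  )
```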
1. Increment the y position for every 8 steps in position
2. Center y.pos and rescale it to cover 52° across 8 steps
3. Increment the x position for every step in position, resetting every 8 steps
4. Center x.pos and rescale it to cover 41° across 8 steps
5. Increase an observation counter every time we restart with position at 0
6. Scale the confidence columns so that 255 = 100%
Now this dataset is ready for further analysis. We finish by visualizing the same observation time on different days. Note that we replace zero distance values with Infinity, as these indicate measurements outside the 5m measurement radius of the device.
extras <- list(                                                          # (1)
  geom_tile(),
  scale_fill_viridis_c(direction = -1, limits = c(0, 200),
                       oob = scales::oob_squish_any),
  scale_color_manual(values = c("black", "white")),
  theme_minimal(),
  guides(colour = "none"),
  geom_text(aes(label = (dist1/10) |> round(0), colour = dist1 > 1000),
            size = 2.5),
  coord_fixed(),
  labs(x = "X position (°)", y = "Y position (°)",
       fill = "Distance (cm)"))

slicer <- function(x){seq(min((x - 1) * 64 + 1), max(x * 64), by = 1)}   # (2)

dataDist |>
  slice(slicer(4765)) |>                                                 # (3)
  mutate(dist1 = ifelse(dist1 == 0, Inf, dist1)) |>                      # (4)
  filter(conf1 >= 0.1 | dist1 == Inf) |>                                 # (5)
  ggplot(aes(x = x.pos, y = y.pos, fill = dist1/10)) + extras +          # (6)
  facet_grid(~Datetime)                                                  # (7)

1. Set visualization parameters
2. Allows choosing an observation
3. Choose a particular observation
4. Replace 0 distances with Infinity
5. Remove data that has less than 10% confidence
6. Plot the data
7. Show one plot per day
As we can see from the figure, different days show, at the same time of day, a vastly different distribution of distance data and measurement confidence (values with confidence < 10% are removed).
5.2 Distance with spatial distribution
Some devices output a singular measure for distance (e.g., the Clouclip). The visual environment in natural conditions contains many distances, depending on the solid angle and direction of the measurement. A device like the VEET increases the spatial resolution of these measurements, allowing for more in-depth analyses of the size and position of an object within the field of view. In the case of the VEET, data are collected from an 8x8 measurement grid, spanning 52° vertically and 41° horizontally.
There are many ways in which a spatially resolved distance measure could be utilized for analysis:
- Where in the field of view are objects in close range?
- How large are near objects in the field of view?
- How varied are distances within the field of view?
- How close are objects / is viewing distance in a region of interest within the field of view?
In view of the scope of these use cases, we will focus on the last point, but all aspects could be tackled with the available data. As these are still time-series data, how these measures change over time is also relevant.
5.3 Distance in a region of interest
We will reduce the dataset to (high confidence) values at or around a given grid position, e.g., ±10 degrees around the central view (0°).
dataDist <-
  dataDist |>
  filter(conf1 >= 0.1,                                           # (1)
         between(x.pos, -10, 10),
         between(y.pos, -10, 10)) |>
  group_by(Datetime, .add = TRUE) |>                             # (2)
  summarize(
    distance = mean(dist1),                                      # (3)
    n = n(),                                                     # (4)
    .groups = "drop_last"
  )

dataDist |>
  filter_Date(start = "2024-06-10", length = "1 day") |>         # (5)
  aggregate_Datetime("15 mins",
                     numeric.handler = \(x) mean(x, na.rm = TRUE)) |>   # (6)
  remove_partial_data(by.date = TRUE) |>                         # (7)
  gg_day(y.axis = distance/10,                                   # (8)
         geom = "line",
         linewidth = 1,
         alpha = 0.75,
         y.scale = "identity",
         y.axis.breaks = seq(0, 150, by = 20),
         y.axis.label = "Distance (cm)"
  )

1. Remove data with low confidence, and outside ±10°
2. Group by every observation. While this grouping is dropped in the next step, it is crucial to arrive at one value per time point (and Id).
3. Calculate central distance
4. Number of (valid) grid points
5. Filter one day
6. Create 15-minute data
7. Remove midnight data points
8. Setting up the plot for distance. It is a nice example showcasing the many sensible defaults gg_day() has for light variables, compared to other modalities.
5.4 Distance-based metrics
The following distance-based metrics use LightLogR functions to arrive at the results. They can be mapped broadly 1:1 to light metrics, provided that the thresholds are adjusted to light values.
5.4.1 Daily duration of near work
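A sketch of how this metric could be computed, matching the annotations below. Whether duration_above_threshold() accepts a two-value threshold for a range, and the prefix behaviour of mean_daily(), are assumptions; the helper hinted at in annotation 4 is not reproduced here:

```r
# Daily duration with viewing distance between 10 and 60 cm (distance is in mm),
# then averaged across days and weekday types (argument details are assumptions)
dataDist |>
  summarize(
    duration = duration_above_threshold(distance, Datetime,
                                        threshold = c(100, 600))   # (1) (2)
  ) |>
  mean_daily() |>                                                  # (3)
  gt()
```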
1. Consider only distances in [10, 60) cm
2. Total duration in that range per day
3. Summary across weekday and weekend
4. Can be used to check the weekday types
| Date | average_duration |
|---|---|
| VEET | |
| Mean daily | 38595s (~10.72 hours) |
| Weekday | 37135s (~10.32 hours) |
| Weekend | 42245s (~11.73 hours) |
5.4.2 Frequency of continuous near work
Continuous near-work is typically defined as sustained viewing within a near distance for some minimum duration, allowing only brief interruptions. We use LightLogR’s cluster function to identify episodes of continuous near work. Here we define a near-work episode as viewing distance between 20 cm and 60 cm that lasts at least 30 minutes, with interruptions of up to 1 minute allowed (meaning short breaks ≤1 min do not end the episode). Using extract_clusters() with those parameters, we count how many such episodes occur per day.
Table 7 summarizes the average frequency of continuous near-work episodes per day, and Figure 10 provides an example visualization of these episodes on the distance time series.
dataDist |>
  extract_clusters(
    distance >= 200 & distance < 600,                            # (1)
    cluster.duration = "30 mins",                                # (2)
    interruption.duration = "1 min",                             # (3)
    drop.empty.groups = FALSE                                    # (4)
  ) |>
  summarize_numeric(remove = c("start", "end", "epoch", "duration"),   # (5)
                    add.total.duration = FALSE) |>
  mean_daily(prefix = "Frequency of ") |>                        # (6)
  gt() |>
  fmt_number()

1. Condition: near-work distance
2. Minimum duration of a continuous episode
3. Maximum gap allowed within an episode
4. Keep days with zero episodes in output
5. Count number of episodes per day
6. Compute daily mean frequency
| Date | Frequency of episodes |
|---|---|
| VEET | |
| Mean daily | 3.86 |
| Weekday | 4.00 |
| Weekend | 3.50 |
dataDist |>
  add_clusters(                                                  # (1)
    distance >= 200 & distance < 600,
    cluster.duration = "30 mins",
    interruption.duration = "1 min"
  ) |>
  gg_day(y.axis = distance/10,
         y.axis.label = "Distance (cm)",
         geom = "line",
         y.scale = "identity",
         y.axis.breaks = seq(0, 100, by = 20)
  ) |>
  gg_states(state, fill = "red") +                               # (2)
  geom_hline(yintercept = c(20, 60), col = "red", linetype = "dashed") +
  coord_cartesian(ylim = c(0, 100))

1. This function does not extract the clusters from the dataset, but rather annotates the dataset.
2. Add state bands
The last figure showcases a serious issue. During the nighttime, when the device is not worn, it records distance values in the area of interest. To get rid of these values, we remove times with low variance, based on a sliding window of 10 minutes. Let us first determine what such a value would look like.
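A sketch of this step, matching the annotations below; the column names dist_sd and dist_sd_low (referenced again further down) and the exact slider call are assumptions:

```r
# Rolling standard deviation of distance over a ~10-minute window
# (10-second data: 30 samples before + 30 after the current one)
dataDist <-
  dataDist |>
  mutate(
    dist_sd = slider::slide_dbl(distance, \(x) sd(x, na.rm = TRUE),
                                .before = 30, .after = 30),   # (1)
    dist_sd_low = dist_sd < 5                                 # (2)
  )
```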
1. This function from the {slider} package applies a function on a sliding window (the standard deviation in our case). As we are working with 10-second data, we set the window size to 2 × 30 to look at 10-minute windows.
2. Try different values to check how the threshold affects the outcome.

Looking at the data, a threshold value of 5 for dist_sd looks sensible. We will not consider these values moving forward.
dataDist |>
  add_clusters(
    distance >= 200 & distance < 600 & !dist_sd_low,             # (1)
    cluster.duration = "30 mins",
    interruption.duration = "1 min"
  ) |>
  gg_day(y.axis = distance/10,
         y.axis.label = "Distance (cm)",
         geom = "line",
         y.scale = "identity",
         y.axis.breaks = seq(0, 100, by = 20)
  ) |>
  gg_states(state, fill = "red",
            ymin = 20, ymax = 60, alpha = 1) +                   # (2)
  geom_hline(yintercept = c(20, 60), col = "red", linetype = "dashed") +
  coord_cartesian(ylim = c(0, 100))

1. Additionally checking that it is not a time point with suspiciously low variation
2. Setting the height and opacity of the state bar
5.4.3 Near-work episodes
Beyond frequency, we can characterize near-work episodes by their duration and typical viewing distance. This section extracts all near-work episodes (using a shorter minimum duration to capture more routine near-work bouts) and summarizes three aspects: (1) frequency (count of episodes per day), (2) average duration of episodes, and (3) average distance during those episodes. These results are combined in Table 8.
dataDist |>
  extract_clusters(
    distance >= 200 & distance < 600 & !dist_sd_low,
    cluster.duration = "10 secs",                                # (1)
    interruption.duration = "20 secs",
    drop.empty.groups = FALSE
  ) |>
  extract_metric(dataDist, distance = mean(distance, na.rm = TRUE)) |>   # (2)
  summarize_numeric(remove = c("start", "end", "epoch"), prefix = "",
                    add.total.duration = FALSE) |>
  mean_daily(prefix = "") |>                                     # (3)
  gt() |>
  fmt_number(c(distance, episodes), decimals = 0) |>
  cols_units(distance = "cm")

1. Minimal duration to count as an episode (very short to capture all)
2. Calculate mean distance during each episode
3. Daily averages for each metric
| Date | duration | distance, cm | episodes |
|---|---|---|---|
| VEET | |||
| Mean daily | 148s (~2.47 minutes) | 437 | 194 |
| Weekday | 151s (~2.52 minutes) | 441 | 205 |
| Weekend | 139s (~2.32 minutes) | 425 | 168 |
6 Conclusion
Congratulations! You have finished this section of the advanced course. If you go back to the homepage, you can select one of the other use cases.













