Folder structure

Scope

The following presents a standardised way for naming and organising project code, materials and data. The level of this specification is at the level of folders and files, which exist on local drives, network drives, in repositories and in data releases (e.g. those on FigShare).

Definition

Principles

The folder structure definition inherits several key naming conventions, including for Participant ID, Project Name and Timestamps.

Definitions

An experimental session is a repeatable instance of a laboratory visit, such as a one-evening experiment. Each experimental session will have its own folder named ##_expsession, where ## is the running number. We consider screening visits an experimental session and place them in the screening folder. There are also data modalities that do not conform to this definition of a session, e.g. actigraphy or sleep diary measurements, which cannot be linked to a specific session. These are

A block is repeatable instance of a collection of different tests.

A test is a repeatable test, e.g. saliva or PVT, questionnaires. A test can be a collection of trials.

Within a test, individual trials may occur, which are repeatable instance of a specific data collection unit, e.g. reaction time stimulus.

fMRI-specific definitions

Within the fMRI world, there are specific terms that we also use for consistency and clarity.

An fMRI session is a specific instance of participant entering the scanner.

Within an fMRI session, a participant will complete several runs, e.g. T1 structural scans or BOLD scans.

Overall structure

The overall folder structure is defined as follows. The following are well-defined placeholders: $ProjectID, $ParticipantID. $_repo corresponds to the project-specific name of specific codes.

$ProjectID/
	ethics/
	code/
		$_repo		
	data*/
		derivatives/
		outputs/
		raw.csv
		raw/
			$ParticipantID/
	docs/
	materials/
		questionnaires/
		sops/
	outputs/
	README.md
	reports/ 
		posters/
		presentations/
		manuscripts/
  

README.md

The file README.md contains information about the project, including the author.

ethics folder

The ethics/ folder contains documentation about ethical approvals, including the full ethical application, approval letters and recruitment material. Any iterations of ethics should be included in this folder.

code/ folder

The code/ folder contains code used to run the experiment, analyse data, and other snippets of code to make the experiment reproducible. Depending on the project requirements, it is recommended to have separate GitHub repositories for different bits and pieces, e.g. separate out code to run the experiment from code to run the analysis. In any event, all code should be version-controlled on GitHub.

The code/ folder may also include notebooks for reproducible analyses, e.g. Jupyter notebooks.

data*/ folder

The data*/ folder contains all data collected in the project. This includes raw, processed and derived data. The raw/ data folder is included to be organised by participants. Depending on the project needs, the folder structure of derivatives can be more loosely populated. We consider derivatives data that are one step away or multiple steps from the raw data, e.g. manually cleaned or preprocessed data. The outputs/ folder contains any outputs directly generated from the data that are useful but not meant for publication.

The data/ folder can be accompanied by a sufix that differentiates between data collected at different stages or under different experimental protocols. For example data_pilot/ and data_main/. There can be as many data folders as needed, as long as the prefix of data*/ is maintained, and each folder follows the same underlying structure or pattern.

Data in in the data/ folder follow the following pattern:

$ProjectID/data*/<processing step>/$ParticipantID/<session number>_expsession/<modality>/<block number>_<tests>[-<test number>]_<timestamp>.<file_extension>

The raw.csv file contains an overview of the data collected and available in the data/ folder. This is to keep an inventory of the data collected.

The data/raw/ folder is organised with the following subfolders:

data/raw/screening/
data/raw/continuous/
data/raw/01_expsession/
data/raw/02_expsession/
data/raw/03_expsession/
...
data/raw/##_expsession##/
data/raw/group/

The group/ folder contains any data that are collected and only available at the group level and not at the individual-participant level. This includes, for example, data from REDCap or from devices that only collect data from multiple participants.

As an example:

CiViBe/
	data/
		derivatives/ 
		raw/
			101
				screening/ 
					metropsis/
						01_metropsis_<timestamp>/
					oct/	
						01_oct_<timestamp>/
								01_oct_<timestamp>.metadata.txt
								01_oct_<timestamp>.dicom
								01_oct_<timestamp>.csv
				continuous/
					metadata.txt
					actigraphy/
						01_actigraphy_<timestamp>.txt
						01_actigraphy_<timestamp>.metadata.txt
					sleepdiary/
						01_sleepdiary_<timestamp>.txt
						01_sleepdiary_<timestamp>.metadata.txt
				01_expsession/
					log_<timestamp>.log <- Check (session-wise log)
					metadata.txt <- 



					meta-data
					resources/ <- Optional
						00_beep.wav
						00_stimulus_sequences.csv
					pvt/
						01_pvt01_<timestamp>.csv <- Check if block is 2 numbers, then string, then timestamp
					01_pvt01_<timestamp>.csv
						01_pvt01_<timestamp>.log (test-wise log)
					oct/
						<block>_<test>-<number>_<timestamp>.<filetype>		
						01_cornealthickness_<timestamp>.metadata.txt
						01_cornealthickness_<timestamp>.dicom
						01_cornealthickness_<timestamp>.csv
						01_macula_<timestamp>.metadata.txt
						01_macula_<timestamp>.dicom
						01_macula_<timestamp>.csv
						02_macula_<timestamp>.metadata.txt
						02_macula_<timestamp>.dicom
						02_macula_<timestamp>.csv

The screening folder contains all information related to the screening session.

Timestamps follow the Timestamps convention.

docs/ folder

The docs/ folder contains documentation related to the project.

materials/ folder

The materials/ folder contains questionnaires, SOPs and other materials to reproduce the data collectin effort.

outputs/ folder

The outputs/ folder contains figures, tables and other data outputs related to the project that will be used in publications and other external documents. Note that the data/ folder also contains an outputs/ folder which contains intermediate figures.

reports/ folder

The reports/ folder contains any published outputs related to the project, including posters (in posters/), presentation decks (in presentations/) and manuscripts (/manuscripts/).

Creating an empty structure

To create an empty folder, you can run the following commands in Terminal (OS X & Linux):

mkdir EMPTY_PROJECT
mkdir EMPTY_PROJECT/ethics
mkdir EMPTY_PROJECT/code
mkdir EMPTY_PROJECT/data
mkdir EMPTY_PROJECT/data/derivatives
mkdir EMPTY_PROJECT/data/raw
mkdir EMPTY_PROJECT/data/raw/101
mkdir EMPTY_PROJECT/data/raw/101/screening
mkdir EMPTY_PROJECT/data/raw/101/continuous
mkdir EMPTY_PROJECT/data/raw/101/01_expsession
mkdir EMPTY_PROJECT/data/raw/101/01_expsession/meas/
mkdir EMPTY_PROJECT/docs
mkdir EMPTY_PROJECT/notebooks
mkdir EMPTY_PROJECT/materials
mkdir EMPTY_PROJECT/materials/questionnaires
mkdir EMPTY_PROJECT/materials/sops
mkdir EMPTY_PROJECT/outputs
touch EMPTY_PROJECT/README.md