Recording Metadata#

Introduction#

Several metadata schemes are currently used to describe different types of neuroscience data, but no single standard has emerged. For example, the Neurodata Without Borders (NWB) format is becoming increasingly popular for storing neurophysiology and associated behavioural data generated by intracellular and extracellular electrophysiology experiments and optical physiology experiments, and it stores the accompanying metadata in the same file. Similarly, the Brain Imaging Data Structure (BIDS) format, widely used to structure neuroimaging and EEG data together with associated behavioural and other data, has its own metadata standard. There are also efforts under way to establish a more generic standard suitable for any type of neuroscience data, such as the open metadata Markup Language (odML).

The NWB and BIDS metadata formats are particularly suitable for certain types of neuroscience data and are becoming increasingly popular in their respective areas. For this reason, it is advisable to follow their standards if you are working with neurophysiology or neuroimaging data. You can also follow the odML format, which is more generic. Alternatively, you may be using your own custom templates to record metadata and saving it in MATLAB or Python files. If you are converting your neurophysiology data to the NWB format, you will be converting your metadata into this format as part of the process. The approach you use is your own choice, and we do not make strict recommendations on how you should record your metadata. As long as the metadata is rich and hierarchically organised, or convertible to a hierarchical form (e.g., dataset/experiment/subject/session/trial), it should be sufficient.
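To illustrate what "convertible to hierarchical form" can mean in practice, here is a minimal Python sketch that groups flat, table-like metadata records into a subject/session/trial hierarchy. The record fields and the `nest` helper are hypothetical and chosen only for illustration; they are not part of any of the standards discussed here.

```python
# Hypothetical flat metadata records, e.g. rows exported from a spreadsheet.
# All field names and values are illustrative assumptions.
records = [
    {"subject": "mouse01", "session": "s01", "trial": "t01", "stimulus": "air puff"},
    {"subject": "mouse01", "session": "s01", "trial": "t02", "stimulus": "none"},
    {"subject": "mouse02", "session": "s01", "trial": "t01", "stimulus": "air puff"},
]


def nest(rows, levels):
    """Group flat records into nested dicts, one nesting level per name in `levels`."""
    tree = {}
    for row in rows:
        node = tree
        # Walk (and create, if needed) one dict per hierarchy level.
        for level in levels:
            node = node.setdefault(row[level], {})
        # Store the remaining fields at the deepest level.
        node.update({key: value for key, value in row.items() if key not in levels})
    return tree


hierarchy = nest(records, levels=("subject", "session", "trial"))
```

After running this, `hierarchy["mouse01"]["s01"]["t02"]` holds the fields recorded for that trial. Metadata kept in this kind of regular tabular form can therefore be restructured later if you decide to adopt one of the standards.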

There are, however, important advantages to avoiding custom approaches. Adopting a standard makes your data more accessible to your collaborators and other researchers. You would also increase your data interoperability and make it easier to integrate into a database. Moreover, adopting a minimal metadata schema at the dataset level would likely make your dataset more searchable and discoverable. Therefore, below we provide a brief description and resources for several emerging metadata standards.

Recording Metadata in NWB Format#

The advantage of using the NWB format is having your metadata stored inside the same files containing your data. Available entries allow you to record rich metadata regarding the experiment, subjects, sessions, recording instruments, trials, and distinct datatypes. For example, you can record rich metadata regarding individual silicon probe channels and individual units in the extracellular electrophysiology context.

The full description of available metadata classes and properties is given by the NWB Format Specification.

There is also a tutorial available online: Intracellular electrophysiology structured metadata in NWB

Note

This subsection is a work in progress. It will be developed in parallel with the tutorials for different neuroscience data types.

Recording Metadata in BIDS Format#

The BIDS format uses JSON and TSV files to record metadata. You can read more about the usage of these formats within the BIDS context here.

The BIDS starter kit is available here.

The full BIDS documentation is available here.
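As a rough illustration of the JSON and TSV metadata files mentioned above, the following Python sketch writes a `dataset_description.json` file and a `participants.tsv` file using only the standard library. The dataset name, participant values, BIDS version string, and the `write_bids_metadata` helper are assumptions made for this example; consult the BIDS documentation for the fields your data type requires.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_bids_metadata(root):
    """Write two example BIDS metadata files into the folder `root`."""
    root = Path(root)

    # Dataset-level metadata in JSON. "Name" and "BIDSVersion" are the
    # required fields; the values here are placeholders.
    description = {"Name": "Example dataset", "BIDSVersion": "1.8.0"}
    (root / "dataset_description.json").write_text(
        json.dumps(description, indent=2), encoding="utf-8"
    )

    # Subject-level metadata as a tab-separated table (illustrative values).
    participants = [
        {"participant_id": "sub-01", "age": "25", "sex": "F"},
        {"participant_id": "sub-02", "age": "31", "sex": "M"},
    ]
    with open(root / "participants.tsv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["participant_id", "age", "sex"], delimiter="\t"
        )
        writer.writeheader()
        writer.writerows(participants)


# Write the files to a temporary folder and read them back.
with tempfile.TemporaryDirectory() as tmp:
    write_bids_metadata(tmp)
    loaded_description = json.loads(
        (Path(tmp) / "dataset_description.json").read_text(encoding="utf-8")
    )
    with open(Path(tmp) / "participants.tsv", newline="", encoding="utf-8") as f:
        loaded_participants = list(csv.DictReader(f, delimiter="\t"))
```

Because both files are plain text, they can be inspected and edited in any text editor. This sketch does not check that a dataset is BIDS-compliant.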

Note

This subsection is a work in progress. It is currently not a priority.

Recording Metadata in odML Format#

The open metadata Markup Language (odML) is a comprehensive format that stores rich metadata at multiple description levels (e.g., dataset/experiment/subject/session/trial). Unlike NWB and BIDS, however, it is intended to be broad enough to cover all neuroscience data. Currently it is the only pure neuroscience metadata format with a schema that is not limited to the dataset description only. The downside of adopting this format is redundancy. If you are already organising your data according to the NWB or BIDS schemas, you would be storing the same information twice and would also need to learn a new programming interface. If you have not adopted either of the two formats, then odML is worth considering.

The entire metadata framework is called odMLtables. It is based on the XML format but also supports the JSON and YAML formats, and it can even be converted into the tabular XLS and CSV formats. It has both Matlab and Python application programming interfaces and a graphical user interface. Both online documentation and a tutorial are available, explaining the basic uses of odMLtables for recording your metadata.

Recording Metadata in DANDI Schema#

The DANDI Schema provides a minimal metadata schema for NWB and BIDS datasets and is intended to enhance the searchability and visibility of these datasets. The DANDI Schema uses the JSON for Linking Data (JSON-LD) format to encode metadata and is, therefore, both human and machine readable. The metadata files are typically generated when a dataset is uploaded to the DANDI Archive, a platform that stores neuroscience data in both NWB and BIDS formats. Alternatively, metadata can be generated and validated using a Python programming interface available to download here. Currently this schema is limited to dandisets only, and whether it gains wider traction within the neuroscience community remains to be seen.
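To give a sense of what dataset-level metadata in JSON-LD looks like, here is a minimal Python sketch of a generic JSON-LD record. It uses the schema.org vocabulary as a stand-in and is not the actual DANDI Schema. The identifier URL, the field values, and the `missing_fields` helper are hypothetical, and the check it performs is far simpler than real schema validation.

```python
import json

# A generic JSON-LD record describing a dataset. "@context", "@type" and
# "@id" are JSON-LD keywords; the other keys come from schema.org.
# This is an illustrative assumption, NOT the DANDI Schema itself.
dataset_record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/datasets/000000",  # placeholder identifier
    "name": "Example extracellular electrophysiology dataset",
    "description": "Silicon probe recordings from mouse cortex (placeholder).",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": [{"@type": "Person", "name": "Jane Doe"}],
    "keywords": ["electrophysiology", "NWB"],
}


def missing_fields(record, required=("@context", "@type", "name", "description")):
    """Return the required keys absent from `record` (a toy completeness check)."""
    return [field for field in required if field not in record]


# Serialise to the text that would be stored alongside the dataset.
serialised = json.dumps(dataset_record, indent=2)
problems = missing_fields(dataset_record)
```

The `@context` entry is what makes the record machine readable: it tells a consumer which vocabulary defines keys such as `name` and `creator`, while the JSON text remains readable by people.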

Recording Metadata in openMINDS Format#

The Open Metadata Initiative for Neuroscience Data Structures (openMINDS) is a minimal metadata schema for neuroscience datasets developed by the Human Brain Project and the EBRAINS data sharing platform. Its development and adoption are intended to make neuroscience datasets easier to find, as well as to expose other dataset-level information. The openMINDS team has provided a Python package to make their metadata schemes programmable. The openMINDS project is at an early stage and is, therefore, still largely in development. It remains to be seen whether this metadata format gains wider acceptance among neuroscientists.