
EyeStore (Coming Soon)

A FAIR database and data sharing platform for eye-tracking-while-reading data


Explore Reading Data
Search and filter eye-tracking datasets across languages and studies. EyeStore currently provides access to the MultiplEYE corpus, a large multilingual dataset collected across many countries.

Share Your Dataset
Publish eye-tracking-while-reading data with standardized documentation.

Enable Reproducible Research
Access well-documented datasets ready for analysis.

EyeStore (funded by swissuniversities through the EyeStore and EyeStore+ projects) is a FAIR-compliant database and data sharing platform dedicated to eye-tracking-while-reading data. It is designed to support Open Research Data (ORD) standards and promote best practices in the sharing, documentation, and reuse of eye-tracking data, ensuring that datasets are findable, accessible, interoperable, and reusable.

Initially developed in close collaboration with the COST Action MultiplEYE (CA21131), a large-scale, interdisciplinary research network supported by the European Cooperation in Science and Technology (COST) and funded by the European Union, EyeStore hosts the MultiplEYE eye-tracking dataset as its initial contribution, while also welcoming additional datasets from the community. For more information on the broader MultiplEYE initiative, please visit the MultiplEYE website.

The platform features a user-friendly interface with metadata-driven search and advanced filtering options, allowing users to explore and extract data at three levels: dataset (study), session (participant), and trial (individual reading event). Researchers can, for example, select trials based on language, participant characteristics, or stimulus type, making it easy to tailor data selection to their specific research questions. This fine-grained filtering ensures targeted access to relevant subsets and facilitates meaningful comparisons across datasets. Filter criteria include not only demographic or linguistic parameters, but also data quality metrics and available stimulus materials. For each dataset, users are presented with a preview of data matching their search, along with the option to download the complete dataset. The download includes the eye-movement data at different stages of the preprocessing pipeline (e.g., ASC, CSV), relevant metadata in both JSON and PDF formats, the experimental stimuli, corresponding linguistic annotations, and data quality reports.
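
To give a rough idea of what working with such a trial-level selection might look like once a dataset has been downloaded, the sketch below filters a flat trial table by language, participant age, and stimulus type using pandas. The file name and column names are hypothetical and do not reflect the actual EyeStore export schema.

```python
import pandas as pd

# Hypothetical example: filter a downloaded trial-level export.
# The file name and column names are illustrative, not EyeStore's actual schema.
trials = pd.read_csv("multipleye_trials.csv")

# Keep, for example, German trials from readers aged 18-40 who read literary texts.
subset = trials[
    (trials["language"] == "de")
    & (trials["participant_age"].between(18, 40))
    & (trials["stimulus_type"] == "literary")
]

print(f"{len(subset)} of {len(trials)} trials match the filter criteria")
subset.to_csv("filtered_trials.csv", index=False)
```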

The MultiplEYE eye-tracking data corpus

As its initial contribution, EyeStore hosts the MultiplEYE dataset, which includes eye-tracking data collected in numerous European countries as well as in the USA, Canada, Mexico, and Pakistan, with reading experiments conducted in different languages. This extensive dataset is a cornerstone of EyeStore, demonstrating the repository's role in enabling large-scale, cross-linguistic research.

The MultiplEYE project covers a wide range of both high- and low-resource languages. Currently, the following languages are included in MultiplEYE: Albanian, Arabic, Basque, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, French, Farsi, German, Greek, West Greenlandic (Kalaallisut), Hebrew, Hindi, Italian, Latvian, Lithuanian, Macedonian, Norwegian, Polish, Portuguese, Romanian, Romansh, Russian, Serbian, Slovenian, Spanish, Swedish, Turkish, Ukrainian, and Urdu.

About the MultiplEYE data collection

The MultiplEYE data corpus is a central component of EyeStore and serves as the platform's initial large-scale data collection. The description below outlines how the data was collected and what types of data are available through EyeStore. (Related scientific publications documenting the methodology and data will be linked and cited as soon as they become available.)

The MultiplEYE eye-tracking-while-reading experiment

The MultiplEYE team has created a standardized eye-tracking-while-reading experiment designed for multilingual, cross-site data collection. The experiment is implemented in Python, building on the PsychoPy and PyGaze packages, and can be run with various eye-tracking systems. All MultiplEYE contributors followed the official MultiplEYE data collection guidelines.
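
As a minimal sketch of what presenting a single reading page could look like with these packages, the snippet below assumes a configured PyGaze installation (constants/defaults set for the local eye tracker) and a connected, supported device; the page text and log messages are placeholders and do not reproduce the actual MultiplEYE implementation.

```python
from pygaze.display import Display
from pygaze.screen import Screen
from pygaze.eyetracker import EyeTracker
from pygaze.keyboard import Keyboard

# Assumes PyGaze constants/defaults are configured for the local eye tracker.
disp = Display()
tracker = EyeTracker(disp)
keyboard = Keyboard(keylist=["space"], timeout=None)

# Calibrate (and validate) before the reading phase.
tracker.calibrate()

# Present one placeholder text page and record gaze while it is on screen.
page = Screen()
page.draw_text(text="First page of the stimulus text ...", fontsize=24)

tracker.start_recording()
tracker.log("PAGE_ONSET page=1")   # placeholder message, not the MultiplEYE log format
disp.fill(page)
disp.show()
keyboard.get_key()                 # participant presses space to turn the page
tracker.log("PAGE_OFFSET page=1")
tracker.stop_recording()

tracker.close()
disp.close()
```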

The experiment starts with the experimenter entering the participant ID into the experiment software. The participant is then greeted by a welcome screen and prompted to electronically sign the consent form, after which the experiment instructions are displayed on the screen. Next, the camera setup is initiated, which includes calibration and validation. A practice phase follows, consisting of two short texts, each followed by two comprehension questions. These practice trials familiarize participants with the experimental setup and keyboard usage and give them practice with the comprehension questions.

The main phase of the reading experiment starts thereafter. Ten texts are presented in a (pseudo-)randomized order, with the randomization implemented in the presentation software so that each participant encounters a different sequence. Constraints are applied to balance text type, length, and the placement of the mandatory mid-session break. A fixation trigger precedes each page turn: participants must fixate on a dot on the screen to validate calibration accuracy, which allows a smooth and fast transition to the next page if the calibration is still sufficiently precise. If it is not, re-calibration procedures follow to restore the precision of the eye-tracking measurements.

Each text is followed by three rating scales: two concerning the participant's familiarity with the text (how familiar the text is and whether the participant has previously read or listened to it), and one concerning its perceived difficulty. After the rating scales, participants respond to six comprehension questions per text; returning to the text is disabled while the comprehension questions are displayed. After the experiment phase is completed, participants fill out a participant questionnaire on screen.
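
Purely as an illustration of the kind of constrained pseudo-randomization described above, the sketch below reshuffles a stimulus list until no two texts of the same type are adjacent and places the mandatory break at the halfway point; the text labels, type categories, and the specific constraint are hypothetical, not the actual MultiplEYE constraints.

```python
import random

# Hypothetical stimulus list (text_id, text_type); labels are illustrative only.
TEXTS = [
    ("lit_1", "literary"), ("lit_2", "literary"), ("lit_3", "literary"),
    ("lit_4", "literary"), ("lit_5", "literary"),
    ("exp_1", "expository"), ("exp_2", "expository"), ("exp_3", "expository"),
    ("exp_4", "expository"), ("exp_5", "expository"),
]

def pseudo_randomized_order(texts, seed):
    """Shuffle the texts until no two texts of the same type are adjacent."""
    rng = random.Random(seed)
    order = texts[:]
    while True:
        rng.shuffle(order)
        if all(a[1] != b[1] for a, b in zip(order, order[1:])):
            return order

# One order per participant, seeded by the participant ID to keep it reproducible.
order = pseudo_randomized_order(TEXTS, seed=17)
first_half, second_half = order[:5], order[5:]   # mandatory break after the fifth text
```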

Collected data

The MultiplEYE data collection, as hosted and shared via EyeStore, includes a multilingual set of eye-tracking-while-reading datasets. The datasets are available in multiple stages of the data processing pipeline and are accompanied by rich documentation and metadata to ensure transparency, reusability, and alignment with FAIR principles. 

All datasets undergo a standardized preprocessing pipeline, which not only transforms the raw data into multiple usable formats but also extracts detailed data quality reports. These reports, along with other metadata, are included in the dataset and used as filter criteria within EyeStore. The data are provided at the following processing stages (a minimal sketch of the Stage 4 mapping follows the list):

  • Stage 1: Converted files in a human-readable format (e.g., .asc), with no modifications beyond encoding.
  • Stage 2: Parsed data files containing one sample per line (e.g., x/y-coordinates and timestamp), chronologically ordered.
  • Stage 3: Aggregated gaze events (e.g., fixations, saccades) derived from the raw samples.
  • Stage 4: Reading data where gaze events are mapped to individual words in the order they were presented in the text.
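
The snippet below is a minimal, purely illustrative version of that Stage 4 mapping: fixations are assigned to words via rectangular word bounding boxes. The data structures and field names are hypothetical and do not reflect the actual MultiplEYE preprocessing pipeline.

```python
from dataclasses import dataclass

@dataclass
class Word:
    index: int        # position of the word in the presented text
    text: str
    x0: float         # bounding box on screen, in pixels
    y0: float
    x1: float
    y1: float

@dataclass
class Fixation:
    t_start: float    # ms since recording onset
    duration: float   # ms
    x: float          # mean gaze position, in pixels
    y: float

def map_fixations_to_words(fixations, words):
    """Assign each fixation to the word whose bounding box contains it (Stage 4)."""
    mapped = []
    for fix in fixations:
        hit = next(
            (w for w in words if w.x0 <= fix.x <= w.x1 and w.y0 <= fix.y <= w.y1),
            None,
        )
        mapped.append((fix, hit.index if hit is not None else None))
    return mapped
```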

Additional collected data

  • Experiment metadata, including lab setup, hardware/software configuration, and calibration details.
  • Session-level metadata, detailing participant interaction, tracking quality, and recording specifics.
  • Stimuli and linguistic annotations, including text materials used and their linguistic properties.
  • Questionnaire data, such as: 
    • Participant demographics
    • Text familiarity and difficulty ratings
    • Comprehension question responses
  • Psychometric test data, when available: 
    • Working Memory Capacity (Lewandowsky et al.)
    • Rapid Automatized Naming (RAN)
    • Cognitive Control Tasks (Stroop, Flanker)
    • Metalinguistic Aptitude (PLAB)
    • Vocabulary Test (WikiVocab)

Note: Psychometric data collection was optional, and availability varies by dataset.

File Formats

Each dataset includes:

  • Eye-tracking data in .asc and .csv formats (covering all processing stages)
  • Metadata and documentation in .json or .pdf formats, ensuring interpretability and reusability
  • Data quality reports (in .json or .csv), embedded in the metadata for transparency and filtering
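
A brief sketch of loading these files after download is shown below; the file names and metadata keys are hypothetical, as the actual archive layout may differ.

```python
import json
import pandas as pd

# Hypothetical file names; the actual EyeStore archive layout may differ.
with open("dataset_metadata.json", encoding="utf-8") as f:
    metadata = json.load(f)

quality = pd.read_csv("data_quality_report.csv")         # quality metrics per session/trial
reading_measures = pd.read_csv("reading_measures.csv")   # Stage 4 word-level reading data

print(metadata.get("dataset_name"), metadata.get("languages"))
print(quality.describe())
```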

Participants for the MultiplEYE data collection

Participants are native speakers of the language tested. They are adults (18–65 years old), literate, with normal or corrected-to-normal vision. All participants report no known or suspected reading or language disorders, intellectual disabilities, or psychiatric diagnoses.

Guidelines & Standards

To ensure consistency, transparency, and reusability of eye-tracking datasets, EyeStore follows a structured framework for data submission, metadata documentation, and quality control. This page provides researchers with essential guidelines for preparing their datasets for inclusion in EyeStore.

  • Data Submission Guidelines: Step-by-step instructions on how to contribute datasets, including required file formats, documentation, and metadata standards.
  • Metadata Standards: A detailed overview of the standardized metadata schema used in EyeStore, covering dataset-, session-, and trial-level information to facilitate searchability and interoperability.
  • Quality Control Measures: Criteria for assessing data quality, including calibration accuracy, validation performance, and preprocessing requirements.
  • Best Practices for Data Documentation: Recommendations for ensuring comprehensive and transparent dataset descriptions, facilitating long-term usability and reproducibility.