
EyeStore (Coming Soon)

A FAIR database and data sharing platform for eye-tracking-while-reading data


Explore Reading Data
Search and filter eye-tracking datasets across languages and studies. EyeStore currently provides access to the MultiplEYE corpus, a large multilingual dataset collected across many countries.

Share Your Dataset
Publish eye-tracking-while-reading data with standardized documentation.

Enable Reproducible Research
Access well-documented datasets ready for analysis.

EyeStore (funded by swissuniversities through the EyeStore and EyeStore+ projects) is a FAIR-compliant database and data sharing platform dedicated to eye-tracking-while-reading data. It is designed to support Open Research Data (ORD) standards and promote best practices in the sharing, documentation, and reuse of eye-tracking data, ensuring that datasets are findable, accessible, interoperable, and reusable.

Initially developed in close collaboration with the COST Action MultiplEYE (CA21131), a large-scale, interdisciplinary research network supported by the European Cooperation in Science and Technology (COST) and funded by the European Union, EyeStore hosts the MultiplEYE eye-tracking dataset as its initial contribution, while also welcoming additional datasets from the community. For more information on the broader MultiplEYE initiative, please visit the MultiplEYE website.

The platform features a user-friendly interface with metadata-driven search and advanced filtering options, allowing users to explore and extract data at three levels: dataset (study), session (participant), and trial (individual reading event). Researchers can, for example, select trials based on language, participant characteristics, or stimulus type, making it easy to tailor data selection to their specific research questions. This fine-grained filtering ensures targeted access to relevant subsets and facilitates meaningful comparisons across datasets. Filter criteria include not only demographic or linguistic parameters, but also data quality metrics and available stimulus materials. For each dataset, users are presented with a preview of data matching their search, along with the option to download the complete dataset. The download includes the eye-movement data at different stages of the preprocessing pipeline (e.g., ASC, CSV), relevant metadata in both JSON and PDF formats, the experimental stimuli, corresponding linguistic annotations, and data quality reports.
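
To give a rough idea of what working with such a trial-level selection might look like once a dataset has been downloaded, the sketch below filters a flat trial table by language, participant age, and stimulus type using pandas. The file name and column names are hypothetical and do not reflect the actual EyeStore export schema.

```python
import pandas as pd

# Hypothetical example: filter a downloaded trial-level export.
# The file name and column names are illustrative, not EyeStore's actual schema.
trials = pd.read_csv("multipleye_trials.csv")

# Keep, for example, German trials from readers aged 18-40 who read literary texts.
subset = trials[
    (trials["language"] == "de")
    & (trials["participant_age"].between(18, 40))
    & (trials["stimulus_type"] == "literary")
]

print(f"{len(subset)} of {len(trials)} trials match the filter criteria")
subset.to_csv("filtered_trials.csv", index=False)
```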

The MultiplEYE eye-tracking data corpus

As its initial contribution, EyeStore hosts the MultiplEYE dataset, which includes eye-tracking data collected in numerous European countries as well as in the USA, Canada, Mexico, and Pakistan, with reading experiments conducted in different languages. This extensive dataset is a cornerstone of EyeStore, demonstrating the repository's role in enabling large-scale, cross-linguistic research.

The MultiplEYE project covers a wide range of both high- and low-resource languages. Currently, the following languages are included in MultiplEYE: Albanian, Arabic, Basque, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, French, Farsi, German, Greek, West Greenlandic (Kalaallisut), Hebrew, Hindi, Italian, Latvian, Lithuanian, Macedonian, Norwegian, Polish, Portuguese, Romanian, Romansh, Russian, Serbian, Slovenian, Spanish, Swedish, Turkish, Ukrainian, and Urdu.

About the MultiplEYE data collection

The MultiplEYE data corpus is a central component of EyeStore and serves as the platform's initial large-scale data collection. The description below outlines how the data was collected and what types of data are available through EyeStore. (Related scientific publications documenting the methodology and data will be linked and cited as soon as they become available.)

The MultiplEYE eye-tracking-while-reading experiment

The MultiplEYE team has created a standardized eye-tracking-while-reading experiment designed for multilingual, cross-site data collection. The experiment is implemented in Python, building on the PsychoPy and PyGaze packages, and can be run with various eye-tracking systems. All MultiplEYE contributors followed the official MultiplEYE data collection guidelines.
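
As a minimal sketch of what presenting a single reading page could look like with these packages, the snippet below assumes a configured PyGaze installation (constants/defaults set for the local eye tracker) and a connected, supported device; the page text and log messages are placeholders and do not reproduce the actual MultiplEYE implementation.

```python
from pygaze.display import Display
from pygaze.screen import Screen
from pygaze.eyetracker import EyeTracker
from pygaze.keyboard import Keyboard

# Assumes PyGaze constants/defaults are configured for the local eye tracker.
disp = Display()
tracker = EyeTracker(disp)
keyboard = Keyboard(keylist=["space"], timeout=None)

# Calibrate (and validate) before the reading phase.
tracker.calibrate()

# Present one placeholder text page and record gaze while it is on screen.
page = Screen()
page.draw_text(text="First page of the stimulus text ...", fontsize=24)

tracker.start_recording()
tracker.log("PAGE_ONSET page=1")   # placeholder message, not the MultiplEYE log format
disp.fill(page)
disp.show()
keyboard.get_key()                 # participant presses space to turn the page
tracker.log("PAGE_OFFSET page=1")
tracker.stop_recording()

tracker.close()
disp.close()
```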

The experiment starts with the experimenter entering the participant ID into the experiment software. The participant is then greeted by a welcome screen and prompted to electronically sign the consent form, after which the experiment instructions are displayed on the screen. Next, the camera setup is initiated, which includes calibration and validation. A practice phase follows, consisting of two short texts, each followed by two comprehension questions. These practice trials familiarize participants with the experimental setup and keyboard usage and give them practice with the comprehension questions.

The main phase of the reading experiment starts thereafter. Ten texts are presented in a (pseudo-)randomized order, with the randomization implemented in the presentation software so that each participant encounters a different sequence. Constraints are applied to balance text type, length, and the placement of the mandatory mid-session break. A fixation trigger precedes each page turn: participants must fixate on a dot on the screen to validate calibration accuracy, which allows a smooth and fast transition to the next page if the calibration is still sufficiently precise. If it is not, re-calibration procedures follow to restore the precision of the eye-tracking measurements.

Each text is followed by three rating scales: two concerning the participant's familiarity with the text (how familiar the text is and whether the participant has previously read or listened to it), and one concerning its perceived difficulty. After the rating scales, participants respond to six comprehension questions per text; returning to the text is disabled while the comprehension questions are displayed. After the experiment phase is completed, participants fill out a participant questionnaire on screen.
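
Purely as an illustration of the kind of constrained pseudo-randomization described above, the sketch below reshuffles a stimulus list until no two texts of the same type are adjacent and places the mandatory break at the halfway point; the text labels, type categories, and the specific constraint are hypothetical, not the actual MultiplEYE constraints.

```python
import random

# Hypothetical stimulus list (text_id, text_type); labels are illustrative only.
TEXTS = [
    ("lit_1", "literary"), ("lit_2", "literary"), ("lit_3", "literary"),
    ("lit_4", "literary"), ("lit_5", "literary"),
    ("exp_1", "expository"), ("exp_2", "expository"), ("exp_3", "expository"),
    ("exp_4", "expository"), ("exp_5", "expository"),
]

def pseudo_randomized_order(texts, seed):
    """Shuffle the texts until no two texts of the same type are adjacent."""
    rng = random.Random(seed)
    order = texts[:]
    while True:
        rng.shuffle(order)
        if all(a[1] != b[1] for a, b in zip(order, order[1:])):
            return order

# One order per participant, seeded by the participant ID to keep it reproducible.
order = pseudo_randomized_order(TEXTS, seed=17)
first_half, second_half = order[:5], order[5:]   # mandatory break after the fifth text
```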

Collected data

The MultiplEYE data collection, as hosted and shared via EyeStore, includes a multilingual set of eye-tracking-while-reading datasets. The datasets are available in multiple stages of the data processing pipeline and are accompanied by rich documentation and metadata to ensure transparency, reusability, and alignment with FAIR principles. 

All datasets undergo a standardized preprocessing pipeline, which not only transforms the raw data into multiple usable formats but also extracts detailed data quality reports. These reports, along with other metadata, are included in the dataset and used as filter criteria within EyeStore. The data are provided at the following processing stages (a minimal sketch of the Stage 4 mapping follows the list):

  • Stage 1: Converted files in a human-readable format (e.g., .asc), with no modifications beyond encoding.
  • Stage 2: Parsed data files containing one sample per line (e.g., x/y-coordinates and timestamp), chronologically ordered.
  • Stage 3: Aggregated gaze events (e.g., fixations, saccades) derived from the raw samples.
  • Stage 4: Reading data where gaze events are mapped to individual words in the order they were presented in the text.
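
The snippet below is a minimal, purely illustrative version of that Stage 4 mapping: fixations are assigned to words via rectangular word bounding boxes. The data structures and field names are hypothetical and do not reflect the actual MultiplEYE preprocessing pipeline.

```python
from dataclasses import dataclass

@dataclass
class Word:
    index: int        # position of the word in the presented text
    text: str
    x0: float         # bounding box on screen, in pixels
    y0: float
    x1: float
    y1: float

@dataclass
class Fixation:
    t_start: float    # ms since recording onset
    duration: float   # ms
    x: float          # mean gaze position, in pixels
    y: float

def map_fixations_to_words(fixations, words):
    """Assign each fixation to the word whose bounding box contains it (Stage 4)."""
    mapped = []
    for fix in fixations:
        hit = next(
            (w for w in words if w.x0 <= fix.x <= w.x1 and w.y0 <= fix.y <= w.y1),
            None,
        )
        mapped.append((fix, hit.index if hit is not None else None))
    return mapped
```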

Additional collected data

  • Experiment metadata, including lab setup, hardware/software configuration, and calibration details.
  • Session-level metadata, detailing participant interaction, tracking quality, and recording specifics.
  • Stimuli and linguistic annotations, including text materials used and their linguistic properties.
  • Questionnaire data, such as: 
    • Participant demographics
    • Text familiarity and difficulty ratings
    • Comprehension question responses
  • Psychometric test data, when available: 
    • Working Memory Capacity (Lewandowsky et al.)
    • Rapid Automatized Naming (RAN)
    • Cognitive Control Tasks (Stroop, Flanker)
    • Metalinguistic Aptitude (PLAB)
    • Vocabulary Test (WikiVocab)

Note: Psychometric data collection was optional, and availability varies by dataset.

File Formats

Each dataset includes:

  • Eye-tracking data in .asc and .csv formats (covering all processing stages)
  • Metadata and documentation in .json or .pdf formats, ensuring interpretability and reusability
  • Data quality reports (in .json or .csv), embedded in the metadata for transparency and filtering
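
A brief sketch of loading these files after download is shown below; the file names and metadata keys are hypothetical, as the actual archive layout may differ.

```python
import json
import pandas as pd

# Hypothetical file names; the actual EyeStore archive layout may differ.
with open("dataset_metadata.json", encoding="utf-8") as f:
    metadata = json.load(f)

quality = pd.read_csv("data_quality_report.csv")         # quality metrics per session/trial
reading_measures = pd.read_csv("reading_measures.csv")   # Stage 4 word-level reading data

print(metadata.get("dataset_name"), metadata.get("languages"))
print(quality.describe())
```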

Participants for the MultiplEYE data collection

Participants are native speakers of the language tested. They are adults (18–65 years old), literate, with normal or corrected-to-normal vision. All participants report no known or suspected reading or language disorders, intellectual disabilities, or psychiatric diagnoses.

Guidelines & Standards

To ensure consistency, transparency, and reusability of eye-tracking datasets, EyeStore follows a structured framework for data submission, metadata documentation, and quality control. This page provides researchers with essential guidelines for preparing their datasets for inclusion in EyeStore.

  • Data Submission Guidelines: Step-by-step instructions on how to contribute datasets, including required file formats, documentation, and metadata standards.
  • Metadata Standards: A detailed overview of the standardized metadata schema used in EyeStore, covering dataset-, session-, and trial-level information to facilitate searchability and interoperability.
  • Quality Control Measures: Criteria for assessing data quality, including calibration accuracy, validation performance, and preprocessing requirements.
  • Best Practices for Data Documentation: Recommendations for ensuring comprehensive and transparent dataset descriptions, facilitating long-term usability and reproducibility.