
MANOLO EU Project Task-Agnostic Time-Series Data Quality Estimation Framework

NEWS
Tue 08 Jul 2025

MANOLO EU project partner NCSR “Demokritos” worked closely with FDI to develop a task-agnostic framework for time-series data quality estimation. Together, they implemented ten distinct methods, spanning both machine learning and statistical approaches, to identify biased, noisy, inconsistent, or otherwise low-quality data, as well as data that may have been maliciously manipulated to contaminate a model during training.

Specifically, NCSR “Demokritos” contributed its expertise in machine learning approaches (e.g., Classifier, Predictor), while FDI provided specialized knowledge in traditional statistical techniques for data drift and anomaly detection (e.g., PCA, ADWIN). Their collaboration focused on integrating these two families of methods into a unified framework and evaluating their performance.
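To give a sense of what a statistical drift detector such as ADWIN does, here is a minimal sketch in Python. It is not the MANOLO implementation: it applies ADWIN's window-split test naively (checking every split point, without the bucket compression the full algorithm uses for efficiency), it assumes values scaled to [0, 1], and the function name `adwin_naive` and the parameters `delta` and `min_side` are illustrative choices.

```python
import math


def adwin_naive(stream, delta=0.002, min_side=5):
    """Return the indices at which a change in the mean is flagged.

    Illustrative only: a naive version of the ADWIN window-split test.
    Assumes every value in `stream` lies in [0, 1].
    """
    window = []
    drift_points = []
    for i, x in enumerate(stream):
        window.append(x)
        drift_here = False
        shrunk = True
        # Keep shrinking the window while some split shows a significant gap.
        while shrunk and len(window) >= 2 * min_side:
            shrunk = False
            n = len(window)
            total = sum(window)
            head_sum = 0.0
            for split in range(1, n):
                head_sum += window[split - 1]
                n0, n1 = split, n - split
                if n0 < min_side or n1 < min_side:
                    continue
                mean_old = head_sum / n0
                mean_new = (total - head_sum) / n1
                # Cut threshold from the ADWIN paper (Hoeffding-style bound).
                m = 1.0 / (1.0 / n0 + 1.0 / n1)
                eps_cut = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))
                if abs(mean_old - mean_new) > eps_cut:
                    window = window[split:]  # drop the older sub-window
                    drift_here = True
                    shrunk = True
                    break
        if drift_here:
            drift_points.append(i)
    return drift_points


# Synthetic example: the mean jumps from 0.2 to 0.8 at index 100.
stream = [0.2] * 100 + [0.8] * 100
print(adwin_naive(stream))
```

On this synthetic stream the detector stays silent during the stable first half and flags a change once enough post-jump samples have accumulated. A smaller `delta` makes it more conservative.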

Being task-agnostic, this framework can be applied to different kinds of time-series data, regardless of the specific use case. The two partners applied the framework to the Bitbrain EEG dataset (brainwave recordings collected during sleep using wearable headbands) and evaluated its performance by comparing the results against Bitbrain’s own noise detection method and MNE filtering (a signal processing technique that removes noise by isolating important frequency bands).

The results revealed that the noise estimations from the attention-based models outperformed those of all other methods. These models use attention mechanisms, a technique through which the system learns to focus on the most relevant parts of the input data to make decisions. For example, the proposed attention-based Predictor model detected noise by identifying parts of the input it learned to ignore when predicting future data points. This model demonstrated the strongest performance overall.
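The idea of reading noise off attention weights can be illustrated with a toy sketch. This is not the Predictor model itself: in a trained model the query and key vectors are learned, whereas here they are hand-picked, and the names `attention_weights` and `flag_ignored_segments`, along with the `ratio` threshold, are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over input segments."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def flag_ignored_segments(weights, ratio=0.5):
    """Flag segments whose weight falls well below the uniform share.

    Hypothetical rule: a segment receiving less than `ratio` times the
    uniform weight (1 / number of segments) is treated as likely noise.
    """
    uniform = 1.0 / len(weights)
    return [w < ratio * uniform for w in weights]


# Toy input: four segments, the third of which disagrees with the query.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0], [2.0, 0.0]]
weights = attention_weights(query, keys)
flags = flag_ignored_segments(weights)
print(flags)
```

In this toy case only the third segment receives a weight well below its uniform share, so it is the only one flagged. A real system would derive the threshold from validation data rather than a fixed ratio.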

The ability to spot low-quality data is particularly valuable in applications like healthcare, where noise can compromise the accuracy of research findings and diagnoses. Because this task-agnostic framework can be applied across different types of datasets, it gives experts an automatic way to detect and flag unreliable segments. This supports their work and helps them make more informed clinical decisions.

The task-agnostic data quality estimation framework supports T2.2: Data Quality Estimation, part of the Data Inspection & Generation component within MANOLO WP2. This task is responsible for automatically estimating and annotating data quality, detecting anomalies and inconsistencies, and ensuring reliable data provenance within the MANOLO framework.

This outcome is significant for MANOLO because it:

  • Ensures high-quality, reliable datasets for training and evaluation.
  • Automates quality assessment, reducing manual inspection effort.
  • Allows flexible application across different datasets and use cases.

Moving forward, the partners’ objectives include refining the framework to enhance precision and reduce false positives, as well as extending its application to other types of time-series data beyond EEG.

This work benefits the EU community by enhancing the reliability and trustworthiness of data used in high-stakes domains such as healthcare, finance, and environmental monitoring.