Maximizing data quality while ensuring data protection in service-based data pipelines / A. Polimeno, C. Braghin, M. Anisetti, C.A. Ardagna. - In: JOURNAL OF BIG DATA. - ISSN 2196-1115. - 12:1(2025 Dec), pp. 62.1-62.34. [10.1186/s40537-025-01118-5]
Maximizing data quality while ensuring data protection in service-based data pipelines
A. Polimeno (first author); C. Braghin (second author); M. Anisetti (second-to-last author); C.A. Ardagna (last author)
2025
Abstract
The growing capacity to handle vast amounts of data, combined with a shift in service delivery models, has improved scalability and efficiency in data analytics, particularly in multi-tenant environments. Data are treated as digital products and processed through orchestrated service-based data pipelines. However, advances in data analytics have not been matched by comparable advances in data governance techniques, leaving a gap in the effective management of data throughout the pipeline lifecycle. This gap calls for new service-based data pipeline management solutions that balance data quality and data protection. The framework proposed in this paper optimizes service selection and composition within service-based data pipelines to maximize data quality while ensuring compliance with data protection requirements, expressed as access control policies. Given the NP-hard nature of the problem, a sliding-window heuristic is defined and evaluated against the exhaustive approach and a baseline modeling the state of the art. Our results demonstrate a significant reduction in computational overhead, while maintaining high data quality.
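The abstract does not spell out how the sliding-window heuristic works. As an illustration only, here is a minimal Python sketch of the general idea of trading an exhaustive search over service compositions for a windowed one. It is not the authors' algorithm, and every modeling choice is a hypothetical assumption: each pipeline stage offers candidate services with a scalar `quality` factor; the quality of a composition is the product of those factors; and the access control policy is reduced to a set of allowed data flows between trust domains. The exhaustive search enumerates every composition, which grows exponentially with the number of stages. The windowed search enumerates only `window` stages at a time, commits the first of them, and then slides forward.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Optional


@dataclass(frozen=True)
class Service:
    name: str
    quality: float  # hypothetical: fraction of data quality retained, in (0, 1]
    domain: str     # hypothetical: trust domain the service runs in


# Hypothetical access control policy: allowed (source domain, target domain) flows.
Policy = set[tuple[str, str]]


def allowed(prev: Optional[Service], nxt: Service, policy: Policy) -> bool:
    # The first stage has no predecessor, so any service is admissible there.
    return prev is None or (prev.domain, nxt.domain) in policy


def score(chain, policy: Policy, prev: Optional[Service] = None) -> float:
    """Product of qualities, or -1.0 if any data flow violates the policy."""
    for s in chain:
        if not allowed(prev, s, policy):
            return -1.0
        prev = s
    return prod(s.quality for s in chain)


def exhaustive(stages, policy: Policy):
    # Enumerate every composition (one service per stage) and keep the best.
    best = max(product(*stages), key=lambda c: score(c, policy))
    return list(best) if score(best, policy) >= 0 else None


def sliding_window(stages, policy: Policy, window: int):
    chosen: list[Service] = []
    for i in range(len(stages)):
        prev = chosen[-1] if chosen else None
        horizon = stages[i:i + window]  # only `window` stages are enumerated
        best = max(product(*horizon), key=lambda c: score(c, policy, prev))
        if score(best, policy, prev) < 0:
            return None  # no compliant continuation inside the window
        chosen.append(best[0])  # commit the first stage only, then slide
    return chosen


# Toy instance (invented data): flows are allowed only within the same domain.
stages = [
    [Service("a1", 0.9, "eu"), Service("a2", 0.95, "us")],
    [Service("b1", 0.9, "eu"), Service("b2", 0.5, "us")],
    [Service("c1", 0.9, "eu"), Service("c2", 0.9, "us")],
]
policy: Policy = {("eu", "eu"), ("us", "us")}

print([s.name for s in exhaustive(stages, policy)])             # ['a1', 'b1', 'c1']
print([s.name for s in sliding_window(stages, policy, 1)])      # ['a2', 'b2', 'c2']
print([s.name for s in sliding_window(stages, policy, 2)])      # ['a1', 'b1', 'c1']
```

In this toy instance, a window of one stage greedily picks the highest-quality first service (`a2`). That choice locks the pipeline into a lower-quality compliant chain. A window of two stages already recovers the exhaustive optimum. With `window` equal to the number of stages, the windowed search enumerates the same compositions as the exhaustive one. Smaller windows bound the enumeration cost at the price of possibly suboptimal quality.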
File: s40537-025-01118-5.pdf (open access)
Description: Research
Type: Publisher's version/PDF
Size: 2.93 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.