Existing data engine implementations do not properly manage the conflict between the need to protect data and the need to share it, which hampers the spread of big data applications and limits their impact. These two requirements have often been studied and defined independently, leading to a conceptual and technological misalignment. This article presents the architecture and technical implementation of a data engine that addresses this conflict by integrating a new governance solution, based on access control, within a big data analytics pipeline. The data engine enriches traditional data governance components with an access control system that regulates access to data in a big data environment by means of data transformations. Data are used along the pipeline only after sanitization, which protects sensitive attributes before they are used and helps balance protection and quality. The solution was tested in a real-world smart city scenario using data from the Oslo city transportation system. Specifically, we compared predictive models trained on the data views obtained by applying, to the same data set, the secure transformations required by different user roles. The results show that predictive models built on data manipulated according to access control policies are still effective.
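The idea of role-dependent data views described above can be illustrated with a minimal sketch. It is not the implementation presented in the article: the roles, the field names (`vehicle_id`, `timestamp`, `delay_min`), the two transformations, and the `POLICIES` table are all hypothetical, chosen only to show how a policy might map each role to the transformations applied before data enter the pipeline.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Transform = Callable[[object], object]


def suppress(_value: object) -> object:
    """Hypothetical transformation: drop the value entirely."""
    return None


def generalize_hour(timestamp: object) -> object:
    """Hypothetical transformation: coarsen 'YYYY-MM-DDTHH:MM' to the hour."""
    return str(timestamp)[:13] + ":00"


# Assumed policy table: role -> {attribute: transformation to apply}.
# Attributes not listed for a role are released unchanged.
POLICIES: Dict[str, Dict[str, Transform]] = {
    "operator": {},
    "analyst": {"timestamp": generalize_hour},
    "external": {"timestamp": generalize_hour, "vehicle_id": suppress},
}


def data_view(records: List[Record], role: str) -> List[Record]:
    """Return the sanitized view of `records` that `role` may use."""
    if role not in POLICIES:
        raise PermissionError(f"no policy defined for role {role!r}")
    rules = POLICIES[role]
    return [
        {key: rules[key](value) if key in rules else value
         for key, value in record.items()}
        for record in records
    ]


if __name__ == "__main__":
    rows = [{"vehicle_id": "bus-17",
             "timestamp": "2024-01-15T08:42",
             "delay_min": 6}]
    for role in POLICIES:
        print(role, data_view(rows, role))
```

Under these assumptions, a model would be trained separately on each view returned by `data_view`, and the results compared across roles to see how much each transformation costs in predictive quality.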

Balancing Protection and Quality in Big Data Analytics Pipelines / A. Polimeno, P. Mignone, C. Braghin, M. Anisetti, M. Ceci, D. Malerba, C.A. Ardagna. - In: BIG DATA. - ISSN 2167-647X. - (2024). [Epub ahead of print] [10.1089/big.2023.0065]

Balancing Protection and Quality in Big Data Analytics Pipelines

A. Polimeno (first author); C. Braghin; M. Anisetti; C.A. Ardagna (last author)
2024

Keywords: access control; anomaly detection; big data; data governance; data protection
Sector INF/01 - Informatics
   Intelligent Management of Processes, Ethics and Technology for Urban Safety (IMPETUS)
   IMPETUS
   EUROPEAN COMMISSION
   H2020
   883286

   MUSA - Multilayered Urban Sustainability Action
   MUSA
   MINISTERO DELL'UNIVERSITA' E DELLA RICERCA

   SEcurity and RIghts in the CyberSpace (SERICS)
   SERICS
   MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
   identification code PE00000014
2024
11 Apr 2024
Article (author)
Files in this item:
Balancing Protection and Quality in Big Data Analytics Pipelines.pdf
Access: restricted
Description: Article
Type: Pre-print (manuscript submitted to the publisher)
Size: 732 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1062849