IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca


Balancing Protection and Quality in Big Data Analytics Pipelines / A. Polimeno, P. Mignone, C. Braghin, M. Anisetti, M. Ceci, D. Malerba, C.A. Ardagna. - In: BIG DATA. - ISSN 2167-647X. - (2024). [Epub ahead of print] [10.1089/big.2023.0065]

Balancing Protection and Quality in Big Data Analytics Pipelines

A. Polimeno (first author); Paolo Mignone; C. Braghin; M. Anisetti; Michelangelo Ceci; Donato Malerba; C.A. Ardagna (last author)

2024

Abstract

Existing data engine implementations do not properly manage the conflict between the need to protect data and the need to share them, which is hampering the spread of big data applications and limiting their impact. These two requirements have often been studied and defined independently, leading to a conceptual and technological misalignment. This article presents the architecture and technical implementation of a data engine that addresses this conflict by integrating a new governance solution based on access control within a big data analytics pipeline. Our data engine enriches traditional data governance components with an access control system that enforces access to data in a big data environment based on data transformations. Data are then used along the pipeline only after sanitization, which protects sensitive attributes before their usage, in an effort to balance protection and quality. The solution was tested in a real-world smart city scenario using data from the Oslo city transportation system. Specifically, we compared predictive models trained on the data views obtained by applying the secure transformations required by different user roles to the same data set. The results show that predictive models built on data manipulated according to access control policies remain effective.
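The idea of enforcing access control through data transformations, so that each role receives a different sanitized view of the same data set, can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' data engine: the roles (`operator`, `analyst`, `public`), the attribute names, the toy records, and the transformations (hash-based pseudonymization, coordinate rounding, hour bucketing) are all assumptions chosen for the example.

```python
# Hypothetical sketch, not the authors' data engine: role-based access control
# enforced by transforming data before release. Roles, attributes, and
# transformations below are illustrative assumptions.
import hashlib
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Transform = Callable[[Any], Any]


def keep(value: Any) -> Any:
    """Release the attribute unchanged."""
    return value


def pseudonymize(value: Any) -> str:
    """Replace an identifier with a truncated, unsalted SHA-256 digest.

    Illustration only: unsalted hashes of small ID spaces can be reversed
    by brute force, so this is not a secure pseudonymization scheme.
    """
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def round_coordinate(value: float) -> float:
    """Coarsen a GPS coordinate to two decimal places."""
    return round(value, 2)


def hour_bucket(value: str) -> str:
    """Generalize an "HH:MM" time to the start of its hour."""
    return value[:2] + ":00"


# Hypothetical policies: role -> {attribute: transformation}.
# Attributes missing from a role's policy are dropped (deny by default).
POLICIES: Dict[str, Dict[str, Transform]] = {
    "operator": {"vehicle_id": keep, "lat": keep, "lon": keep,
                 "time": keep, "delay_s": keep},
    "analyst": {"vehicle_id": pseudonymize, "lat": round_coordinate,
                "lon": round_coordinate, "time": hour_bucket, "delay_s": keep},
    "public": {"time": hour_bucket, "delay_s": keep},
}


def data_view(records: List[Record], role: str) -> List[Record]:
    """Return the sanitized view of `records` that `role` may use downstream."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"no access policy defined for role {role!r}")
    return [
        {attr: transform(rec[attr])
         for attr, transform in policy.items() if attr in rec}
        for rec in records
    ]


if __name__ == "__main__":
    # Toy records, invented for illustration.
    trips = [
        {"vehicle_id": "bus-001", "lat": 59.91273, "lon": 10.74609,
         "time": "08:17", "delay_s": 95},
        {"vehicle_id": "bus-002", "lat": 59.92851, "lon": 10.75932,
         "time": "08:42", "delay_s": 12},
    ]
    for role in POLICIES:
        print(role, data_view(trips, role))
```

In this sketch every view retains the prediction target (`delay_s`), so the same model could be trained on each role's view and its quality compared, in the spirit of the comparison described in the abstract. The deny-by-default choice means that a newly added attribute is never released until some policy explicitly covers it.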


Keywords: access control; anomaly detection; big data; data governance; data protection
Scientific-disciplinary sectors of the article (display only): Sector INF/01 - Computer Science
Scientific-disciplinary sectors of the article (valid from 9 May 2024): Sector INFO-01/A - Computer Science

Projects

Project title: Intelligent Management of Processes, Ethics and Technology for Urban Safety (IMPETUS)
Acronym: IMPETUS
Funder: EUROPEAN COMMISSION
Funding programme: H2020
Contract no.: 883286

Project title: MUSA - Multilayered Urban Sustainability Action
Acronym: MUSA
Funder: MINISTERO DELL'UNIVERSITA' E DELLA RICERCA

Project title: SEcurity and RIghts in the CyberSpace (SERICS)
Acronym: SERICS
Funder: MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
Contract no.: identification code PE00000014

Publication date: 2024
Ahead-of-print or print date: 11 April 2024
Journal in ANCE: BIG DATA
DOI: https://dx.doi.org/10.1089/big.2023.0065
Type: Article (author)
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
Polimeno et al. - 2024 - Balancing Protection and Quality in Big Data Analy.pdf (restricted access; type: Publisher's version/PDF)	602.65 kB	Adobe PDF


Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1062849
