Scalable Distributed Data Anonymization for Large Datasets / S. De Capitani di Vimercati, D. Facchinetti, S. Foresti, G. Livraga, G. Oldani, S. Paraboschi, M. Rossi, P. Samarati. - In: IEEE TRANSACTIONS ON BIG DATA. - ISSN 2332-7790. - 9:3(2023 Jun 01), pp. 818-831. [10.1109/TBDATA.2022.3207521]

Scalable Distributed Data Anonymization for Large Datasets

S. De Capitani di Vimercati (first author); S. Foresti; G. Livraga; P. Samarati (last author)
2023

Abstract

k-Anonymity and l-diversity are two well-known privacy metrics that protect the respondents of a dataset by obfuscating information that could disclose their identities and sensitive information. Existing solutions for enforcing them implicitly assume a centralized scenario, since they require complete visibility over the dataset to be anonymized, and can therefore have limited applicability to large datasets. In this paper, we propose a solution that extends Mondrian (an efficient and effective approach designed for achieving k-anonymity) to enforce both k-anonymity and l-diversity over large datasets in a distributed manner, leveraging the parallel computation of multiple workers. Our approach efficiently distributes the computation among the workers without requiring visibility over the entire dataset. Our data partitioning limits the need for workers to exchange data, so that each worker can independently anonymize a portion of the dataset. We implemented our approach to run in parallel on a dynamically chosen number of workers. The experimental evaluation shows that our solution scales well without affecting the quality of the resulting anonymization.
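To make the two metrics named in the abstract concrete, the following is a minimal, single-process sketch of Mondrian-style median partitioning that only accepts a split when both halves keep at least k records and at least l distinct sensitive values. It is not the distributed algorithm of the paper: the function names, the sample records, and the restriction to numeric quasi-identifiers generalized to (min, max) ranges are all assumptions made for illustration.

```python
from collections import Counter

def is_k_anonymous(records, qis, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in qis) for r in records)
    return all(count >= k for count in groups.values())

def is_l_diverse(records, qis, sensitive, l):
    """True if every group has at least l distinct sensitive values."""
    values = {}
    for r in records:
        values.setdefault(tuple(r[q] for q in qis), set()).add(r[sensitive])
    return all(len(v) >= l for v in values.values())

def mondrian(records, qis, sensitive, k, l):
    """Recursively split on the median of the widest-range attribute
    (hypothetical centralized sketch; assumes k >= 1 and numeric attributes)."""
    if not records:
        return []

    def allowed(part):
        return len(part) >= k and len({r[sensitive] for r in part}) >= l

    def spread(q):
        return max(r[q] for r in records) - min(r[q] for r in records)

    for q in sorted(qis, key=spread, reverse=True):
        values = sorted(r[q] for r in records)
        median = values[len(values) // 2]
        left = [r for r in records if r[q] < median]
        right = [r for r in records if r[q] >= median]
        if allowed(left) and allowed(right):
            return (mondrian(left, qis, sensitive, k, l)
                    + mondrian(right, qis, sensitive, k, l))
    return [records]  # no admissible split: this partition is final

def generalize(partitions, qis):
    """Replace each quasi-identifier value with its partition's (min, max) range."""
    out = []
    for part in partitions:
        ranges = {q: (min(r[q] for r in part), max(r[q] for r in part)) for q in qis}
        out.extend({**r, **ranges} for r in part)
    return out

# Illustrative data only (not taken from the paper).
records = [
    {"age": 25, "zip": 20100, "disease": "flu"},
    {"age": 27, "zip": 20105, "disease": "cold"},
    {"age": 31, "zip": 20110, "disease": "flu"},
    {"age": 35, "zip": 20115, "disease": "asthma"},
    {"age": 42, "zip": 20120, "disease": "cold"},
    {"age": 47, "zip": 20125, "disease": "flu"},
    {"age": 52, "zip": 20130, "disease": "asthma"},
    {"age": 58, "zip": 20135, "disease": "cold"},
]
qis = ["age", "zip"]
partitions = mondrian(records, qis, "disease", k=2, l=2)
anonymized = generalize(partitions, qis)
```

In this sketch every split needs the full list of records in memory, which is the centralized assumption the abstract identifies as the limitation of existing solutions.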
No
English
Distributed data anonymization; Mondrian; k-Anonymity; l-Diversity; Apache Spark
Sector INF/01 - Computer Science
Article
Anonymous experts
Scientific publication
   Multi-Owner data Sharing for Analytics and Integration respecting Confidentiality and Owner control (MOSAICrOWN)
   MOSAICrOWN
   EUROPEAN COMMISSION
   H2020
   825333

   Green responsibLe privACy preservIng dAta operaTIONs
   GLACIATION
   EUROPEAN COMMISSION

   High quality Open data Publishing and Enrichment (HOPE)
   HOPE
   MINISTERO DELL'ISTRUZIONE E DEL MERITO
   2017MMJJRE_003

   Machine Learning-based, Networking and Computing Infrastructure Resource Management of 5G and beyond Intelligent Networks (MARSAL)
   MARSAL
   EUROPEAN COMMISSION
   H2020
   101017171
1 Jun 2023
May 2023
Institute of Electrical and Electronics Engineers (IEEE)
Volume 9, issue 3, pp. 818-831 (14 pages)
Published
Journal of international relevance
manual
I agree
info:eu-repo/semantics/article
open
Research products::01 - Journal article
8
262
Article (author)
Journal with Impact Factor
S. De Capitani di Vimercati, D. Facchinetti, S. Foresti, G. Livraga, G. Oldani, S. Paraboschi, M. Rossi, P. Samarati
Files in this item:
Scalable_Distributed_Data_Anonymization_for_Large_Datasets.pdf
Access: open access
Type: Publisher's version/PDF
Size: 1.53 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/940404
Citations
  • PMC: n/a
  • Scopus: 1
  • ISI: 0