Hide and Mine in Strings: Hardness and Algorithms

Bernardini, G.; Conte, A.; Gourdel, G.; Grossi, R.; Loukides, G.; Pisanti, N.; Pissis, S.P.; Punzi, G.; Stougie, L.; Sweering, M.

doi:10.1109/icdm50108.2020.00103

We initiate a study of the fundamental relation between data sanitization (i.e., the process of hiding confidential information in a given dataset) and frequent pattern mining, in the context of sequential (string) data. Current methods for string sanitization hide confidential patterns but, in doing so, introduce a number of spurious patterns that may harm the utility of frequent pattern mining. The main computational problem is to minimize this harm. Our contribution is twofold. First, we present several hardness results for different variants of this problem, essentially showing that these variants cannot be solved, or even approximated, in polynomial time. Second, we propose integer linear programming formulations for these variants, together with algorithms to solve them that run in polynomial time under certain realistic assumptions on the problem parameters.

Hide and Mine in Strings: Hardness and Algorithms / G. Bernardini, A. Conte, G. Gourdel, R. Grossi, G. Loukides, N. Pisanti, S.P. Pissis, G. Punzi, L. Stougie, M. Sweering (PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON DATA MINING). - In: 2020 IEEE International Conference on Data Mining (ICDM) [s.l] : IEEE, 2020. - ISBN 978-1-7281-8316-9. - pp. 924-929 (Paper presented at the 20th IEEE International Conference on Data Mining, ICDM 2020, held in Sorrento in 2020) [10.1109/icdm50108.2020.00103].


Bernardini, G. (first author); Conte, Alessio; Gourdel, Garance; Grossi, Roberto; Loukides, Grigorios; Pisanti, Nadia; Pissis, Solon P.; Punzi, Giulia; Stougie, Leen; Sweering, Michelle

2020



	Keywords
		Data privacy; Data sanitization; Frequent pattern mining; Knowledge hiding; String algorithms
	Scientific-disciplinary sectors of the contribution (valid from 9 May 2024)
		Sector INFO-01/A - Informatica (Computer Science)
	Publication date
		2020
	Organizations associated with the conference
		IEEE Computer Society
	DOI
		https://dx.doi.org/10.1109/icdm50108.2020.00103
	Type
		Book Part (author)
	Appears in types:
		03 - Contribution in volume

Files in this item:

File	Size	Format
Hide_and_Mine_in_Strings_Hardness_and_Algorithms.pdf (restricted access; type: Publisher's version/PDF)	157.19 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1131861


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca