Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation! Pattern matching.
String Sanitization Under Edit Distance / G. Bernardini, H. Chen, G. Loukides, N. Pisanti, S.P. Pissis, L. Stougie, M. Sweering (LEIBNIZ INTERNATIONAL PROCEEDINGS IN INFORMATICS). - In: 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)[s.l] : Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2020. - ISBN 9783959771498. - pp. 1-14 (( Intervento presentato al 31. convegno Annual Symposium on Combinatorial Pattern Matching, CPM 2020 tenutosi a Copenhagen nel 2020 [10.4230/lipics.cpm.2020.7].
String Sanitization Under Edit Distance
G. Bernardini
Primo
;
2020
Abstract
Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation! Pattern matching.| File | Dimensione | Formato | |
|---|---|---|---|
|
LIPIcs.CPM.2020.7.pdf
accesso aperto
Tipologia:
Publisher's version/PDF
Dimensione
850.96 kB
Formato
Adobe PDF
|
850.96 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.




