We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(nω log d) time, where ω is the matrix multiplication exponent. We show that, despite the nω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.

Reverse-Safe Data Structures for Text Indexing / G. Bernardini, H. Chen, G. Fici, G. Loukides, S.P. Pissis (PROCEEDINGS OF THE ... WORKSHOP ON ALGORITHM ENGINEERING AND EXPERIMENTS). - In: 2020 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX) / [a cura di] G. Blelloch, I. Finocchi. - [s.l] : Society for Industrial and Applied Mathematics Publications, 2020. - ISBN 9781611976007. - pp. 199-213 (( convegno Symposium on Algorithm Engineering and Experiments, ALENEX 2020 tenutosi a Salt Lake City nel 2020 [10.1137/1.9781611976007.16].

Reverse-Safe Data Structures for Text Indexing

G. Bernardini
Primo
;
2020

Abstract

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(nω log d) time, where ω is the matrix multiplication exponent. We show that, despite the nω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
Data structure; data privacy; string algorithm
Settore INFO-01/A - Informatica
2020
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
1.9781611976007.16.pdf

accesso aperto

Tipologia: Publisher's version/PDF
Dimensione 1.2 MB
Formato Adobe PDF
1.2 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1131900
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 13
  • ???jsp.display-item.citation.isi??? 9
  • OpenAlex ND
social impact