Recent advances in the compact representation of static functions (with constant access time) have made it possible to fully exploit constructions based on random linear system. Such constructions, albeit theoretically appealing, were previously too slow to be usable. In this paper, we extend such techniques to the problem of storing compressed static functions, in the sense that the space used per key should be close to the entropy of the list of values. From a theoretical viewpoint, we are inspired by the approach of Hreinsson, Kroyer and Pagh. Values are represented using a near-optimal instantaneous code. Then, a bit array is created so that by XOR'ing its content at a fixed number of positions depending on the key one obtains the value, represented by its associated codeword. In the construction phase, every bit of the array is associated with an equation on Z/2Z, and solving the associated system provides the desired representation. Thus, we pass from one equation per key (the non-compressed case) to one equation per bit: The size of the system is thus approximately multiplied by the empirical entropy of the values, making the problem much more challenging. We show that by carefully engineering the value representation we can obtain a practical data structure. For example, we can store a function with geometrically distributed output in just 2.28 bits per key, independently of the key set, with a construction time double with respect to that of a state-of-the-art non-compressed function, which requires ≈ log log n bits per key, where n is the number of keys, and slightly improved lookup time. We can also store a function with an output of 106 values distributed following a power law of exponent 2 in just 2.75 bits per key, whereas a non-compressed function would require more than 20, with a threefold increase in construction time and significantly faster lookups.

Engineering compressed static functions / M. Genuzio, S. Vigna - In: 2018 Data Compression Conference[s.l] : IEEE, 2018. - ISBN 9781538648834. - pp. 52-61 (( convegno Data Compression Conference tenutosi a Snowbird nel 2018 [10.1109/DCC.2018.00013].

Engineering compressed static functions

M. Genuzio;S. Vigna
2018

Abstract

Recent advances in the compact representation of static functions (with constant access time) have made it possible to fully exploit constructions based on random linear system. Such constructions, albeit theoretically appealing, were previously too slow to be usable. In this paper, we extend such techniques to the problem of storing compressed static functions, in the sense that the space used per key should be close to the entropy of the list of values. From a theoretical viewpoint, we are inspired by the approach of Hreinsson, Kroyer and Pagh. Values are represented using a near-optimal instantaneous code. Then, a bit array is created so that by XOR'ing its content at a fixed number of positions depending on the key one obtains the value, represented by its associated codeword. In the construction phase, every bit of the array is associated with an equation on Z/2Z, and solving the associated system provides the desired representation. Thus, we pass from one equation per key (the non-compressed case) to one equation per bit: The size of the system is thus approximately multiplied by the empirical entropy of the values, making the problem much more challenging. We show that by carefully engineering the value representation we can obtain a practical data structure. For example, we can store a function with geometrically distributed output in just 2.28 bits per key, independently of the key set, with a construction time double with respect to that of a state-of-the-art non-compressed function, which requires ≈ log log n bits per key, where n is the number of keys, and slightly improved lookup time. We can also store a function with an output of 106 values distributed following a power law of exponent 2 in just 2.75 bits per key, whereas a non-compressed function would require more than 20, with a threefold increase in construction time and significantly faster lookups.
Static functions; Succinct data structures; Computer Networks and Communications
Settore INF/01 - Informatica
2018
Brandeis University
Microsoft Research
University of Arizona
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
08416577.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Dimensione 195.9 kB
Formato Adobe PDF
195.9 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/588526
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 1
social impact