Recent advances in the analysis of random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space with respect to existing techniques. The main obstacle for any practical application of these results is the time required to solve such linear systems: despite they can be made very small, the computation is still too slow to be feasible. In this paper, we describe in detail a number of heuristics and programming techniques to speed up the solution of these systems by orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.024. For functions whose output has low entropy, we are able to implement feasibly for the first time the Hreinsson–Krøyer–Pagh approach, which makes it possible, for example, to store a function with an output of 10⁶ values distributed following a power law of exponent 2 in just 2.76 bits per key instead of 20.

Fast scalable construction of ([compressed] static | minimal perfect hash) functions / M. Genuzio, G. Ottaviano, S. Vigna. - In: INFORMATION AND COMPUTATION. - ISSN 0890-5401. - 273(2020), p. 104517. [Epub ahead of print] [10.1016/j.ic.2020.104517]

Fast scalable construction of ([compressed] static | minimal perfect hash) functions

M. Genuzio;S. Vigna
2020

Abstract

Recent advances in the analysis of random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space with respect to existing techniques. The main obstacle for any practical application of these results is the time required to solve such linear systems: despite they can be made very small, the computation is still too slow to be feasible. In this paper, we describe in detail a number of heuristics and programming techniques to speed up the solution of these systems by orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.024. For functions whose output has low entropy, we are able to implement feasibly for the first time the Hreinsson–Krøyer–Pagh approach, which makes it possible, for example, to store a function with an output of 10⁶ values distributed following a power law of exponent 2 in just 2.76 bits per key instead of 20.
Minimal perfect hash functions; Static functions; Compressed functions; Succinct data structures
Settore INF/01 - Informatica
2020
15-gen-2020
Article (author)
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S0890540120300043-main.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Dimensione 587.9 kB
Formato Adobe PDF
587.9 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/773848
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 4
social impact