Searching patterns in long strings is a classical algorithmic problem with countless practical applications. Suffix trees and suffix arrays (and their variants) are a long-established solution that yields linear-time search (in the size of the pattern). In a previous work, it is shown that a z-map gadget can be attached to (enhanced) suffix arrays to improve their theoretical query time, obtaining a data structure called "zuffix array". The main contribution of this paper is to show that a carefully engineered implementation of the z-map gadget does provide significant speedups with respect to enhanced suffix arrays on real-world datasets, albeit doubling the required space. In particular, for large alphabets we observe a sevenfold improvement in query time with respect to enhanced suffix arrays; even in the worst case (small alphabets), the query time is almost halved. Thus, zuffix arrays provide a very interesting new point in the space-time tradeoff spectrum.
Engineering Zuffix Arrays / P. Boldi, S. Marchini, S. Vigna (LEIBNIZ INTERNATIONAL PROCEEDINGS IN INFORMATICS). - In: 22nd International Symposium on Experimental Algorithms (SEA 2024) / [a cura di] L. Liberti. - [s.l] : Schloss Dagstuhl, 2024. - ISBN 978-3-95977-325-6. - pp. 1-18 (( Intervento presentato al 22. convegno Symposium on Experimental Algorithms (SEA) tenutosi a Wien nel 2024 [10.4230/LIPIcs.SEA.2024.2].
Engineering Zuffix Arrays
P. BoldiPrimo
;S. VignaUltimo
2024
Abstract
Searching patterns in long strings is a classical algorithmic problem with countless practical applications. Suffix trees and suffix arrays (and their variants) are a long-established solution that yields linear-time search (in the size of the pattern). In a previous work, it is shown that a z-map gadget can be attached to (enhanced) suffix arrays to improve their theoretical query time, obtaining a data structure called "zuffix array". The main contribution of this paper is to show that a carefully engineered implementation of the z-map gadget does provide significant speedups with respect to enhanced suffix arrays on real-world datasets, albeit doubling the required space. In particular, for large alphabets we observe a sevenfold improvement in query time with respect to enhanced suffix arrays; even in the worst case (small alphabets), the query time is almost halved. Thus, zuffix arrays provide a very interesting new point in the space-time tradeoff spectrum.File | Dimensione | Formato | |
---|---|---|---|
zarr.pdf
accesso riservato
Tipologia:
Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione
824.74 kB
Formato
Adobe PDF
|
824.74 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
LIPIcs.SEA.2024.2.pdf
accesso aperto
Tipologia:
Publisher's version/PDF
Dimensione
942.4 kB
Formato
Adobe PDF
|
942.4 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.