Searching patterns in long strings is a classical algorithmic problem with countless practical applications. Suffix trees and suffix arrays (and their variants) are a long-established solution that yields linear-time search (in the size of the pattern). In a previous work, it is shown that a z-map gadget can be attached to (enhanced) suffix arrays to improve their theoretical query time, obtaining a data structure called "zuffix array". The main contribution of this paper is to show that a carefully engineered implementation of the z-map gadget does provide significant speedups with respect to enhanced suffix arrays on real-world datasets, albeit doubling the required space. In particular, for large alphabets we observe a sevenfold improvement in query time with respect to enhanced suffix arrays; even in the worst case (small alphabets), the query time is almost halved. Thus, zuffix arrays provide a very interesting new point in the space-time tradeoff spectrum.

Engineering Zuffix Arrays / P. Boldi, S. Marchini, S. Vigna (LEIBNIZ INTERNATIONAL PROCEEDINGS IN INFORMATICS). - In: 22nd International Symposium on Experimental Algorithms (SEA 2024) / [a cura di] L. Liberti. - [s.l] : Schloss Dagstuhl, 2024. - ISBN 978-3-95977-325-6. - pp. 1-18 (( Intervento presentato al 22. convegno Symposium on Experimental Algorithms (SEA) tenutosi a Wien nel 2024 [10.4230/LIPIcs.SEA.2024.2].

Engineering Zuffix Arrays

P. Boldi
Primo
;
S. Vigna
Ultimo
2024

Abstract

Searching patterns in long strings is a classical algorithmic problem with countless practical applications. Suffix trees and suffix arrays (and their variants) are a long-established solution that yields linear-time search (in the size of the pattern). In a previous work, it is shown that a z-map gadget can be attached to (enhanced) suffix arrays to improve their theoretical query time, obtaining a data structure called "zuffix array". The main contribution of this paper is to show that a carefully engineered implementation of the z-map gadget does provide significant speedups with respect to enhanced suffix arrays on real-world datasets, albeit doubling the required space. In particular, for large alphabets we observe a sevenfold improvement in query time with respect to enhanced suffix arrays; even in the worst case (small alphabets), the query time is almost halved. Thus, zuffix arrays provide a very interesting new point in the space-time tradeoff spectrum.
Suffix trees; suffix arrays; z-fast tries
Settore INF/01 - Informatica
   SEcurity and RIghts in the CyberSpace (SERICS)
   SERICS
   MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
   codice identificativo PE00000014
2024
https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.2
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
zarr.pdf

accesso riservato

Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione 824.74 kB
Formato Adobe PDF
824.74 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
LIPIcs.SEA.2024.2.pdf

accesso aperto

Tipologia: Publisher's version/PDF
Dimensione 942.4 kB
Formato Adobe PDF
942.4 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1075008
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact