We investigate the potential of several artificial neural network architectures to be used as an index on a sorted set of strings, namely, as a mapping from a query string to (an estimate of) its lexicographic rank in the set, which allows solving some interesting string-search operations such as range and prefix searches. Our evaluation on a variety of real and synthetic datasets shows that learned models can beat the space vs error trade-off of the classic (possibly compressed) trie-based solutions for relatively dense datasets only, while being slower to be trained and queried. This leads us to conclude that learned models are not yet competitive with classic trie-based solutions, and thus cannot completely replace them, but possibly only integrate them. Although our study does not settle the question conclusively, it highlights appropriate methods, provides a baseline for comparison, and introduces several open problems, thereby serving as a starting point for future research.

On Nonlinear Learned String Indexing / P. Ferragina, M. Frasca, G. Cataldo Marinò, G. Vinciguerra. - In: IEEE ACCESS. - ISSN 2169-3536. - 11:(2023 Jul 14), pp. 74021-74034. [10.1109/ACCESS.2023.3295434]

On Nonlinear Learned String Indexing

M. Frasca
Secondo
;
2023

Abstract

We investigate the potential of several artificial neural network architectures to be used as an index on a sorted set of strings, namely, as a mapping from a query string to (an estimate of) its lexicographic rank in the set, which allows solving some interesting string-search operations such as range and prefix searches. Our evaluation on a variety of real and synthetic datasets shows that learned models can beat the space vs error trade-off of the classic (possibly compressed) trie-based solutions for relatively dense datasets only, while being slower to be trained and queried. This leads us to conclude that learned models are not yet competitive with classic trie-based solutions, and thus cannot completely replace them, but possibly only integrate them. Although our study does not settle the question conclusively, it highlights appropriate methods, provides a baseline for comparison, and introduces several open problems, thereby serving as a starting point for future research.
String dictionaries; string search; prefix search; tries; neural networks; machine learning; data structures; learned indexes;
Settore INF/01 - Informatica
   Multi-criteria optimized data structures: from compressed indexes to learned indexes, and beyond
   MINISTERO DELL'ISTRUZIONE E DEL MERITO
   2017WR7SHH_004
14-lug-2023
https://ieeexplore.ieee.org/document/10184017
Article (author)
File in questo prodotto:
File Dimensione Formato  
On_Nonlinear_Learned_String_Indexing.pdf

accesso aperto

Tipologia: Publisher's version/PDF
Dimensione 1.39 MB
Formato Adobe PDF
1.39 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/999394
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 2
social impact