Elastic Degenerate (ED) strings and Elastic Founder (EF) graphs, here collectively named variable strings, are two representations of acyclic components of pangenomes which extend the well-known notion of indeterminate string. Recent studies have focused extensively on algorithmic tasks involving these structures and other forms of variable strings that they generalize. Among such tasks, the basic operation of matching a pattern into a text, a fundamental toolkit for pangenomic data analysis, deserves special attention. In this paper, (1) we establish a clear taxonomy across ED strings and EF graphs, categorizing types of variable strings from the simplest linear (solid) string to the most complex general cases; (2) we consider the problem match(X,Y) of matching a solid or variable pattern of type X into a variable text of type Y, and investigate its time complexity when X and Y are chosen from types of variable strings in the taxonomy of (1). For all possible X and Y, we either provide a non-trivial, often sub-quadratic, upper bound for match(X,Y), or we prove a quadratic conditional lower bound, taking as a reference the existing quadratic conditional lower bounds for match(solid,ED) and match(solid,EF). A preliminary version of this work appeared in [Ascone et al., WABI 2024].

Pattern matching with Elastic-Degenerate strings and Elastic-Founder graphs / R. Ascone, G.B.. - In: ALGORITHMS FOR MOLECULAR BIOLOGY. - ISSN 1748-7188. - 21:1(2026), pp. 8.1-8.25. [10.1186/s13015-025-00289-3]

Pattern matching with Elastic-Degenerate strings and Elastic-Founder graphs

G. Bernardini
;
2026

Abstract

Elastic Degenerate (ED) strings and Elastic Founder (EF) graphs, here collectively named variable strings, are two representations of acyclic components of pangenomes which extend the well-known notion of indeterminate string. Recent studies have focused extensively on algorithmic tasks involving these structures and other forms of variable strings that they generalize. Among such tasks, the basic operation of matching a pattern into a text, a fundamental toolkit for pangenomic data analysis, deserves special attention. In this paper, (1) we establish a clear taxonomy across ED strings and EF graphs, categorizing types of variable strings from the simplest linear (solid) string to the most complex general cases; (2) we consider the problem match(X,Y) of matching a solid or variable pattern of type X into a variable text of type Y, and investigate its time complexity when X and Y are chosen from types of variable strings in the taxonomy of (1). For all possible X and Y, we either provide a non-trivial, often sub-quadratic, upper bound for match(X,Y), or we prove a quadratic conditional lower bound, taking as a reference the existing quadratic conditional lower bounds for match(solid,ED) and match(solid,EF). A preliminary version of this work appeared in [Ascone et al., WABI 2024].
Degenerate string; Fine-grained complexity; Founder graph; Genome variant; Pangenomics; Pattern matching
Settore INFO-01/A - Informatica
   ALgorithms for PAngenome Computational Analysis
   ALPACA
   European Commission
   Horizon 2020 Framework Programme - European Training Networks
   956229

   Pan-genome Graph Algorithms and Data Integration
   PANGAIA
   European Commission
   Horizon 2020 Framework Programme - RISE
   872539
2026
Article (author)
File in questo prodotto:
File Dimensione Formato  
s13015-025-00289-3.pdf

accesso aperto

Tipologia: Publisher's version/PDF
Licenza: Creative commons
Dimensione 3.67 MB
Formato Adobe PDF
3.67 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1254375
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
  • OpenAlex 1
social impact