An elastic-degenerate (ED) string is a sequence of n finite sets of strings of total length N, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an O~ (nmω-1) + O(N) -time algorithm [Bernardini et al., SIAM J. Comput. 2022], where ω denotes the matrix multiplication exponent and the O~ (· ) notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most k errors. k-EDSM can be solved in O(k2mG+ kN) time under edit distance, where G denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, G is only bounded by N, and so even for k= 1, the existing algorithm runs in Ω(mN) time in the worst case. Here we make progress in this direction. We show that 1-EDSM can be solved in O((nm2+ N) log m) or O(nm3+ N) time under edit distance. For the decision version of the problem, we present a faster O(nm2logm+Nloglogm) -time algorithm. Our algorithms rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or range emptiness), which we show how to solve efficiently.

Elastic-Degenerate String Matching with 1 Error / G. Bernardini, E. Gabory, S.P. Pissis, L. Stougie, M. Sweering, W. Zuba (LECTURE NOTES IN COMPUTER SCIENCE). - In: LATIN 2022: Theoretical Informatics / [a cura di] A. Castañeda, F. Rodríguez-Henríquez. - [s.l] : Springer Science and Business Media Deutschland GmbH, 2022. - ISBN 9783031206238. - pp. 20-37 (( Intervento presentato al 15. convegno Latin American Symposium on Theoretical Informatics tenutosi a Guanajuato nel 2022 [10.1007/978-3-031-20624-5_2].

Elastic-Degenerate String Matching with 1 Error

G. Bernardini
Primo
;
2022

Abstract

An elastic-degenerate (ED) string is a sequence of n finite sets of strings of total length N, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an O~ (nmω-1) + O(N) -time algorithm [Bernardini et al., SIAM J. Comput. 2022], where ω denotes the matrix multiplication exponent and the O~ (· ) notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most k errors. k-EDSM can be solved in O(k2mG+ kN) time under edit distance, where G denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, G is only bounded by N, and so even for k= 1, the existing algorithm runs in Ω(mN) time in the worst case. Here we make progress in this direction. We show that 1-EDSM can be solved in O((nm2+ N) log m) or O(nm3+ N) time under edit distance. For the decision version of the problem, we present a faster O(nm2logm+Nloglogm) -time algorithm. Our algorithms rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or range emptiness), which we show how to solve efficiently.
Approximate string matching; Degenerate strings; Edit distance; Elastic-degenerate strings; String algorithms
Settore INFO-01/A - Informatica
2022
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
978-3-031-20624-5_2.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Dimensione 527.25 kB
Formato Adobe PDF
527.25 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1131796
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 0
social impact