An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g. pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists an O(nm2 + N) -time algorithm to solve this problem on-line after a pre-processing stage with time and space O(m). In this paper, we study the same problem under the edit distance model and present an O(k2mG+kN) -time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG+kN)-time and O(m)-space algorithm for Hamming distance.

Pattern Matching on Elastic-Degenerate Text with Errors / G. Bernardini, N. Pisanti, S.P. Pissis, G. Rosone (LECTURE NOTES IN COMPUTER SCIENCE). - In: String Processing and Information Retrieval / [a cura di] G. Fici, M. Sciortino, R. Venturini. - [s.l] : Springer Verlag, 2017. - ISBN 9783319674278. - pp. 74-90 (( Intervento presentato al 24. convegno International Symposium on String Processing and Information Retrieval, SPIRE 2017 tenutosi a Palermo nel 2017 [10.1007/978-3-319-67428-5_7].

Pattern Matching on Elastic-Degenerate Text with Errors

G. Bernardini
Primo
;
2017

Abstract

An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g. pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists an O(nm2 + N) -time algorithm to solve this problem on-line after a pre-processing stage with time and space O(m). In this paper, we study the same problem under the edit distance model and present an O(k2mG+kN) -time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG+kN)-time and O(m)-space algorithm for Hamming distance.
Degenerate strings; Elastic-degenerate strings; Pan-genome; Pattern matching; Uncertain sequences
Settore INFO-01/A - Informatica
2017
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
978-3-319-67428-5_7.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Dimensione 360.66 kB
Formato Adobe PDF
360.66 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1131923
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 24
  • ???jsp.display-item.citation.isi??? 17
  • OpenAlex ND
social impact