We study the problem of making a de Bruijn graph (dBG), constructed from a collection of strings, weakly connected while minimizing the total cost of edge additions. The input graph is a dBG that can be made weakly connected by adding edges (along with extra nodes if needed) from the underlying complete dBG. The problem arises from genome reconstruction, where the dBG is constructed from a set of sequences generated from a genome sample by a sequencing experiment. Due to sequencing errors, the dBG is never Eulerian in practice and is often not even weakly connected. We show the following results for a dBG G(V, E) of order k consisting of d weakly connected components: 1. Making G weakly connected by adding a set of edges of minimal total cost is NP-hard. 2. No PTAS exists for making G weakly connected by adding a set of edges of minimal total cost (unless the unique games conjecture fails). We complement this result by showing that there does exist a polynomial-time (2 − 2/d)-approximation algorithm for the problem. 3. We consider a restricted version of the above problem, where we are asked to make G weakly connected by only adding directed paths between pairs of components. We show that making G weakly connected by adding d− 1 such paths of minimal total cost can be done in O(k|V |α(|V |) + |E|) time, where α(·) is the inverse Ackermann function. This improves on the O(k|V | log(|V |) + |E|)-time algorithm proposed by Bernardini et al. [CPM 2022] for the same restricted problem. 4. An ILP formulation of polynomial size for making G Eulerian with minimal total cost.

Connecting de Bruijn Graphs / G. Bernardini, H. Chen, I.L. Gortz, C. Krogh, G. Loukides, S.P. Pissis, L. Stougie, M. Sweering (LEIBNIZ INTERNATIONAL PROCEEDINGS IN INFORMATICS). - In: 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024) / [a cura di] S. Inenaga, S.J. Puglisi. - [s.l] : Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2024. - ISBN 9783959773263. - pp. 1-16 (( Intervento presentato al 35. convegno Annual Symposium on Combinatorial Pattern Matching, CPM 2024 tenutosi a Fukuoka nel 2024 [10.4230/LIPIcs.CPM.2024.6].

Connecting de Bruijn Graphs

G. Bernardini
Primo
;
2024

Abstract

We study the problem of making a de Bruijn graph (dBG), constructed from a collection of strings, weakly connected while minimizing the total cost of edge additions. The input graph is a dBG that can be made weakly connected by adding edges (along with extra nodes if needed) from the underlying complete dBG. The problem arises from genome reconstruction, where the dBG is constructed from a set of sequences generated from a genome sample by a sequencing experiment. Due to sequencing errors, the dBG is never Eulerian in practice and is often not even weakly connected. We show the following results for a dBG G(V, E) of order k consisting of d weakly connected components: 1. Making G weakly connected by adding a set of edges of minimal total cost is NP-hard. 2. No PTAS exists for making G weakly connected by adding a set of edges of minimal total cost (unless the unique games conjecture fails). We complement this result by showing that there does exist a polynomial-time (2 − 2/d)-approximation algorithm for the problem. 3. We consider a restricted version of the above problem, where we are asked to make G weakly connected by only adding directed paths between pairs of components. We show that making G weakly connected by adding d− 1 such paths of minimal total cost can be done in O(k|V |α(|V |) + |E|) time, where α(·) is the inverse Ackermann function. This improves on the O(k|V | log(|V |) + |E|)-time algorithm proposed by Bernardini et al. [CPM 2022] for the same restricted problem. 4. An ILP formulation of polynomial size for making G Eulerian with minimal total cost.
de Bruijn graph; Eulerian graph; graph algorithm; string algorithm
Settore INFO-01/A - Informatica
2024
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
LIPIcs.CPM.2024.6.pdf

accesso aperto

Tipologia: Publisher's version/PDF
Dimensione 1.35 MB
Formato Adobe PDF
1.35 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1131455
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? ND
  • OpenAlex ND
social impact