Motivation: Insertions and deletions (indels) contribute significantly to genomic diversity at both the intra- and inter-species levels. The recent advent of next-generation sequencing (NGS) methods has opened many opportunities for structural variant discovery, but has also required the development of new computational methods. Several bioinformatics tools have been developed for detecting indels from paired-end (PE) NGS data.

Methods: Existing methods fall broadly into two categories: those that identify genomic clusters of read pairs with atypical insert sizes in order to call insertions and deletions with respect to a reference genome, and those that consider the distribution of insert sizes for all read pairs covering a given genomic position. We present a variation on the latter approach that also includes information from pairs in which one read does not map to the reference genome ("broken pairs") and uses machine learning to differentiate between real indels and likely false positive predictions.

Results: We demonstrate that our approach significantly outperforms other available methods in terms of sensitivity, specificity, and computational time and power requirements, both in simulations and on publicly available human genome resequencing data. Our analyses show that using data from "broken pairs" and carefully integrating different statistics derived from mapping patterns can significantly improve the quality of indel predictions.

Accurate detection of genomic structural variations using high throughput resequencing data / M. Chiara, G. Pesole, H.S. Horner. (Presentation given at the 8th BITS Annual Meeting, held in Pisa in 2011.)

Accurate detection of genomic structural variations using high throughput resequencing data

M. Chiara (first author); H.S. Horner
2011

21 June 2011
Sector BIO/11 - Molecular Biology
Consiglio Nazionale delle Ricerche
Bioinformatics Italian Society
Conference Object


Use this identifier to cite or link to this document: https://hdl.handle.net/2434/172422