Next generation sequencing of pooled samples : guideline for variants' filtering / S. Anand, E. Mangano, N. Barizzone, R. Bordoni, M. Sorosina, F. Clarelli, L. Corrado, F. Martinelli Boneschi, S. D'Alfonso, G. De Bellis. - In: SCIENTIFIC REPORTS. - ISSN 2045-2322. - 6:1(2016 Sep 27), pp. 33735.1-33735.9. [10.1038/srep33735]

Next generation sequencing of pooled samples : guideline for variants' filtering

E. Mangano;R. Bordoni;F. Martinelli Boneschi;
2016-09-27

Abstract

Sequencing a large number of individuals, which is often needed for population genetics studies, remains economically challenging despite the falling costs of Next Generation Sequencing (NGS). Pool-seq is a cost- and time-effective alternative in which DNA from several individuals is pooled for sequencing. However, pooling DNA creates new challenges for accurate variant calling and allele frequency (AF) estimation. In particular, sequencing errors are confounded with alleles present at low frequency in the pools, possibly giving rise to false positive variants. We sequenced 996 individuals in 83 pools (12 individuals/pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and with in-house SNP-genotyping data from the individual subjects in the pools. Furthermore, we propose a simple filtering guideline for the removal of spurious variants based on the Kolmogorov-Smirnov statistical test. We experimentally validated our filters by comparing Pool-seq with individual sequencing data, showing that the filters remove most of the false variants while retaining the majority of true variants. The proposed guideline is fairly generic and could be easily applied in other Pool-seq experiments.
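As a rough illustration of two quantities mentioned in the abstract, the following minimal Python sketch computes the allele frequency contributed by a single variant allele copy in one pool, and a two-sample Kolmogorov-Smirnov statistic. The abstract does not say which quantities the test is applied to, so the function names, the assumption of diploid individuals pooled in equal amounts, and the example quality scores are all hypothetical and are not the authors' implementation.

```python
from bisect import bisect_right


def single_copy_af(individuals_per_pool, ploidy=2):
    """AF of one variant allele copy in a pool.

    Assumes (not stated in the abstract) diploid individuals whose DNA
    is pooled in equal amounts.
    """
    return 1.0 / (individuals_per_pool * ploidy)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The largest gap between two step functions occurs at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Pool design from the abstract: 83 pools of 12 individuals each.
pools, per_pool = 83, 12
total_individuals = pools * per_pool          # 996
lowest_af = single_copy_af(per_pool)          # 1 / 24

# Made-up quality scores for two hypothetical groups of variant calls.
group_a_quality = [35.0, 38.5, 40.0, 41.2, 44.0]
group_b_quality = [12.0, 15.5, 20.0, 36.0, 39.0]
d_stat = ks_statistic(group_a_quality, group_b_quality)
```

With 12 diploid individuals per pool, a variant carried by one heterozygous individual is expected at a frequency of 1/24 (about 4%), which is why low-frequency alleles are hard to separate from sequencing errors. A large KS statistic between two groups of calls would suggest that the chosen metric separates them and may be useful as a filter, under the assumptions stated above.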
Multidisciplinary
Sector MED/26 - Neurology
SCIENTIFIC REPORTS
Article (author)
Files in this record:
Next Generation Sequencing of Pooled Samples- Guideline for Variants' Filtering..pdf (open access; type: Publisher's version/PDF; format: Adobe PDF; size: 645.54 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/2434/519321
Citations
  • PMC 34
  • Scopus 54
  • ISI 52