HaTSPiL : A modular pipeline for high throughput sequencing data analysis / E. Morandi, M. Cereda, D. Incarnato, C. Parlato, G. Basile, F. Anselmi, A. Lauria, L.M. Simon, I.L. Polignano, F. Arruga, S. Deaglio, E. Tirtei, F. Fagioli, S. Oliviero. - In: PLOS ONE. - ISSN 1932-6203. - 14:10(2019 Oct 15), pp. e0222512.1-e0222512.9. [10.1371/journal.pone.0222512]

HaTSPiL : A modular pipeline for high throughput sequencing data analysis

M. Cereda (second author); 2019

Abstract

Background: Next-generation sequencing (NGS) methods are widely adopted for a broad range of scientific purposes, from pure research to health-related studies. Decreasing costs per analysis have led to large amounts of generated data and, in turn, to improved software for the respective analyses. As a consequence, many approaches have been developed to chain different software tools into reliable and reproducible workflows. However, the wide range of NGS applications makes it challenging to manage many different workflows without losing reliability.

Methods: We present a high-throughput sequencing pipeline (HaTSPiL), a Python-powered CLI tool designed to handle different data analysis approaches with a high level of reliability. The software relies on barcoding filenames with a human-readable naming convention that encodes all the sample information the software needs to choose workflows and parameters automatically. HaTSPiL is highly modular and customisable, allowing users to extend its features for specific needs.

Conclusions: HaTSPiL is licensed as Free Software under the MIT license and is available at https://github.com/dodomorandi/hatspil.
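
The filename barcoding idea described in the abstract can be illustrated with a minimal sketch. The naming convention, field names, and workflow names below are hypothetical and chosen only for illustration; they are not HaTSPiL's actual barcode format or API, which is documented in the project repository.

```python
import re
from dataclasses import dataclass

# Hypothetical convention (NOT HaTSPiL's actual format):
#   <project>-<sample>-<analyte>-<kit>.<read>.fastq
# for example "PRJ1-S042-RNA-K03.R1.fastq"
BARCODE_RE = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)-(?P<sample>[A-Za-z0-9]+)-"
    r"(?P<analyte>DNA|RNA)-(?P<kit>K\d+)\.(?P<read>R[12])\.fastq$"
)


@dataclass(frozen=True)
class Barcode:
    """Sample metadata decoded from a filename."""
    project: str
    sample: str
    analyte: str
    kit: str
    read: str


# Hypothetical mapping from analyte type to a workflow name.
WORKFLOWS = {
    "DNA": "variant_calling",
    "RNA": "expression_quantification",
}


def parse_barcode(filename: str) -> Barcode:
    """Decode the barcode fields, rejecting names that break the convention."""
    match = BARCODE_RE.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {filename!r}")
    return Barcode(**match.groupdict())


def choose_workflow(filename: str) -> str:
    """Pick a workflow purely from the information encoded in the filename."""
    return WORKFLOWS[parse_barcode(filename).analyte]
```

Under these assumptions, `choose_workflow("PRJ1-S042-RNA-K03.R1.fastq")` selects the RNA workflow without any side configuration, and a file whose name does not match the convention is rejected early instead of being processed with the wrong parameters.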
DNA; DNA Barcoding, Taxonomic; Data Analysis; Humans; Reproducibility of Results; Sequence Analysis, DNA; Workflow; High-Throughput Nucleotide Sequencing; Software
Sector BIO/11 - Molecular Biology
Sector MED/06 - Medical Oncology
15 Oct 2019
Article (author)
Files in this item:
hstspli.pdf: open access, publisher's version (PDF), 1.24 MB, Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/898569
Citations
  • PMC 1
  • Scopus 2
  • ISI Web of Science 1