Porting the rCASC workflow for scRNA-Seq data analysis to Galaxy and the Laniakea Galaxy on-demand system / P. Mandreoli, L. Alessandrì, M.A. Tangaro, R. Calogero, F. Zambelli. (Paper presented at the Bioinformatics Community Conference, held online in 2020.)

Porting the rCASC workflow for scRNA-Seq data analysis to Galaxy and the Laniakea Galaxy on-demand system

P. Mandreoli; F. Zambelli
2020

Abstract

rCASC [1] is a workflow for scRNA-Seq data analysis that provides an integrated analysis environment and exploits Docker containerization to achieve both functional and computational reproducibility of the analysis process. Its modular architecture consists of 39 Docker images, each tailored to a specific function, e.g., quality control, clustering, or feature selection. While this Docker-based implementation provides a reliable framework for long-term reproducibility, rCASC is currently available only as stand-alone software with a custom GUI or as a command-line tool. To improve its availability and accessibility, rCASC is being ported to Galaxy, so that end-users can automatically download, deploy, and run it within the familiar Galaxy environment in the cloud. The port is non-trivial because of rCASC’s internal architecture, which is composed of highly interconnected, Galaxy-unfriendly containerized modules. It requires, for example, several changes to rCASC’s input/output functions, as well as specific configurations of Galaxy and the Docker engine. We have therefore identified a set of rules for rCASC/Galaxy integration that could be readily extended and applied to the more general task of porting or designing containerized tools for cloud-based Galaxy instances. Following these rules, the first rCASC workflow (Fig. 1) has been successfully ported to Galaxy, and we are now working on migrating the others. The final aim is to also make rCASC available as a Galaxy flavour in the Laniakea [3] Galaxy on-demand system, providing a way to deploy, with minimal effort, a Galaxy instance pre-loaded with rCASC tools. Ultimately, this will result in a more user-friendly instrument for reproducibility-oriented scRNA-Seq data analysis that can also be seamlessly supported by the cloud resources offered by any Laniakea-based service.
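As an illustration of the kind of input/output adaptation the abstract refers to: Galaxy typically hands a tool its inputs as opaquely named dataset files, whereas a containerized module may expect a named file inside a folder that is bind-mounted into the container. The Python sketch below shows one way this could be bridged. It is not the rCASC or Galaxy implementation; the helper names, the image name `example/rcasc-module:latest`, the file name `counts.csv`, and the in-container path `/scratch` are all hypothetical.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def stage_input(dataset_path: str, expected_name: str, scratch_dir: str) -> Path:
    """Copy a dataset into scratch_dir under the file name the module expects.

    Hypothetical helper: the naming convention is an assumption for illustration.
    """
    scratch = Path(scratch_dir)
    scratch.mkdir(parents=True, exist_ok=True)
    target = scratch / expected_name
    shutil.copyfile(dataset_path, target)
    return target


def docker_command(image: str, scratch_dir: str, module_args: list[str]) -> list[str]:
    """Build (but do not run) a docker invocation that bind-mounts scratch_dir.

    The mount point /scratch is an assumed convention, not rCASC's actual layout.
    """
    host_dir = str(Path(scratch_dir).resolve())
    return ["docker", "run", "--rm", "-v", f"{host_dir}:/scratch", image, *module_args]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate an opaquely named dataset file as a tool might receive it.
        dataset = Path(tmp) / "dataset_42.dat"
        dataset.write_text("gene,cell1,cell2\nA,1,0\n")

        scratch = Path(tmp) / "scratch"
        staged = stage_input(str(dataset), "counts.csv", str(scratch))
        cmd = docker_command(
            "example/rcasc-module:latest", str(scratch), ["/scratch/counts.csv"]
        )
        print(staged.name)
        print(" ".join(cmd))
```

Building the command as an argument list rather than a single shell string sidesteps quoting problems with dataset paths; the sketch deliberately stops short of launching Docker.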
19 Jul 2020
Sector BIO/11 - Molecular Biology
Sector INF/01 - Computer Science
https://bcc2020.sched.com/event/csuF/porting-the-rcasc-workflow-for-scrna-seq-data-analysis-to-galaxy-and-the-laniakea-galaxy-on-demand-system
Conference Object
Files in this record:
BCC2020_abstract_57.pdf (open access)
Type: Pre-print (manuscript submitted to the publisher)
Size: 117.91 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/763382