On demand cloud-based secure environments for analysing personal and health data / F. Zambelli, G. Donvito, M. Tangaro, M. Antonacci, N. Foggetti. (Paper presented at the conference CS3 2023 - Cloud Storage Synchronization and Sharing, held in Barcelona in 2023.)

On demand cloud-based secure environments for analysing personal and health data

F. Zambelli (first author); 2023

Abstract

Galaxy is the de facto standard workflow manager for bioinformatics, providing a complete collaborative platform for researchers. Although several public Galaxy servers are currently available, in some situations users benefit more from having full administrative control over a private Galaxy instance. These situations include, but are not limited to, data privacy concerns, the need for customization, the need to prioritise particular job types, tool development, and training activities.

The Laniakea [1] software platform facilitates the provisioning of on-demand Galaxy instances over heterogeneous cloud infrastructures by leveraging the open-source INDIGO-DataCloud cloud stack [2], which aims to make cloud infrastructures more accessible to scientific communities. End users interact with Laniakea through a web front-end that allows a general setup of the Galaxy instance. The deployment of the virtual hardware and of the Galaxy software ecosystem is subsequently performed by the INDIGO Platform as a Service (PaaS) layer. At the end of the process, the user gains access to a private, production-grade, fully customizable Galaxy virtual instance. Laniakea features the deployment of stand-alone or cluster-backed Galaxy instances, shared reference data volumes, and rapid development of novel Galaxy flavours for specific tasks.

Moreover, to extend the use of this platform to clinical scenarios, where the analysis of sensitive data in compliance with the GDPR requires strong countermeasures to guarantee data privacy and security, Laniakea guarantees the creation of isolated and secure environments for data analysis, exploiting storage encryption and VPN-based access control to Galaxy. Laniakea allows the on-demand encryption of the entire storage volume attached to the virtual machine, using the Linux kernel encryption module. The disk encryption layer is completely transparent to software applications, in this case Galaxy: data are encrypted on-the-fly when written and decrypted on-the-fly when read. The procedure has been completely automated through the web Dashboard of the PaaS orchestration service [3], taking advantage of HashiCorp Vault for storing user passphrases.

We have implemented a robust mechanism to create secure encryption keys and to prevent user credentials or the encryption passphrase from being transmitted unencrypted to the virtual infrastructure, which would compromise its security.

The oral contribution will provide details about the platform architecture and the service implementation strategy.

References

[1] Tangaro et al., Laniakea: an open solution to provide Galaxy “on-demand” instances over heterogeneous cloud infrastructures, GigaScience, Volume 9, Issue 4, April 2020, giaa033, https://doi.org/10.1093/gigascience/gia
[2] Salomoni, D., Campos, I., Gaido, L. et al. INDIGO-DataCloud: a Platform to Facilitate Seamless Access to E-Infrastructures. J Grid Computing 16, 381–408 (2018). https://doi.org/10.1007/s10723-018-9453-3
[3] https://github.com/indigo-dc/orchestrator
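The abstract mentions creating secure encryption keys for the encrypted storage volume. As a minimal sketch only, and not the mechanism actually used by Laniakea, a high-entropy passphrase could be generated with Python's standard `secrets` module as shown here. The alphabet, the default length, and the function names `generate_passphrase` and `entropy_bits` are assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import secrets
import string

# Hypothetical choices: the abstract does not state an alphabet or a length.
ALPHABET = string.ascii_letters + string.digits
DEFAULT_LENGTH = 40


def generate_passphrase(length: int = DEFAULT_LENGTH) -> str:
    """Return a random passphrase drawn from the OS cryptographic RNG."""
    if length < 1:
        raise ValueError("length must be positive")
    # secrets.choice uses a cryptographically secure source, unlike random.choice.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Approximate entropy, in bits, of a uniformly random passphrase."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    passphrase = generate_passphrase()
    # Report only the length and entropy estimate, never the secret itself.
    print(len(passphrase), round(entropy_bits(len(passphrase))))
```

In a setup like the one described, such a passphrase would be handed to a secrets store over an encrypted channel rather than printed or logged; that step is left out here because the abstract gives no details of the interface used.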
6 March 2023
Sector BIO/11 - Molecular Biology
Sector INF/01 - Computer Science
https://indico.cern.ch/event/1210538/book-of-abstracts.pdf
Conference Object
Files in this record:
File: book-of-abstracts.pdf
Access: open access
Description: Book of Abstracts
Type: Publisher's version/PDF
Size: 144.58 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/983008