Development of a state of the art computational environment for handling human genetic data : the effort of ELIXIR-IT / C. Lo Giudice, F. Licciulli, G. Miniello, M. Moscatelli, S.N. Cox, A.S. Varvara, B. Fosso, M.A. Tangaro, R. Cilli, D. Traversa, G. Donvito, E. Capriotti, M. Chiara, F. Zambelli, G. Pesole. (Paper presented at the BITS Annual Meeting, held in Trento in 2024.)
Development of a state of the art computational environment for handling human genetic data : the effort of ELIXIR-IT
D. Traversa;M. Chiara;F. Zambelli;
2024
Abstract
Motivation: As nucleic acid sequencing technologies become increasingly accessible, their applications expand beyond research into various domains, including healthcare. Personalized medicine and pharmacogenomics hold the promise of revolutionizing medical treatments for a wide range of pathologies, such as cancer and genetic diseases. However, fully exploiting this potential requires addressing numerous technical, legal, and ethical challenges. Demand for efficient solutions for the secure handling of human genetic data is very high, and meeting it requires ready-to-use, cost-effective services, which can be efficiently provided by public infrastructures such as ELIXIR-IT, the Italian node of the European Research Infrastructure for Life Science Data. Here, we describe the architecture of a VM-based service integrated into a broader computational environment designed for managing human genetic data from production to deposition in access-controlled repositories, to be based in the ReCaS datacenter in Bari, Italy.

Methods: Data are transferred over the SSH protocol to a secure storage facility, which we named BioRepository, that provides data-at-rest encryption and geo-redundant storage. A virtualized computational environment is deployed through a cloud infrastructure, offering state-of-the-art bioinformatics tools for data analysis. Software tools are made available through package managers and/or as containers, for compatibility, reproducibility, and ease of updates. Workflow management systems streamline the analysis process. IT automation engines facilitate software installation, customization, and maintenance. Upon completion of the analysis, users gain access to downstream services for data FAIRification, deposition, and discoverability. These services will include a federated node of the EGA human genome-phenome archive (FEGA) for metadata browsing and data access control. Additionally, a service based on the Beacon protocol enables the discoverability of datasets hosted by the FEGA node.

Results: This integrated approach represents a significant leap forward for the infrastructure for managing human genetic data in Italy, providing a resource-efficient, easily maintainable, and scalable solution tailored for both research and healthcare applications. By combining secure data transfer mechanisms, state-of-the-art storage facilities, and a versatile computational environment, this system ensures the efficient handling of genetic data while upholding high standards of security and accessibility.