Fast and precise single-cell data analysis using a hierarchical autoencoder

Tran, D.; Nguyen, H.; Tran, B.; La Vecchia, C.; Luu, H.N.; Nguyen, T.

doi:10.1038/s41467-021-21312-2

A primary challenge in single-cell RNA sequencing (scRNA-seq) studies comes from the massive amount of data and the excess noise level. To address this challenge, we introduce an analysis framework, named single-cell Decomposition using Hierarchical Autoencoder (scDHA), that reliably extracts representative information of each cell. The scDHA pipeline consists of two core modules. The first module is a non-negative kernel autoencoder able to remove genes or components that have insignificant contributions to the part-based representation of the data. The second module is a stacked Bayesian autoencoder that projects the data onto a low-dimensional (compressed) space. To reduce the tendency of neural networks to overfit, we repeatedly perturb the compressed space to learn a more generalized representation of the data. In an extensive analysis, we demonstrate that scDHA outperforms state-of-the-art techniques in many research sub-fields of scRNA-seq analysis, including cell segregation through unsupervised learning, visualization of transcriptome landscape, cell classification, and pseudo-time inference.
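The data flow through the two modules described above can be illustrated with a minimal NumPy sketch. It is not the scDHA implementation or its API: the weights are random rather than trained, all function names are hypothetical, and two details are assumptions that the abstract does not state, namely that genes are ranked by the variance of their non-negative encoder weights and that the compressed space is perturbed with Gaussian noise (z = mu + sigma * eps).

```python
import numpy as np

rng = np.random.default_rng(0)

def select_genes(X, W, n_keep):
    """Keep the n_keep genes whose non-negative weights vary most (assumed criterion)."""
    W_nonneg = np.maximum(W, 0.0)            # project weights onto the non-negative orthant
    scores = W_nonneg.var(axis=1)            # one score per gene (one row of W per gene)
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return X[:, keep], keep

def encode(X, W_mu, W_logvar):
    """Linear encoder returning the mean and log-variance of a Gaussian latent code."""
    return X @ W_mu, X @ W_logvar

def perturb(mu, logvar, n_samples, rng):
    """Draw n_samples perturbed latent codes per cell: z = mu + sigma * eps."""
    eps = rng.standard_normal((n_samples,) + mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy expression matrix: 50 cells x 200 genes, log-transformed counts.
n_cells, n_genes, n_hidden, n_latent, n_keep = 50, 200, 16, 5, 40
X = np.log2(rng.poisson(2.0, size=(n_cells, n_genes)) + 1.0)

# Module 1 (sketch): gene filtering with non-negative, here untrained, weights.
W = rng.normal(size=(n_genes, n_hidden))
X_sel, kept = select_genes(X, W, n_keep)

# Module 2 (sketch): compress, then perturb the compressed space several times.
W_mu = rng.normal(scale=0.1, size=(n_keep, n_latent))
W_logvar = rng.normal(scale=0.1, size=(n_keep, n_latent))
mu, logvar = encode(X_sel, W_mu, W_logvar)
Z = perturb(mu, logvar, n_samples=3, rng=rng)

print(X_sel.shape, Z.shape)
```

In a trained model, each perturbed code in `Z` would be decoded and compared with the input, so that the decoder has to reconstruct the data from several nearby points in the compressed space rather than from a single one.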

Fast and precise single-cell data analysis using a hierarchical autoencoder / D. Tran, H. Nguyen, B. Tran, C. La Vecchia, H.N. Luu, T. Nguyen. - In: NATURE COMMUNICATIONS. - ISSN 2041-1723. - 12:1(2021 Feb 15), pp. 1029.1-1029.10. [10.1038/s41467-021-21312-2]

Scientific-disciplinary sectors of the article (display only): Sector MED/01 - Medical Statistics
Publication date: 15 Feb 2021
Journal in ANCE: NATURE COMMUNICATIONS
DOI: https://dx.doi.org/10.1038/s41467-021-21312-2
Type: Article (author)
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
Fast and precise_Tran Nat acomm.pdf (open access; type: Publisher's version/PDF)	1.51 MB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/815601
