Single-cell RNA sequencing marks a key methodological breakthrough for the characterization of cell types. Current pipelines employ microfluidic devices or equivalent methods to collect gene expression profiles at single-cell resolution; gene expression patterns are subsequently used as a proxy to group similar cells and infer their identity. Dimensionality reduction and unsupervised clustering represent the de facto standard methods for the analysis of these data. These techniques, however, suffer from inherent limitations, including the need for expert-curated annotation of cell types, a general lack of reproducibility, and limited resolution in the identification and annotation of scarcely represented cell types. Here, we present SCALT (Single Cell Annotation Likelihood Tool), an innovative method that introduces a paradigm shift in the analysis of single-cell RNA sequencing data. In our approach, each cell is individually assigned to a specific cell type by a simple but elegant method based on maximum likelihood, without the need for clustering, dimensionality reduction or manual annotation. SCALT leverages a collection of 471 lists of cell-type-specific genes, constructed by extensive re-analysis of comprehensive and expert-curated catalogues (Human Protein Atlas and DISCO). Applied to the reference benchmark datasets of Abdelaal et al. (2019), SCALT performed comparably to or better than the other methods tested therein. An extensive application of the tool to two publicly available databases, the Human Protein Atlas and DISCO, demonstrated that SCALT was able to re-classify the cells in agreement with the original annotation: it recognized the correct cell type for 98.7% and 98.8% of the more than 553,411 and 4,339,209 distinct cells included in the two datasets, respectively. In addition, the extensive re-analysis of the two datasets produced a collection of cell-type-specific lists of genes. These lists were equally sized and well separated in terms of the proportion of genes shared by each pair of lists, demonstrating the ability of the tool to generate distinct cell-type-defining lists in a deterministic fashion. In conclusion, SCALT introduces an innovative method for single-cell RNA sequencing data analysis. Currently, it is able to classify 471 cell types. The method can generate cell-type-specific lists of genes in a deterministic manner, without any human interpretation. SCALT is comparable to other methods for automatic cell type classification, and it is robust, having been tested on over 5 million cells. Finally, the tool is multitasking: it can be used to annotate cells and to generate cell-type-specific lists of genes, starting from either an annotation or a collection of user-defined cell-type-specific lists of genes.
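The idea of annotating each cell individually by maximum likelihood, without clustering, can be illustrated with a toy sketch. The abstract does not specify SCALT's statistical model, so everything below is an assumption made for illustration: the multinomial-style scoring, the `in_list_mass` parameter, the function names, and the two short marker lists are hypothetical and are not taken from the tool or its 471 lists.

```python
import math

def log_likelihood(counts, gene_list, n_genes, in_list_mass=0.8):
    """Toy multinomial log-likelihood (up to a constant) of one cell's counts.

    Assumption: a cell of a given type emits a fraction `in_list_mass` of its
    reads uniformly over the genes in that type's list, and the remainder
    uniformly over all other genes. This is NOT SCALT's actual model.
    """
    k = len(gene_list)
    if not 0 < k < n_genes:
        raise ValueError("gene list must be non-empty and smaller than n_genes")
    p_in = in_list_mass / k                        # probability per listed gene
    p_out = (1.0 - in_list_mass) / (n_genes - k)   # probability per other gene
    return sum(
        c * math.log(p_in if gene in gene_list else p_out)
        for gene, c in counts.items()
    )

def annotate_cell(counts, gene_lists, n_genes, in_list_mass=0.8):
    """Return the cell type whose toy model maximizes the likelihood, plus all scores."""
    scores = {
        cell_type: log_likelihood(counts, genes, n_genes, in_list_mass)
        for cell_type, genes in gene_lists.items()
    }
    return max(scores, key=scores.get), scores

# Illustrative, equally sized marker lists (hypothetical, not SCALT's lists).
gene_lists = {
    "T cell": {"CD3D", "CD3E", "IL7R"},
    "B cell": {"CD79A", "MS4A1", "CD19"},
}

# One cell's raw counts: mostly T-cell markers plus a housekeeping gene.
cell = {"CD3D": 5, "CD3E": 3, "ACTB": 10, "MS4A1": 1}

label, scores = annotate_cell(cell, gene_lists, n_genes=20000)
```

Each cell is scored independently against every list, so the assigned label does not depend on which other cells are in the dataset. A per-cell approach of this kind would also be deterministic for a fixed set of lists.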
SCALT: automatic identification of cell types from single-cell RNA sequencing data / D. Traversa, M. Chiara. (Presentation given at the 9th PhD Meeting Istituto Mario Negri conference, held in Milan in 2024.)
SCALT: automatic identification of cell types from single-cell RNA sequencing data
D. Traversa; M. Chiara
2024