Software and Data for: Curated data empower deep learning for RNA epitranscriptome discovery / E. Saitto, E.C. (2025). [10.5281/zenodo.18036476]
Software and Data for: Curated data empower deep learning for RNA epitranscriptome discovery
E. Casiraghi; G. Valentini
2025
Abstract
This archive contains the code and materials described in: Saitto, E., Casiraghi, E., Paccanaro, A. & Valentini, G. Curated data empower deep learning for RNA epitranscriptome discovery.

`code_and_data.zip` contains the code, trained models, training data and transcriptome-wide predictions. These resources match the repository at https://github.com/AnacletoLAB/RNA_m5C_predict, which also includes a user-friendly tool that processes FASTA files and outputs writer-specific m⁵C probabilities.

`m5C_predictions.tsv.gz` and `m5C_predictions.xlsx` contain transcriptome-wide predictions of RNA 5-methylcytosine (m⁵C) sites for the human reference transcriptome (GENCODE v45, GRCh38). Predictions were generated with the Bi-GRU (bidirectional gated recurrent unit) model. The tables have the following columns:

- Transcript-level identifiers: `transcript_id`, `gene_id`, `gene_name`, `transcript_type`, `tags`.
- `position`: zero-based coordinate of the cytosine within the transcript sequence.
- `Type`: predicted methyltransferase class, one of I (NSUN2), II (NSUN6), III (NSUN5) or IV (NSUN1).
- `probability`: probability assigned by the model.
- `in_train_or_test_sets`: `TRUE` if the 51-nt window centred on this cytosine (25 nt on each side) was present in the training or test set; `FALSE` otherwise.

In addition, `gene_enrichment.tar` contains the terms enriched across different ontologies for genes with m⁵C sites predicted for each NSUN enzyme. Specifically, for each methyltransferase we retained the highest-scoring transcriptome-wide m⁵C sites, excluding any site used during training or testing, and queried g:Profiler against GO, KEGG, Reactome and the Human Phenotype Ontology.
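The `position` and `in_train_or_test_sets` columns both refer to a 51-nt window centred on a zero-based cytosine coordinate, i.e. 25 nt on either side. The sketch below shows one way to recover such a window from a transcript sequence. The helper name `centred_window` and the choice to pad with `N` near transcript ends are illustrative assumptions; the released code may handle boundaries differently.

```python
def centred_window(sequence: str, position: int, width: int = 51, pad: str = "N") -> str:
    """Return the width-nt window centred on a zero-based position.

    Hypothetical helper: positions closer than width // 2 to either end of the
    transcript are padded with `pad` (an assumption, not the released code).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a single centre")
    if not 0 <= position < len(sequence):
        raise IndexError("position outside the transcript sequence")
    flank = width // 2  # 25 nt on each side for a 51-nt window
    padded = pad * flank + sequence + pad * flank
    # In `padded`, the original index `position` sits at position + flank,
    # so the slice [position, position + width) is centred on it.
    return padded[position : position + width]
```

For a given row, the central character `centred_window(seq, position)[25]` should be the cytosine reported in the table, which gives a quick sanity check that the zero-based convention is being applied correctly.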
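The enrichment procedure described above (keep the highest-scoring sites per methyltransferase, drop any site used in training or testing, then query g:Profiler with the corresponding genes) can be approximated from `m5C_predictions.tsv.gz` using only the Python standard library. This is a sketch under several assumptions: the file is tab-separated with a header row using the column names listed above, `in_train_or_test_sets` is stored as the strings `TRUE`/`FALSE`, and `Type` holds the bare Roman numerals `I` to `IV`. The cut-off `top_n` and the function name `top_novel_genes` are hypothetical; the number of sites actually retained for `gene_enrichment.tar` is not stated here.

```python
import csv
import gzip
from collections import defaultdict
from typing import Dict, List

# Assumed encoding of the `Type` column: bare Roman numerals mapped to writers.
WRITERS = {"I": "NSUN2", "II": "NSUN6", "III": "NSUN5", "IV": "NSUN1"}


def top_novel_genes(path: str, top_n: int = 1000) -> Dict[str, List[str]]:
    """Return, per NSUN writer, gene names of the top_n highest-probability
    sites that were not part of the training or test sets.

    `top_n` is a hypothetical cut-off, not the value used in the original analysis.
    """
    sites = defaultdict(list)  # class -> list of (probability, gene_name)
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Skip sites whose 51-nt window was seen during training or testing.
            if row["in_train_or_test_sets"].strip().upper() == "TRUE":
                continue
            sites[row["Type"].strip()].append((float(row["probability"]), row["gene_name"]))

    genes = {}
    for cls, writer in WRITERS.items():
        ranked = sorted(sites.get(cls, []), key=lambda s: s[0], reverse=True)[:top_n]
        # Deduplicate gene names while keeping rank order (first = best site).
        genes[writer] = list(dict.fromkeys(gene for _, gene in ranked))
    return genes
```

Because several top-ranked sites can fall in the same gene, each returned list may contain fewer than `top_n` genes. The lists can then be submitted to g:Profiler, either through its web interface or its official client, with GO, KEGG, Reactome and HPO selected as sources.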