Software and Data for: Curated data empower deep learning for RNA epitranscriptome discovery

Saitto, E.; Casiraghi, E.; Paccanaro, A.; Valentini, G.

doi:10.5281/zenodo.18036476

This archive contains the code and materials described in: Saitto, E., Casiraghi, E., Paccanaro, A. & Valentini, G. Curated data empower deep learning for RNA epitranscriptome discovery.

`code_and_data.zip` contains code, trained models, training data, and transcriptome-wide predictions. These resources match the repository at https://github.com/AnacletoLAB/RNA_m5C_predict, which also includes a user-friendly tool that takes FASTA files as input and outputs writer-specific m⁵C probabilities.

`m5C_predictions.tsv.gz` and `m5C_predictions.xlsx` contain transcriptome-wide predictions of RNA 5-methylcytosine (m⁵C) sites for the human reference transcriptome (GENCODE v45, GRCh38). Predictions were generated with the Bi-GRU model. The tables have the following columns:

- Transcript-level identifiers: `transcript_id`, `gene_id`, `gene_name`, `transcript_type`, `tags`.
- `position`: zero-based coordinate of the cytosine within the transcript sequence.
- `Type`: predicted methyltransferase class, one of I (NSUN2), II (NSUN6), III (NSUN5), or IV (NSUN1).
- `probability`: probability assigned by the model.
- `in_train_or_test_sets`: `TRUE` if the 51-nt window centred on this cytosine was present in the training or test set; `FALSE` otherwise.

The file `gene_enrichment.tar` contains terms enriched across different ontologies for the genes carrying m⁵C sites predicted for each NSUN enzyme. Specifically, for each methyltransferase we retained the highest-scoring transcriptome-wide m⁵C sites, excluding any site used during training or testing, and queried g:Profiler against GO, KEGG, Reactome, and the Human Phenotype Ontology.
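A minimal sketch, using only the Python standard library, of two common ways to work with `m5C_predictions.tsv.gz`: recovering the 51-nt window centred on a predicted cytosine from its zero-based `position`, and selecting the highest-probability sites per methyltransferase class while skipping sites flagged in `in_train_or_test_sets`, in the spirit of the selection described for the enrichment analysis. The function names `window_51nt`, `top_novel_sites` and `gene_lists`, the `top_k` cutoff, and the `N` padding at transcript ends are hypothetical choices for illustration; the number of sites the authors actually retained and how their tool handles transcript ends are not stated here, so the GitHub repository remains the reference implementation. Transcript sequences are assumed to come from the GENCODE v45 transcript FASTA.

```python
from __future__ import annotations

import csv
import gzip
from collections import defaultdict

# Half-width of the 51-nt window: 25 nt on each side of the cytosine.
HALF_WINDOW = 25


def window_51nt(sequence: str, position: int, pad: str = "N") -> str:
    """Return the 51-nt window centred on the zero-based `position`.

    Assumption (hypothetical): positions within 25 nt of a transcript end
    are padded with `pad`; the authors' own edge handling may differ.
    """
    if not 0 <= position < len(sequence):
        raise IndexError(f"position {position} outside sequence of length {len(sequence)}")
    if sequence[position].upper() != "C":
        raise ValueError(f"position {position} is not a cytosine")
    left = sequence[max(0, position - HALF_WINDOW):position]
    right = sequence[position + 1:position + 1 + HALF_WINDOW]
    # Pad short flanks so the window always has length 2 * 25 + 1 = 51.
    left = pad * (HALF_WINDOW - len(left)) + left
    right = right + pad * (HALF_WINDOW - len(right))
    return left + sequence[position] + right


def top_novel_sites(path: str, top_k: int = 1000) -> dict[str, list[dict]]:
    """Group rows by `Type`, drop sites seen in training/testing, and keep
    the `top_k` highest-probability rows per class (`top_k` is hypothetical)."""
    by_type: dict[str, list[dict]] = defaultdict(list)
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Accept TRUE/True/true, since the exact spelling may vary.
            if row["in_train_or_test_sets"].strip().upper() == "TRUE":
                continue
            row["probability"] = float(row["probability"])
            row["position"] = int(row["position"])
            by_type[row["Type"]].append(row)
    return {
        writer: sorted(rows, key=lambda r: r["probability"], reverse=True)[:top_k]
        for writer, rows in by_type.items()
    }


def gene_lists(top_sites: dict[str, list[dict]]) -> dict[str, list[str]]:
    """Sorted unique gene names per class, e.g. as input for an enrichment query."""
    return {
        writer: sorted({r["gene_name"] for r in rows})
        for writer, rows in top_sites.items()
    }
```

For example, `gene_lists(top_novel_sites("m5C_predictions.tsv.gz", top_k=500))` yields one gene list per class that could then be submitted to g:Profiler; the cutoff of 500 is arbitrary. Reading the file row by row with `csv` keeps the sketch free of third-party dependencies, although pandas would be a natural alternative for interactive work.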

Software and Data for: Curated data empower deep learning for RNA epitranscriptome discovery / E. Saitto, E. Casiraghi, A. Paccanaro, G. Valentini. - (2025). [10.5281/zenodo.18036476]





Year: 2025
Scientific-disciplinary sectors of the dataset (valid from 9 May 2024): Sector INFO-01/A - Computer Science
DOI: https://dx.doi.org/10.5281/zenodo.18036476
Appears in type: 22 - Dataset

Files in this product:

There are no files associated with this product.


Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1255262


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca