Predicted RNA m⁵C sites across the human transcriptome (GRCh38 GENCODE v45) / E. Saitto, E.C.. - (2025). [10.5281/zenodo.16629377]
Predicted RNA m⁵C sites across the human transcriptome (GRCh38 GENCODE v45)
E. Casiraghi; G. Valentini
2025
Abstract
Transcriptome‑wide predictions of RNA 5‑methylcytosine (m⁵C) sites for the human reference transcriptome (GENCODE v45, GRCh38). Predictions were generated with the Bi-GRU model described in:

Saitto, E., Casiraghi, E., Paccanaro, A. & Valentini, G. AI methods and biologically informed data curation enable accurate RNA m⁵C prediction. bioRxiv (September 2025). https://doi.org/10.1101/xxxxxxx

`m5C_predictions.tsv.gz` and `m5C_predictions.xlsx` are tables with the following columns:

- Transcript‑level identifiers: `transcript_id`, `gene_id`, `gene_name`, `transcript_type`, `tags`.
- `position`: zero‑based coordinate of the cytosine within the transcript sequence.
- `Type`: predicted methyltransferase class: I (NSUN2), II (NSUN6), III (NSUN5), IV (NSUN1).
- `probability`: probability assigned by the model.
- `in_train_or_test_sets`: `TRUE` if the 51‑nt window centred on this cytosine was present in the training or test set; `FALSE` otherwise.

The file `gene_enrichment.tar` contains the terms enriched across different ontologies for the genes with m⁵C sites predicted for each NSUN enzyme. Specifically, we retained the highest-scoring transcriptome-wide m⁵C sites per methyltransferase, omitting any site used during training or testing, and queried g:Profiler against GO, KEGG, Reactome and the Human Phenotype Ontology.
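As an illustration of how the prediction table might be read, the following minimal Python sketch keeps only sites that were not in the training or test sets and adds a one-based coordinate next to the zero-based `position`. It assumes that `m5C_predictions.tsv.gz` is tab-delimited with a header row matching the column names above, and that `in_train_or_test_sets` is stored as the text `TRUE`/`FALSE`. The function names, the 0.9 probability cut-off and the rows in `demo()` are hypothetical and are not taken from the dataset.

```python
import csv
import gzip
import io


def load_novel_sites(handle, min_probability=0.9):
    """Yield sites absent from the training/test sets with probability >= min_probability.

    `handle` is any text-mode file object holding the tab-delimited table.
    The default threshold is an illustrative assumption, not a recommended value.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Skip sites whose 51-nt window was seen during training or testing.
        if row["in_train_or_test_sets"].strip().upper() != "FALSE":
            continue
        probability = float(row["probability"])
        if probability < min_probability:
            continue
        position = int(row["position"])  # zero-based within the transcript
        yield {
            "transcript_id": row["transcript_id"],
            "gene_name": row["gene_name"],
            "type": row["Type"],
            "position_0based": position,
            "position_1based": position + 1,
            "probability": probability,
        }


def load_from_file(path, min_probability=0.9):
    """Read the gzip-compressed TSV from disk and return the filtered sites as a list."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(load_novel_sites(handle, min_probability))


def demo():
    """Run the filter on a tiny made-up table (identifiers and values are placeholders)."""
    header = [
        "transcript_id", "gene_id", "gene_name", "transcript_type", "tags",
        "position", "Type", "probability", "in_train_or_test_sets",
    ]
    rows = [
        ["TX1", "G1", "GENE1", "protein_coding", "", "10", "I", "0.95", "FALSE"],
        ["TX1", "G1", "GENE1", "protein_coding", "", "42", "II", "0.99", "TRUE"],
        ["TX2", "G2", "GENE2", "lncRNA", "", "7", "III", "0.40", "FALSE"],
    ]
    text = "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"
    return list(load_novel_sites(io.StringIO(text)))
```

With the real file, `load_from_file("m5C_predictions.tsv.gz")` would return the same kind of records. Streaming the rows through a generator avoids holding the whole transcriptome-wide table in memory before filtering.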