Human Cytochrome P450 enzymes (CYP450s) are responsible for metabolizing 70–80% of clinically used drugs. The development of computational tools to accurately predict CYP450 enzyme-substrate interactions is crucial for drug discovery and chemical toxicology studies. In this work, we introduce CypEGAT, a deep learning framework designed to enhance prediction performance by integrating protein embeddings of CYP450s (extracted using the pre-trained ESM-2 Transformer model) with molecular embeddings generated by our fine-tuned Graph Attention Network (GAT). The CypEGAT model was trained end-to-end on two large-scale experimental enzyme-substrate datasets and our CYP450s dataset, which comprises 51,753 CYP450 enzyme-substrate pairs and 27,857 enzyme-nonsubstrate pairs. Focusing on five major human CYP450 isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4), CypEGAT achieves an overall predictive accuracy of 0.882 and an AUROC of 0.928. The model demonstrates robust generalizability to novel chemical compounds across different CYP450 isoforms, underscoring its potential as a powerful tool for drug metabolism studies.
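The fusion idea described in the abstract (a protein embedding combined with a graph-attention embedding of the molecule) can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the single attention head, mean pooling, concatenation-based fusion, logistic output, all dimensions, and every name below (`gat_layer`, `mean_pool`, `predict`, `params`) are assumptions made for illustration. The protein vector is a hand-written placeholder standing in for a real ESM-2 embedding, which would be far larger, and the weights are random rather than trained.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, v):
    # W is a list of rows; returns W @ v as a plain list.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gat_layer(node_feats, neighbors, W, a):
    """Single-head graph attention layer (GAT-style), without the usual
    output nonlinearity. Each atom attends to itself and its bonded atoms."""
    h = [matvec(W, x) for x in node_feats]  # projected atom features
    out = []
    for i, hi in enumerate(h):
        nbrs = [i] + neighbors[i]  # self-loop plus bonded atoms
        # Unnormalised score: LeakyReLU(a . [h_i || h_j])
        scores = [leaky_relu(sum(w * x for w, x in zip(a, hi + h[j])))
                  for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of neighbour features
        out.append([sum(al * h[j][k] for al, j in zip(alphas, nbrs))
                    for k in range(len(hi))])
    return out


def mean_pool(node_vecs):
    # Average atom vectors into one molecule-level embedding.
    n = len(node_vecs)
    return [sum(col) / n for col in zip(*node_vecs)]


def predict(protein_emb, node_feats, neighbors, params):
    """Concatenate protein and molecule embeddings, then apply a logistic unit."""
    mol_emb = mean_pool(gat_layer(node_feats, neighbors,
                                  params["W"], params["a"]))
    fused = protein_emb + mol_emb  # list concatenation
    logit = sum(w * x for w, x in zip(params["w_out"], fused)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs (all hypothetical): a 3-atom chain with 2-dimensional atom features
# and a 4-dimensional placeholder in place of an ESM-2 protein embedding.
rng = random.Random(0)
hidden = 2
params = {
    "W": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)],
    "a": [rng.uniform(-1, 1) for _ in range(2 * hidden)],
    "w_out": [rng.uniform(-1, 1) for _ in range(4 + hidden)],
    "b_out": 0.0,
}
protein_emb = [0.1, -0.3, 0.5, 0.2]
node_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = [[1], [0, 2], [1]]  # adjacency list: atom 0 - atom 1 - atom 2

p = predict(protein_emb, node_feats, neighbors, params)
print(f"P(substrate) = {p:.3f}")
```

Because the weights are untrained, the printed probability carries no chemical meaning; the sketch only shows how the two embeddings could meet in a single classifier input.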
CypEGAT: A Deep Learning Framework Integrating Protein Language Model and Graph Attention Networks for Enhanced CYP450s Substrate Prediction / Y. Wei, U. Guerrini, I. Eberini (COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE). - In: AI for Research and Scalable, Efficient Systems / edited by Q. Wang, W. Yin, A. Aich, Y. Suh, K.-C. Peng. - [s.l.] : Springer Singapore, 2025 Jun 30. - ISBN 978-981-96-8911-8. - pp. 161-172 (Conference: Second International Workshop, AI4Research 2025, and First International Workshop, SEAS 2025, held in Philadelphia in 2025) [10.1007/978-981-96-8912-5_7].
CypEGAT: A Deep Learning Framework Integrating Protein Language Model and Graph Attention Networks for Enhanced CYP450s Substrate Prediction
Y. Wei;U. Guerrini;I. Eberini
2025




