CypEGAT: A Deep Learning Framework Integrating Protein Language Model and Graph Attention Networks for Enhanced CYP450s Substrate Prediction / Y. Wei, U. Guerrini, I. Eberini (COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE). - In: AI for Research and Scalable, Efficient Systems / edited by Q. Wang, W. Yin, A. Aich, Y. Suh, K.-C. Peng. - [s.l] : Springer Singapore, 2025 Jun 30. - ISBN 978-981-96-8911-8. - pp. 161-172 (conference: Second International Workshop, AI4Research 2025, and First International Workshop, SEAS 2025, held in Philadelphia in 2025) [10.1007/978-981-96-8912-5_7].

CypEGAT: A Deep Learning Framework Integrating Protein Language Model and Graph Attention Networks for Enhanced CYP450s Substrate Prediction

Y. Wei;U. Guerrini;I. Eberini
2025

Abstract

Human Cytochrome P450 enzymes (CYP450s) are responsible for metabolizing 70–80% of clinically used drugs. The development of computational tools to accurately predict CYP450 enzyme-substrate interactions is crucial for drug discovery and chemical toxicology studies. In this work, we introduce CypEGAT, a deep learning framework designed to enhance prediction performance by integrating protein embeddings of CYP450s (extracted using the pre-trained ESM-2 Transformer model) with molecular embeddings generated by our fine-tuned Graph Attention Network (GAT). The CypEGAT model was trained end-to-end on two large-scale experimental enzyme-substrate datasets and our CYP450s dataset, which comprises 51,753 CYP450 enzyme-substrate pairs and 27,857 enzyme-nonsubstrate pairs. Focusing on five major human CYP450 isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4), CypEGAT achieves an overall predictive accuracy of 0.882 and an AUROC of 0.928. The model demonstrates robust generalizability to novel chemical compounds across different CYP450 isoforms, underscoring its potential as a powerful tool for drug metabolism studies.
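The abstract reports two metrics, accuracy and AUROC. The sketch below shows how each is typically computed for a binary substrate/non-substrate classifier. It is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; only the two pair counts come from the abstract.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of enzyme-compound pairs whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random substrate pair outscores a random
    non-substrate pair (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper): 1 = substrate, 0 = non-substrate.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]
toy_scores = [0.91, 0.78, 0.42, 0.35, 0.61, 0.66, 0.12, 0.48]

toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auroc = auroc(toy_labels, toy_scores)

# Class balance of the CYP450 dataset, using the pair counts from the abstract.
# The balance of the actual test split is not stated and may differ.
n_substrate, n_nonsubstrate = 51_753, 27_857
positive_fraction = n_substrate / (n_substrate + n_nonsubstrate)

if __name__ == "__main__":
    print(f"toy accuracy: {toy_accuracy:.3f}")
    print(f"toy AUROC: {toy_auroc:.3f}")
    print(f"substrate fraction in CYP450 dataset: {positive_fraction:.3f}")
```

Accuracy depends on the chosen threshold and on class balance, whereas AUROC depends only on how the scores rank positives against negatives, which is one reason the two are reported together.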
Enzyme-substrate prediction; Deep learning; Drug discovery
Sector BIOS-07/A - Biochemistry
   Metal-containing Radical Enzymes (MetRaZymes)
   MetRaZymes
   EUROPEAN COMMISSION
   101073546
30-Jun-2025
Book Part (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1175175