Prediction of UGT-mediated phase II metabolism via ligand- and structure-based predictive models / L. Bono, F. Lunghini, E. Sabato, A.D. Biswas, A. Mazzolari, A. Pedretti, A.R. Beccari, G. Vistoli, S. Vittorio. - In: JOURNAL OF CHEMINFORMATICS. - ISSN 1758-2946. - 17:1(2025 Oct 15), pp. 158.1-158.15. [10.1186/s13321-025-01097-y]
Prediction of UGT-mediated phase II metabolism via ligand- and structure-based predictive models
L. Bono (first author); E. Sabato; A.D. Biswas; A. Mazzolari; A. Pedretti; G. Vistoli (penultimate author); S. Vittorio (last author)
2025
Abstract
The prediction of drug metabolism by in silico techniques is attracting growing interest because these techniques can process large datasets, allowing the stability and safety of new drug candidates to be evaluated during the early stages of the drug discovery process. To date, in silico models for metabolism prediction have mainly exploited the ligand-based (LB) properties of the training molecules to predict the occurrence of a given metabolic reaction and/or the reactive site involved in the biotransformation. However, recent reports have highlighted that structure-based (SB) modeling can be conveniently integrated with LB methods for drug metabolism prediction. SB modeling has the advantage of predicting whether a given molecule can fit the enzyme active site and which of its moieties approaches the catalytic residues. Herein, we developed machine learning models for UDP-glucuronosyltransferase (UGT)-mediated metabolism using both LB and SB methods. In particular, this study focused on the UGT2B7 and UGT2B15 isoforms, which are involved in the clearance of many drugs as well as in clinically relevant drug-drug interactions. First, molecular dynamics (MD) and docking simulations were combined to explore the binding mechanism of the cofactor and substrate within the catalytic pocket of the studied UGT isoforms, using their AlphaFold structures. Analysis of the MD trajectories identified a suitable conformation of each UGT isoform for the development of binary classification models. For this purpose, the Random Forest algorithm was applied to metabolic data extracted from the MetaQSAR database. SB models were trained on a set of scoring functions and protein–ligand interaction fingerprints derived from docking, while LB models were built on a set of physicochemical and constitutional descriptors. When the models were evaluated individually, the LB classifiers outperformed the SB models.
However, applying a consensus strategy improved prediction accuracy compared with the individual models. This indicates that the LB and SB approaches convey complementary information, whose aggregation yields better predictions than either model type alone.
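The abstract does not say how the LB and SB predictions are combined in the consensus strategy. One plausible reading is soft voting. In that approach, each Random Forest outputs a probability that a molecule is a UGT substrate, and the consensus averages the two probabilities before applying a threshold. For instance, the `predict_proba` method of scikit-learn's `RandomForestClassifier` returns such probabilities, although the toolkit the authors used is not stated here.

The following is a minimal sketch under those assumptions. The function names `consensus_proba`, `classify` and `accuracy`, the equal default weighting and the 0.5 threshold are all hypothetical illustration choices, not the published protocol.

```python
from collections.abc import Sequence

# Hypothetical helpers illustrating a soft-voting consensus between two binary
# classifiers (one ligand-based, one structure-based). Not the authors' code.


def consensus_proba(lb_proba: Sequence[float], sb_proba: Sequence[float],
                    lb_weight: float = 0.5) -> list[float]:
    """Weighted mean of the LB and SB probabilities that each molecule is a substrate.

    Assumes both sequences refer to the same molecules in the same order.
    """
    if len(lb_proba) != len(sb_proba):
        raise ValueError("LB and SB predictions must cover the same molecules")
    if not 0.0 <= lb_weight <= 1.0:
        raise ValueError("lb_weight must lie in [0, 1]")
    sb_weight = 1.0 - lb_weight
    return [lb_weight * p_lb + sb_weight * p_sb
            for p_lb, p_sb in zip(lb_proba, sb_proba)]


def classify(proba: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Label a molecule as substrate (1) when its probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in proba]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of molecules whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up probabilities for four toy molecules, NOT data from the study.
    lb = [0.90, 0.40, 0.55, 0.20]
    sb = [0.70, 0.80, 0.30, 0.10]
    reference = [1, 1, 0, 0]
    combined = consensus_proba(lb, sb)
    print(classify(combined), accuracy(reference, classify(combined)))
```

Averaging probabilities, rather than hard labels, lets a confident prediction from one model outweigh an uncertain one from the other. That is one way complementary LB and SB information could be aggregated. A weighted average (via `lb_weight`) or a majority vote over more than two models are alternative designs.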
| File | Type | License | Access | Size | Format |
|---|---|---|---|---|---|
| unpaywall-bitstream--649697634.pdf | Publisher's version/PDF | Creative Commons | Open access | 1.6 MB | Adobe PDF |