IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Benchmarking and Consensus Ranking of Inverse Folding Models for Protein-Ligand Interface Design / Y. Wei, U. Guerrini, I. Eberini. - In: BCB Companion '25: Companion / edited by M. Xinghua Shi, X. Qian. - [s.l.] : ACM, 2025. - ISBN 979-8-4007-2222-6. - pp. 1-7 (16th International Conference on Bioinformatics, Computational Biology and Health Informatics, Philadelphia, 2025) [10.1145/3768322.3769031].

Benchmarking and Consensus Ranking of Inverse Folding Models for Protein-Ligand Interface Design

Y. Wei;U. Guerrini;I. Eberini

2025

Abstract

Machine learning has accelerated protein design and enabled more efficient and accurate modeling of protein-ligand interfaces. Because biological systems are complex, selecting the best candidates from the heterogeneous outputs of generative protein design tools remains a persistent challenge. In this work, we introduce a consensus ranking framework that integrates five state-of-the-art inverse folding models (ProteinMPNN, LigandMPNN, ESM-IF1, CARBonAra, and ProRefiner) applied to 25,716 curated protein-ligand complexes from the BioLip database. Our approach frames design selection as a supervised learning-to-rank problem and uses a LightGBM-based LambdaMART model to fuse heterogeneous scoring features into a unified ranking. We show that consensus-ranked sequences outperform individual model selections in stability, binding affinity, and structural fidelity, as evaluated using Schrödinger and MOE free energy difference calculations. In a case study on three enzymes (NOV1, CYP153A, and LCD), our method consistently improves design quality, suggesting that consensus ranking can significantly enhance the success rate and efficiency of AI-driven protein engineering.
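As a rough illustration of what fusing per-model scores into a single ranking means, the sketch below aggregates ranks across models by their mean (a Borda-style scheme). This is not the paper's method: the abstract describes a supervised LightGBM-based LambdaMART ranker, whereas this is a simpler, unsupervised stand-in. The design names, the score values, and the convention that a higher score is better are all assumptions made for the example; only the five model names come from the abstract.

```python
from statistics import mean


def ranks_desc(scores):
    """Map each design to its rank (1 = best), assuming higher score is better.

    Tied designs share the average of the positions they occupy.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of designs tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    return ranks


def consensus_ranking(scores_by_model):
    """Order designs by their mean rank across all models (lower is better)."""
    per_model = [ranks_desc(s) for s in scores_by_model.values()]
    designs = sorted(per_model[0])
    mean_rank = {d: mean(r[d] for r in per_model) for d in designs}
    # Break ties in mean rank by design name so the output is deterministic.
    order = sorted(designs, key=lambda d: (mean_rank[d], d))
    return order, mean_rank


# Hypothetical scores: invented numbers, not results from the paper.
scores_by_model = {
    "ProteinMPNN": {"design_A": 0.82, "design_B": 0.75, "design_C": 0.60},
    "LigandMPNN": {"design_A": 0.70, "design_B": 0.88, "design_C": 0.65},
    "ESM-IF1": {"design_A": 0.91, "design_B": 0.80, "design_C": 0.85},
    "CARBonAra": {"design_A": 0.77, "design_B": 0.79, "design_C": 0.50},
    "ProRefiner": {"design_A": 0.85, "design_B": 0.60, "design_C": 0.70},
}

order, mean_rank = consensus_ranking(scores_by_model)
for design in order:
    print(design, mean_rank[design])
```

A learned ranker such as LambdaMART differs from this in that it is trained against a reference ordering and can weight each model's features unequally, while mean-rank aggregation treats every model as equally reliable.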

Keywords: Machine Learning; Protein Design
Scientific-disciplinary sectors of the contribution (valid from 09/05/2024): Sector BIOS-07/A - Biochemistry
Project title: Metal-containing Radical Enzymes (MetRaZymes)
Project acronym: MetRaZymes
Funder: EUROPEAN COMMISSION
Contract no.: 101073546
Publication date: 2025
DOI: https://dx.doi.org/10.1145/3768322.3769031
Type: Book Part (author)
Appears in types: 03 - Contribution in a volume

Files in this item:

File	Size	Format
Benchmarking and Consensus Ranking of Inverse Folding Models for Protein-Ligand Interface Design.pdf (open access; type: Publisher's version/PDF; license: Creative Commons)	696.78 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1203255
