Fine-tuning of conditional Transformers improves in silico enzyme prediction and generation / M. Nicolini, E. Saitto, R.E. Jimenez Franco, E. Cavalleri, A.J. Galeano Alfonso, D. Malchiodi, A. Paccanaro, P.N. Robinson, E. Casiraghi, G. Valentini. - In: COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL. - ISSN 2001-0370. - 27:(2025), pp. 1318-1334. [10.1016/j.csbj.2025.03.037]
Fine-tuning of conditional Transformers improves in silico enzyme prediction and generation
M. Nicolini (first author); E. Cavalleri; D. Malchiodi; E. Casiraghi (penultimate author); G. Valentini (last author)
2025
Abstract
We introduce Finenzyme, a Protein Language Model (PLM) that combines transfer learning from a decoder-based Transformer, conditional learning with specific functional keywords, and fine-tuning for the in silico modeling of enzymes. Our experiments show that Finenzyme significantly improves on generalist PLMs such as ProGen for the in silico prediction and generation of enzymes belonging to specific Enzyme Commission (EC) categories. Our in silico experiments demonstrate that Finenzyme-generated sequences can diverge from natural ones while retaining predicted tertiary structures, predicted functions, and active sites similar to those of their natural counterparts. We show that embeddings of the generated sequences, computed with both Finenzyme and ESMFold, closely resemble those of natural sequences, making them suitable for downstream tasks such as EC classification. Clustering analyses based on the primary and predicted tertiary structures of the sequences reveal that the generated enzymes form clusters largely overlapping with those of natural enzymes. Overall, these in silico validation experiments indicate that Finenzyme effectively captures the structural and functional properties of target enzymes and can potentially support targeted enzyme engineering tasks.
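The "conditional learning using specific functional keywords" described in the abstract amounts to prefixing each sequence with a control token encoding the target EC category and then sampling autoregressively from the decoder. The Python sketch below illustrates only that mechanism under stated assumptions: the vocabulary, the `<EC:…>` keyword tokens, the tiny model dimensions, and the sampling loop are illustrative choices and do not reproduce the Finenzyme or ProGen code bases.

```python
# Minimal sketch (not the authors' code): keyword-conditioned generation with a
# decoder-only Transformer, in the spirit of Finenzyme / ProGen conditioning tags.
import torch
import torch.nn as nn

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
KEYWORDS = ["<EC:1.1.1.1>", "<EC:3.2.1.4>"]          # hypothetical EC keyword tokens
VOCAB = ["<pad>", "<eos>"] + KEYWORDS + AMINO_ACIDS
TOK2ID = {tok: i for i, tok in enumerate(VOCAB)}

class TinyCausalLM(nn.Module):
    """A deliberately small decoder-only Transformer, used only to show the interface."""
    def __init__(self, vocab_size, d_model=64, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        seq_len = ids.size(1)
        positions = torch.arange(seq_len, device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(positions)
        # Causal mask so each position attends only to earlier tokens.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=ids.device), diagonal=1
        )
        h = self.blocks(x, mask=causal)
        return self.lm_head(h)                        # (batch, seq_len, vocab)

@torch.no_grad()
def generate(model, keyword, max_new=60, temperature=1.0):
    """Sample residues autoregressively, conditioned on an EC keyword prefix token."""
    ids = torch.tensor([[TOK2ID[keyword]]])           # the conditioning prefix
    for _ in range(max_new):
        logits = model(ids)[:, -1, :] / temperature   # next-token logits
        next_id = torch.multinomial(logits.softmax(dim=-1), num_samples=1)
        if next_id.item() == TOK2ID["<eos>"]:
            break
        ids = torch.cat([ids, next_id], dim=1)
    return "".join(VOCAB[i] for i in ids[0, 1:].tolist())

model = TinyCausalLM(len(VOCAB))
# In Finenzyme the backbone would be a pre-trained ProGen-like checkpoint
# fine-tuned on keyword-prefixed sequences of the target EC class; the untrained
# toy model here only illustrates the conditioning and sampling mechanics.
print(generate(model, "<EC:1.1.1.1>"))
```

In practice the paper's workflow would fine-tune a pre-trained checkpoint on keyword-prefixed sequences of the target EC class before sampling; the sketch shows only the generation-time conditioning.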
File | Description | Type | Access | Size | Format
---|---|---|---|---|---
1-s2.0-S2001037025001072-main.pdf | Research Article | Publisher's version/PDF | Open access | 6.56 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.