Fine-tuning of Conditional Transformers Improves the Generalization of Functionally Characterized Proteins / M. Nicolini, D. Malchiodi, A. Cabri, E. Cavalleri, M. Mesiti, A. Paccanaro, P. N. Robinson, J. Reese, E. Casiraghi, G. Valentini - In: Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS / edited by M. P. Guarino, K. Hotta, M. Yousef, H. Liu, G. Saggio, A. Fred, H. Gamboa. - [s.l.]: SCITEPress, 2024. - ISBN 978-989-758-688-0. - pp. 561-568. Paper presented at the 17th International Joint Conference on Biomedical Engineering Systems and Technologies, held in Rome in 2024 [DOI: 10.5220/0012567900003657].
Fine-tuning of Conditional Transformers Improves the Generalization of Functionally Characterized Proteins
D. Malchiodi; A. Cabri; E. Cavalleri; M. Mesiti; E. Casiraghi; G. Valentini
2024
Abstract
Conditional transformers improve the generative capabilities of large language models (LLMs) by processing specific control tags that drive the generation of texts characterized by specific features. Recently, a similar approach has been applied to the generation of functionally characterized proteins by adding specific tags to the protein sequence to qualify their functions (e.g., Gene Ontology terms) or other characteristics (e.g., their family or the species they belong to). In this work, we show that fine-tuning conditional transformers, pre-trained on large corpora of proteins, on specific protein families can significantly enhance the prediction accuracy of the pre-trained models and can also generate new, potentially functional proteins that could enlarge the protein space explored by natural evolution. We obtained encouraging results on the phage lysozyme family of proteins, achieving statistically significantly better prediction results than the original pre-trained model. The comparative analysis of the primary and tertiary structures of the synthetic proteins generated by our model against the natural ones shows that the resulting fine-tuned model is able to generate biologically plausible proteins. Our results suggest that fine-tuned conditional transformers can be applied to other functionally characterized proteins for possible industrial and pharmacological applications.

File | Type | Size | Format
---|---|---|---
125679.pdf (open access) | Publisher's version/PDF | 1.24 MB | Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.