Fine-tuning of Conditional Transformers Improves the Generalization of Functionally Characterized Proteins / M. Nicolini, D. Malchiodi, A. Cabri, E. Cavalleri, M. Mesiti, A. Paccanaro, N. Robinson Peter, J. Reese, E. Casiraghi, G. Valentini. In: Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS / edited by M. P. Guarino, K. Hotta, M. Yousef, H. Liu, G. Saggio, A. Fred, H. Gamboa. [s.l.]: SCITEPress, 2024. ISBN 978-989-758-688-0. pp. 561-568. (Paper presented at the 17th International Joint Conference on Biomedical Engineering Systems and Technologies, held in Rome in 2024.) [10.5220/0012567900003657]

Fine-tuning of Conditional Transformers Improves the Generalization of Functionally Characterized Proteins

D. Malchiodi; A. Cabri; E. Cavalleri; M. Mesiti; E. Casiraghi; G. Valentini (last author)
2024

Abstract

Conditional transformers improve the generative capabilities of large language models (LLMs) by processing control tags that steer generation toward texts with specific features. Recently, a similar approach has been applied to the generation of functionally characterized proteins: specific tags are added to the protein sequence to qualify its functions (e.g., Gene Ontology terms) or other characteristics (e.g., the family or the species to which it belongs). In this work, we show that fine-tuning conditional transformers, pre-trained on large corpora of proteins, on specific protein families can significantly enhance the prediction accuracy of the pre-trained models, and can also generate new, potentially functional proteins that could enlarge the protein space explored by natural evolution. We obtained encouraging results on the phage lysozyme family of proteins, achieving statistically significantly better prediction results than the original pre-trained model. A comparison of the primary and tertiary structures of the synthetic proteins generated by our model with those of natural proteins shows that the fine-tuned model is able to generate biologically plausible proteins. Our results suggest that fine-tuned conditional transformers can also be applied to other functionally characterized proteins, with possible industrial and pharmacological applications.
Large Language Models; Protein Language Models; Conditional Transformers; Protein design and modeling
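The abstract describes conditioning by adding control tags to protein sequences and then fine-tuning on a single family. A minimal Python sketch of that data preparation is given here under stated assumptions: the tag strings (such as `<fam:lysozyme>`), the `<bos>`/`<eos>` tokens, the toy sequences, and all helper names are hypothetical and do not come from the paper. Real conditional protein language models define their own tag vocabularies and tokenizers, and no model is trained here.

```python
from dataclasses import dataclass

# Hypothetical special tokens; real conditional protein LMs define their own.
BOS, EOS = "<bos>", "<eos>"
# The 20 standard amino acids (one-letter codes).
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class TaggedProtein:
    tags: tuple[str, ...]  # hypothetical control tags, e.g. a family or GO-term tag
    sequence: str          # one-letter amino-acid codes


def training_text(example: TaggedProtein) -> str:
    """Place the tags before the sequence so an autoregressive model
    learns to predict each residue given the tags (and preceding residues)."""
    seq = example.sequence.upper()
    unknown = set(seq) - STANDARD_RESIDUES
    if unknown:
        # Design choice of this sketch: reject non-standard residues.
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return "".join(example.tags) + BOS + seq + EOS


def generation_prompt(tags: tuple[str, ...]) -> str:
    """At sampling time only the tags and BOS are given; the model completes the sequence."""
    return "".join(tags) + BOS


def family_subset(corpus: list[TaggedProtein], family_tag: str) -> list[str]:
    """Fine-tuning data for one family: keep only examples carrying that tag."""
    return [training_text(p) for p in corpus if family_tag in p.tags]


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    corpus = [
        TaggedProtein(("<fam:lysozyme>",), "MKALIVLGLVLLS"),
        TaggedProtein(("<fam:other>",), "MKVLAAGIVGLLA"),
    ]
    print(family_subset(corpus, "<fam:lysozyme>"))
    print(generation_prompt(("<fam:lysozyme>",)))
```

Restricting the training texts to one family tag mirrors, in spirit, the family-specific fine-tuning described in the abstract; the residue check and the string-level tag format are simplifications of this sketch, not documented steps of the authors' method.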
Academic sector INF/01 - Computer Science
   National Center for Gene Therapy and Drugs based on RNA Technology
   Italian Ministry of University and Research (Ministero dell'Università e della Ricerca)
   CN00000041
2024
Book Part (author)
File: 125679.pdf (open access; publisher's version/PDF; Adobe PDF, 1.24 MB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1027709