Computational protein design and large-scale assessment by I-TASSER structure assembly simulations

Bazzoli, A.; Tettamanzi, A.; Zhang, Y.

doi:10.1016/j.jmb.2011.02.017

Protein design aims to create new protein molecules with a desired structure and function. One of the major obstacles to large-scale protein design is the extensive time and manpower required for experimental validation of designed sequences. Recent advances in protein structure prediction have made it possible to assess designed sequences automatically via folding simulations. We present a new protocol for protein design and validation. The sequence space is initially searched by Monte Carlo sampling guided by a public atomic potential, with candidate sequences selected by clustering of the sequence decoys. The designed sequences are then assessed by I-TASSER folding simulations, which generate full-length atomic structural models by the iterative assembly of threading fragments. The protocol is tested on 52 nonhomologous single-domain proteins, with an average sequence identity of 24% between the designed sequences and the native sequences. Despite this low sequence identity, three-dimensional models predicted for the first designed sequence have an RMSD of < 2 Å to the target structure in 62% of cases. This percentage increases to 77% if we consider the three-dimensional models from the top 10 designed sequences. Such a striking consistency between the target structure and the structural prediction from nonhomologous sequences, despite the fact that the design and folding algorithms adopt completely different force fields, indicates that the design algorithm captures the features essential to the global fold of the target. On average, the designed sequences have a free energy that is 0.39 kcal/(mol residue) lower than that of the native sequences, potentially affording greater stability to synthesized target folds.
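The sequence-search step described in the abstract can be illustrated with a minimal Metropolis Monte Carlo sketch. This is not the authors' protocol: the energy function `toy_energy`, the burial pattern, the temperature, and all function names are hypothetical stand-ins chosen for illustration, and the atomic potential, decoy clustering, and I-TASSER assessment are not reproduced. The helper `sequence_identity` shows how a figure such as the reported 24% identity between designed and native sequences could be computed for two aligned, equal-length sequences.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # assumption: a crude hydrophobic class


def toy_energy(seq, pattern):
    """Hypothetical energy: number of positions whose residue class
    (hydrophobic or not) disagrees with a buried ('B') / exposed ('E') pattern."""
    return sum((aa in HYDROPHOBIC) != (p == "B") for aa, p in zip(seq, pattern))


def metropolis_design(pattern, steps=5000, temperature=0.5, seed=0):
    """Search sequence space with single-residue mutations and the
    Metropolis acceptance rule; return the lowest-energy sequence seen."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in pattern]
    energy = toy_energy(seq, pattern)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_energy = toy_energy(seq, pattern)
        delta = new_energy - energy
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best_energy


def sequence_identity(a, b):
    """Fraction of identical residues between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    pattern = "BEBEBBEEBE" * 2  # hypothetical burial pattern of a target
    designed, energy = metropolis_design(pattern)
    print(designed, energy)
```

In a real pipeline the toy energy would be replaced by a physics-based or knowledge-based potential evaluated on the target backbone, and many low-energy decoys would be kept for clustering rather than a single best sequence.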

Computational protein design and large-scale assessment by I-TASSER structure assembly simulations / A. Bazzoli, A. Tettamanzi, Y. Zhang. - In: JOURNAL OF MOLECULAR BIOLOGY. - ISSN 0022-2836. - 407:5(2011 Apr), pp. 764-776. [10.1016/j.jmb.2011.02.017]



Keywords
	Monte Carlo minimization; protein design; protein structure prediction; sequence clustering

Scientific-disciplinary sectors of the article
	Sector INF/01 - Computer Science
	Sector BIO/11 - Molecular Biology

Publication date
	Apr 2011

Journal (ANCE-listed)
	JOURNAL OF MOLECULAR BIOLOGY

DOI
	https://dx.doi.org/10.1016/j.jmb.2011.02.017

Type
	Article (author)

Appears in types:
	01 - Journal article


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/154198

