The estimation of relative site variability among aligned homologous protein sequences

Horner, D.; Pesole, G.

doi:10.1093/bioinformatics/btg063

Motivation: Maximum likelihood-based methods to estimate site-by-site substitution rate variability in aligned homologous protein sequences rely on the formulation of a phylogenetic tree and generally assume that the patterns of relative variability follow a pre-determined distribution. We present a phylogenetic tree-independent method to estimate the relative variability of individual sites within large datasets of homologous protein sequences. It is based upon two simple assumptions: firstly, that substitutions observed between two closely related sequences are likely, in general, to occur at the most variable sites; secondly, that non-conservative amino acid substitutions tend to occur at more variable sites. Our methodology makes no assumptions regarding the underlying pattern of relative variability between sites.

Results: Using data simulated under a non-gamma distributed model, we have compared the performance of this approach to that of a maximum likelihood method that assumes gamma distributed rates. At low mean rates of evolution, our method inferred site-by-site relative substitution rates more accurately than the maximum likelihood approach in the absence of prior assumptions about the relationships between sequences. Our method does not directly account for the effects of mutational saturation. However, we have incorporated an 'ad-hoc' modification that allows the accurate estimation of relative site variability in fast evolving and saturated datasets.
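The two assumptions stated in the abstract can be illustrated with a toy, tree-free scoring scheme. The sketch below is not the authors' algorithm: the function name `relative_site_variability`, the amino acid groupings in `CONSERVATIVE_GROUPS`, the `1 / (number of differences)` pair weight and the `nonconservative_bonus` parameter are all assumptions chosen for illustration, and no saturation correction is attempted.

```python
from itertools import combinations

# Hypothetical grouping of "conservative" residues; not taken from the paper.
CONSERVATIVE_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, grp in enumerate(CONSERVATIVE_GROUPS) for aa in grp}


def relative_site_variability(alignment, nonconservative_bonus=2.0):
    """Toy per-column variability score, scaled so the mean over sites is 1."""
    n_sites = len(alignment[0])
    scores = [0.0] * n_sites
    for a, b in combinations(alignment, 2):
        # Columns where the two sequences differ, ignoring gap positions.
        diffs = [i for i in range(n_sites)
                 if a[i] != b[i] and a[i] != "-" and b[i] != "-"]
        if not diffs:
            continue
        # Assumption 1 (illustrative weighting): the closer the pair, the
        # more each observed difference points at a highly variable site.
        pair_weight = 1.0 / len(diffs)
        for i in diffs:
            ga, gb = GROUP_OF.get(a[i]), GROUP_OF.get(b[i])
            conservative = ga is not None and ga == gb
            # Assumption 2 (illustrative weighting): non-conservative
            # substitutions count for more than conservative ones.
            scores[i] += pair_weight * (1.0 if conservative else nonconservative_bonus)
    mean = sum(scores) / n_sites
    return [s / mean for s in scores] if mean > 0 else scores


if __name__ == "__main__":
    toy_alignment = ["MKLVA", "MKIVA", "MRLVD", "MKLVA"]
    for site, rate in enumerate(relative_site_variability(toy_alignment)):
        print(site, round(rate, 3))
```

Invariant columns receive a score of zero, and columns that differ between near-identical sequences or carry non-conservative changes rise to the top. A real analysis would need a principled weighting and a treatment of saturation, both of which the abstract mentions but does not specify.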

The estimation of relative site variability among aligned homologous protein sequences / D. Horner, G. Pesole. - In: BIOINFORMATICS. - ISSN 1367-4803. - 19:5(2003), pp. 600-606.

Publication date: 2003
Journal (ANCE): BIOINFORMATICS
DOI: https://dx.doi.org/10.1093/bioinformatics/btg063
Type: Article (author)
Appears in types: 01 - Journal article

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/25124

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca
