Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

Rè, M.; Pesole, G.; Horner, D.S.

doi:10.1186/1471-2105-10-282

Background: The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. Recent findings suggest that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined. In this context, discriminating between conserved coding and non-coding sequences is a topic of considerable interest. It is also desirable to make this discrimination without similarity searches against protein databases. Such searches cannot identify novel conserved proteins that lack characterized homologs, and they may be misled by database sequences that are erroneously annotated as coding.

Results: Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are passed to a Support Vector Machine, which classifies the alignment as coding or non-coding and assigns an associated probability score.

Conclusion: We show that our approach is both sensitive and accurate relative to comparable methods. We also illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.
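
The abstract says only that the method computes "various statistics related to the evolutionary dynamics of two aligned sequences" and passes them to a Support Vector Machine. It does not list those statistics. The following is therefore a hypothetical Python sketch of the kind of alignment features such a classifier might use, and it should not be read as the authors' feature set.

The sketch rests on two assumptions. First, the first sequence is read in frame starting from its first base. Second, two signals are illustrative:

- per-codon-position substitution rates, since third codon positions in protein-coding alignments typically accumulate more substitutions than first and second positions;
- the fraction of indels whose length is a multiple of three, since such indels preserve the reading frame.

The names `alignment_features`, `gap_runs` and `AlignmentFeatures` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class AlignmentFeatures:
    # Mismatch rate at codon positions 1, 2 and 3 (ungapped columns only)
    subst_rate_by_codon_pos: tuple[float, float, float]
    # Fraction of gap runs whose length is a multiple of 3 (frame-preserving)
    frame_preserving_gap_fraction: float
    # Overall identity across ungapped columns
    identity: float


def gap_runs(seq: str) -> list[int]:
    """Return the lengths of consecutive '-' runs in an aligned sequence."""
    runs, current = [], 0
    for ch in seq:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def alignment_features(a: str, b: str) -> AlignmentFeatures:
    """Hypothetical features for a pairwise alignment (not the paper's own set).

    Assumption: sequence `a` is in frame starting at its first base, so codon
    positions are counted along the ungapped residues of `a`.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    mismatches = [0, 0, 0]
    compared = [0, 0, 0]
    pos_in_a = 0  # number of residues of `a` seen so far
    for x, y in zip(a.upper(), b.upper()):
        if x == "-":
            continue  # gap in `a`: the codon position does not advance
        frame = pos_in_a % 3  # 0, 1, 2 correspond to codon positions 1, 2, 3
        pos_in_a += 1
        if y == "-":
            continue  # gap in `b`: nothing to compare in this column
        compared[frame] += 1
        if x != y:
            mismatches[frame] += 1
    rates = tuple(m / c if c else 0.0 for m, c in zip(mismatches, compared))
    runs = gap_runs(a) + gap_runs(b)
    # With no indels at all, nothing breaks the frame, so report 1.0
    frame_ok = sum(1 for r in runs if r % 3 == 0) / len(runs) if runs else 1.0
    total = sum(compared)
    identity = 1 - sum(mismatches) / total if total else 0.0
    return AlignmentFeatures(rates, frame_ok, identity)
```

For example, `alignment_features("ATGGCTG-CA", "ATGGCTGACA")` reports a frame-preserving gap fraction of 0.0, because its single one-base indel would shift the reading frame. A vector of such features could then be passed to an SVM. In scikit-learn, `sklearn.svm.SVC(probability=True)` can return class probabilities calibrated with Platt scaling. The abstract does not say which calibration, if any, the authors used.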

Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics / M. Rè, G. Pesole, D.S. Horner. - In: BMC Bioinformatics. - ISSN 1471-2105. - 10:282 (2009). [10.1186/1471-2105-10-282]

Scientific-disciplinary sector of the article: BIO/11 - Molecular Biology
Publication date: 2009
Journal (ANCE): BMC Bioinformatics
DOI: https://dx.doi.org/10.1186/1471-2105-10-282
Type: Article (author)
Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
1471-2105-10-282.pdf (open access; type: Publisher's version/PDF)	551.86 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/141119

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca