Discrimination of Non-Protein-Coding Transcripts from Protein-Coding mRNA

Frith, M.C.; Bailey, T.L.; Kasukawa, T.; Mignone, F.; Kummerfeld, S.K.; Madera, M.; Sunkara, S.; Furuno, M.; Bult, C.J.; Quackenbush, J.; Kai, C.; Kawai, J.; Carninci, P.; Hayashizaki, Y.; Pesole, G.; Mattick, J.S.

doi:10.4161/rna.3.1.2789

Several recent studies indicate that mammals and other organisms produce large numbers of RNA transcripts that do not correspond to known genes. It has been suggested that these transcripts do not encode proteins, but may instead function as RNAs. However, discriminating coding from non-coding transcripts is not straightforward, and different laboratories have used different methods whose ability to perform this discrimination is unclear. In this study, the authors examine ten bioinformatic methods that assess protein-coding potential and compare their ability and congruency in discriminating non-coding from coding sequences. The methods rest on four underlying principles: open reading frame size, sequence similarity to known proteins or protein domains, statistical models of protein-coding sequence, and synonymous vs. non-synonymous substitution rates. Despite these different approaches, the methods show broad concordance, suggesting that coding and non-coding transcripts can, in general, be reliably discriminated, and that many of the recently discovered extra-genic transcripts are indeed non-coding. Comparison of the methods indicates reasons for unreliable predictions, and approaches to increase confidence further. Conversely and surprisingly, the analyses also provide evidence that as much as approximately 10% of entries in the manually curated protein database Swiss-Prot are erroneous translations of transcripts that are actually non-coding.
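The first of the four principles, open reading frame size, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which ORF definition or length cutoff was used, so the function name `longest_orf_codons`, the ATG-to-stop definition, the standard stop codons, and the forward-strand-only scan are all assumptions made for illustration.

```python
# Illustrative sketch only; the ORF definition here is an assumption,
# not the method used in the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG-to-stop ORF.

    Scans the three forward reading frames only. ORFs that run off the
    end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None  # close this ORF and look for the next one
    return best
```

A transcript whose longest ORF is short would be a candidate non-coding RNA under this principle alone. Any cutoff applied to the returned value would be a further assumption, and the abstract stresses that confidence comes from agreement among several independent methods rather than from ORF size by itself.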

Discrimination of Non-Protein-Coding Transcripts from Protein-Coding mRNA / Martin C. Frith, Timothy L. Bailey, Takeya Kasukawa, Flavio Mignone, Sarah K. Kummerfeld, Martin Madera, Sirisha Sunkara, Masaaki Furuno, Carol J. Bult, John Quackenbush, Chikatoshi Kai, Jun Kawai, Piero Carninci, Yoshihide Hayashizaki, Graziano Pesole, John S. Mattick. - In: RNA BIOLOGY. - ISSN 1547-6286. - 3:1(2006 Jan), pp. 40-48. [10.4161/rna.3.1.2789]



	Keywords
	
				Bioinformatics; mRNA; ncRNA; Proteome; Transcriptome
			
	Scientific-disciplinary sectors of the article (display only)
	
				Sector INF/01 - Computer Science
			
	Publication date
	
				Jan 2006
			
	Journal in ANCE
	
				RNA BIOLOGY
			
	DOI
	
				https://dx.doi.org/10.4161/rna.3.1.2789
			
	Type
	
				Article (author)
			
	Appears in types:
	
				01 - Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/2434/30205
