Detecting conserved coding genomic regions through signal processing of nucleotide substitution patterns / M. Ré, G. Pavesi. - In: ARTIFICIAL INTELLIGENCE IN MEDICINE. - ISSN 0933-3657. - 45:2-3(2009), pp. 117-123. [10.1016/j.artmed.2008.07.015]
Detecting conserved coding genomic regions through signal processing of nucleotide substitution patterns
M. Ré; G. Pavesi
2009
Abstract
Objective: Several complete genome sequences have become available to the research community in the last few years, but annotating their full inventory of protein-coding genes remains an elusive goal. Classical ab initio gene prediction methods have been of great help in this task, but they are notably weak at predicting genes with unusual structural features. Annotation based on similarity to known genes in other species, on the other hand, cannot detect genuinely novel genes, and it introduces a potential source of classification error when the reference sequences were themselves erroneously annotated as protein coding. Finally, several methods have been proposed for the functional classification and assessment of evolutionarily conserved regions but, to our knowledge, signal processing techniques have not yet been applied to this problem, despite their proven usefulness at the single-genome level.

Results: In this article we introduce the use of signal processing in comparative genomics, and we propose a simple test that evaluates the coding potential of a pairwise genomic sequence alignment from the pattern and periodicity with which substitutions and gaps appear in it. We assess the feasibility of our approach on an annotated set of human-mouse genomic alignments.

Conclusion: The results show that applying signal processing techniques to sequence alignments can be a useful tool for identifying evolutionarily conserved protein-coding regions.
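The abstract does not spell out the test itself, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes that substitutions in conserved coding regions tend to fall on every third alignment column (the codon wobble position), turns a pairwise alignment into a 0/1 mismatch signal, and measures how much of that signal's energy sits at frequency 1/3. The function names `mismatch_signal` and `period3_score`, the treatment of gap columns as plain mismatches, and the normalisation are all assumptions made for this example.

```python
import cmath


def mismatch_signal(seq_a, seq_b):
    """Return 1 for each aligned column that differs (substitution or gap), else 0.

    Assumption: a gap character paired with a base simply counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return [0 if a == b else 1 for a, b in zip(seq_a, seq_b)]


def period3_score(signal):
    """Spectral power at frequency 1/3, divided by the signal's total energy.

    The signal is mean-centred first. Uncorrelated mismatches give a score
    of order 1, while a strict period-3 pattern gives a score that grows
    with the alignment length.
    """
    n = len(signal)
    if n == 0:
        return 0.0
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    energy = sum(c * c for c in centred)
    if energy == 0:
        # No mismatches, or mismatches everywhere: no periodicity to measure.
        return 0.0
    # Fourier coefficient of the centred signal at frequency 1/3.
    coeff = sum(c * cmath.exp(-2j * cmath.pi * t / 3) for t, c in enumerate(centred))
    return abs(coeff) ** 2 / energy


if __name__ == "__main__":
    # Toy "coding-like" alignment: only third codon positions differ.
    coding_like = period3_score(mismatch_signal("GCA" * 40, "GCG" * 40))
    # Toy control: mismatches every fourth column, so no period-3 component.
    control = period3_score([1, 0, 0, 0] * 30)
    print(f"coding-like score: {coding_like:.3f}")
    print(f"control score:     {control:.3f}")
```

In this toy setting a perfect period-3 pattern of length n scores n/2, whereas the period-4 control scores essentially zero. A real test would also need a calibrated threshold and some handling of gap lengths, neither of which is attempted here.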