Non-coding DNA: A methodology for detection and analysis of pseudogenes / G. Trucco, V. Cerioli. - In: Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies. 3) / [edited by] R. Lorenz, A. Fred, H. Gamboa. - [s.l.] : SciTePress, 2021. - ISBN 978-989-758-490-9. - pp. 93-100 (conference: 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC) / 12th International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS), held online in 2021) [10.5220/0010190400002865].

Non-coding DNA: A methodology for detection and analysis of pseudogenes

G. Trucco (first author); 2021

Abstract

It is well known that elements lying outside the coding regions of the human genome are involved in many human diseases, and efforts to detect and characterize functional elements in non-coding regions are therefore increasing rapidly. Among the many types of non-coding DNA, pseudogenes are sequences that share some similarity with their parental genes but have lost the ability to code for proteins. In this paper, we propose a methodology for the detection and analysis of pseudogenes based on the transition probabilities of the nucleotides and their occurrences. The 1000-base-pair region downstream of each detected pseudogene is analyzed to find a polyA tail and a polyadenylation signal. We implemented a Hidden Markov Model with the Viterbi algorithm to decode the upstream regions of the detected pseudogenes and search for CpG islands. To identify motif signals in the selected pseudogenes, we implemented the Gibbs sampling algorithm and ran it on the flanking regions of some pseudogenes. The results show that the proposed methodology is an effective way to detect new potential loci, especially when the query coverage of the alignment is shorter than the coding strand. These loci can be classified as pseudogene fragments.
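
To make the CpG island step in the abstract more concrete, here is a minimal sketch of Viterbi decoding with a two-state Hidden Markov Model. It is an illustration under stated assumptions, not the authors' implementation: the state names ("B" for background, "I" for island), every probability value, and the function name `viterbi` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical two-state model: "I" = CpG island, "B" = background.
# All probabilities below are illustrative, not values from the paper.
STATES = ("B", "I")
START = {"B": 0.5, "I": 0.5}
TRANS = {
    "B": {"B": 0.9, "I": 0.1},
    "I": {"B": 0.1, "I": 0.9},
}
EMIT = {
    "B": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "I": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of B/I labels."""
    if not seq:
        return ""
    # score[s] is the best log probability of any path that ends in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointer[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))
```

Calling `viterbi` on an upstream sequence returns one label per base, so runs of "I" mark candidate CpG islands. Working in log space avoids numerical underflow on long regions. A real model would need emission and transition probabilities estimated from annotated data.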
Alignment; CpG Island; Gibbs Sampling; Pseudogenes; Viterbi Algorithm
Sector INF/01 - Computer Science
2021
Institute for Systems and Technologies of Information, Control and Communication (INSTICC)
https://www.scitepress.org/Papers/2021/101904/101904.pdf
Book Part (author)
Files in this item:
53-101904.pdf (open access)
Type: Publisher's version/PDF
Size: 308.57 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/928005