Motivation: Classification of transposable elements (TEs) is an essential step in decoding their roles in genome evolution. With a large number of genomes from non-model species becoming available, accurate and efficient TE classification has emerged as a new challenge in genomic sequence analysis. Results: We developed a novel tool, DeepTE, which classifies unknown TEs using convolutional neural networks (CNNs). DeepTE transformed sequences into input vectors based on k-mer counts. A tree-structured classification process was used, in which eight models were trained to classify TEs into superfamilies and orders. DeepTE also detected domains inside TEs to correct false classifications. An additional model was trained to distinguish between non-TEs and TEs in plants. Given unclassified TEs of different species, DeepTE can classify TEs into seven orders, which include 15, 24 and 16 superfamilies in plants, metazoans and fungi, respectively. In several benchmarking tests, DeepTE outperformed other existing tools for TE classification. In conclusion, DeepTE successfully leverages CNNs for TE classification, and can be used to precisely classify TEs in newly sequenced eukaryotic genomes. Availability and implementation: DeepTE is accessible at https://github.com/LiLabAtVT/DeepTE.
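The abstract states that sequences are turned into input vectors based on k-mer counts. The following is a minimal sketch of that general idea, not DeepTE's own code: the function name `kmer_count_vector`, the default value of `k`, the lexicographic ordering of k-mers, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_count_vector(sequence, k=3):
    """Return the count of every possible DNA k-mer, in lexicographic order.

    Hypothetical helper for illustration; the value of k and the ordering
    are assumptions, not taken from the DeepTE source.
    """
    # Enumerate all 4**k possible k-mers and give each a fixed vector position.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    seq = sequence.upper()

    # Slide a window of width k across the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Windows containing ambiguous bases (for example N) are skipped.
        if window in index:
            counts[index[window]] += 1
    return counts


# Example: a short sequence with one ambiguous base, using 2-mers.
vector = kmer_count_vector("ACGTACGN", k=2)
```

A fixed-length vector of this kind has 4^k entries regardless of sequence length, which is what allows sequences of different lengths to be passed to the same network input layer.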

DeepTE : A computational method for de novo classification of transposons with convolutional neural network / H. Yan, A. Bombarely Gomez, S. Li. - In: BIOINFORMATICS. - ISSN 1367-4803. - 36:15(2020 Aug 01), pp. 4269-4275. [10.1093/bioinformatics/btaa519]

DeepTE : A computational method for de novo classification of transposons with convolutional neural network

A. Bombarely Gomez (penultimate author); 2020

Sector BIO/18 - Genetics
1 Aug 2020
Article (author)
Files in this item:
2020.01.27.921874v1.full.pdf (open access)
Description: Preprint
Type: Pre-print (manuscript submitted to the publisher)
Size: 512.13 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/798658
Citations
  • PMC 31
  • Scopus 42
  • Web of Science (ISI) 41