Topological metrics in Blast data mining: Plasmid and nitrogen-fixing proteins case studies / P. Lió, M. Brilli, R. Fani (COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE). - In: Bioinformatics Research and Development / [edited by] M. Elloumi, J. Küng, M. Linial, R.F. Murphy, K. Schneider, C. Toma. - [s.l] : Springer-Verlag, 2008. - ISBN 978-3-540-70598-7. - pp. 207-220 (Paper presented at the 2nd BIRD 2008 conference, held in Vienna in 2008) [10.1007/978-3-540-70600-7_16].

Topological metrics in Blast data mining: Plasmid and nitrogen-fixing proteins case studies

M. Brilli
2008

Abstract

In recent years, a number of metrics have been introduced to characterize the topology of complex networks. We use these methodologies to analyze networks obtained through BLAST data mining. The algorithm we present consists of the following steps: (1) encode the results of BLAST searches as a distance matrix of e-values; (2) perform entropy-controlled clustering analysis to identify the communities; (3) statistically analyze the resulting network; (4) use gene ontology and data mining in sequence databases to infer the function of the identified clusters. We report on the analysis of two data sets: the first consists of over 3300 plasmid-encoded proteins, and the second comprises over 4200 sequences related to nitrogen fixation proteins. In the first case we observed strong selective pressures for horizontal transfer and maintenance of genes encoding proteins for resistance to antibiotics, plasmid stability and conjugal transfer. Based on our results, nitrogen fixation proteins can be divided into three groups: proteins with no paralogs in any of the genomes considered, proteins with paralogs belonging to different metabolic processes (O-paralogs), and proteins with paralogs in both the same and other metabolic processes (I/O-paralogs).
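As a rough illustration of step 1 and of a very simple topological metric, the sketch below builds a symmetric distance matrix from BLAST hits and counts node degrees. The abstract does not say how e-values are converted to distances, so the transformation used here (a capped, rescaled -log10 of the e-value) is purely an assumption, and the names `evalue_distance_matrix`, `node_degrees`, `max_score` and `threshold` are hypothetical. It does not attempt the entropy-controlled clustering of step 2.

```python
import math


def evalue_distance_matrix(hits, max_score=200.0):
    """Build a symmetric distance matrix from (query, subject, e-value) hits.

    Assumption (not from the paper): distance = 1 - min(-log10(e), max_score) / max_score,
    so an e-value of 0 gives distance 0 and pairs without any hit keep distance 1.
    """
    ids = sorted({name for q, s, _ in hits for name in (q, s)})
    index = {name: i for i, name in enumerate(ids)}
    n = len(ids)
    # Start from "no similarity" everywhere except the diagonal.
    dist = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for q, s, e in hits:
        if q == s:
            continue  # self-hits carry no network information
        if e <= 0.0:
            score = max_score
        else:
            score = min(max(-math.log10(e), 0.0), max_score)
        d = 1.0 - score / max_score
        i, j = index[q], index[s]
        # BLAST hits are not symmetric: keep the smallest distance seen in either direction.
        dist[i][j] = dist[j][i] = min(dist[i][j], d)
    return ids, dist


def node_degrees(dist, threshold):
    """Degree of each node in the graph that links pairs closer than `threshold`."""
    n = len(dist)
    return [
        sum(1 for j in range(n) if j != i and dist[i][j] < threshold)
        for i in range(n)
    ]
```

Thresholding the matrix yields an undirected graph on which degree distributions or other network statistics could then be computed; the choice of threshold is again an assumption rather than something stated in the abstract.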
Academic discipline BIO/19 - General Microbiology
Academic discipline ING-INF/06 - Electronic and Information Bioengineering
Book Part (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1048635