We consider a typical situation in which a researcher has identified one or more clusters of genes that are co-expressed or co-regulated under some experimental condition. The next logical step is to identify the factors regulating the expression of those genes, but the reliable prediction of transcription factor (TF) binding sites remains far from solved. Here we present an algorithm that, given a set of promoters from co-regulated or co-expressed genes, identifies TFs whose binding sites are over-represented in those promoters. It uses profiles (frequency matrices) that define the DNA-binding specificity of known TFs, together with matching statistics computed at the whole-genome level, which removes the need for comparisons with homologous sequences. Tests on experimentally validated sequence sets from different organisms (from yeast to human) gave very promising results, including cases where one or more TFs regulate only a subset of the input genes. Moreover, the algorithm is very fast and easy to use (only the gene IDs are needed as input) and, perhaps most importantly, it appears to be less prone to false positives than most currently available methods.
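The general idea of profile-based over-representation can be illustrated with a minimal sketch. The abstract does not specify the scoring or the statistics, so everything below is an assumption made for illustration and not the authors' algorithm: each promoter receives the best normalized match to a frequency matrix, and the mean over the input set is compared with a genome-wide mean and standard deviation through a z-score. The function names (`best_match_score`, `over_representation_z`) and the genome-wide values passed in are hypothetical.

```python
import math


def best_match_score(seq, matrix):
    """Return the highest normalized profile score over all windows of seq.

    matrix is a list of dicts, one per motif position, mapping a base
    to its frequency. The score of a window is the sum of the frequencies
    of its bases divided by the maximum attainable sum (assumed scoring).
    """
    width = len(matrix)
    max_total = sum(max(col.values()) for col in matrix)
    best = 0.0
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        total = sum(col.get(base, 0.0) for col, base in zip(matrix, window))
        best = max(best, total / max_total)
    return best


def over_representation_z(promoters, matrix, genome_mean, genome_std):
    """Compare the mean best-match score of the input promoters with
    genome-wide statistics (assumed to be precomputed) using a z-score."""
    scores = [best_match_score(p, matrix) for p in promoters]
    sample_mean = sum(scores) / len(scores)
    standard_error = genome_std / math.sqrt(len(scores))
    return (sample_mean - genome_mean) / standard_error
```

Under these assumptions, a large positive z-score for a given matrix would suggest that the corresponding TF has better matches in the input promoters than expected from promoters genome-wide. Because the background statistics come from the genome itself, no homologous sequences are required.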

Finding transcription factors with over-represented binding sites in sequences from co-regulated or co-expressed genes / F. Zambelli, W. Breviario, G. Pesole, G. Pavesi. (Contribution presented at the 9th Congresso annuale FISV, held in Riva del Garda in 2007.)

Finding transcription factors with over-represented binding sites in sequences from co-regulated or co-expressed genes

F. Zambelli (first author);
G. Pesole (penultimate author);
G. Pavesi (last author)
2007

Sector INF/01 - Computer Science
Conference Object
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/62899