RegulonDB 9.2: Downloadable Computational Prediction Datasets


We kindly ask you to cite RegulonDB properly.
By downloading any of the files below, the user agrees to be bound by the terms and conditions of the electronic End User License Agreement.


Description Method File
Promoter predictions

“We observed that real promoters occur mostly within regions with high densities of overlapping putative promoters. We evaluated several strategies to identify promoters. The best one uses an intrinsic score of the -10 and -35 hexamers that form the promoter as well as an extrinsic score that uses the distribution of promoters from the start of the gene. This high signal density is found mainly within regions upstream of genes, contrasting with coding regions and regions located between convergently transcribed genes.” A.M. Huerta, J. Collado-Vides, J Mol Biol. 333:261-78 (2003).

Sigma 24 Download
Sigma 28 Download
Sigma 32 Download
Sigma 38 Download
Sigma 54 Download
Sigma 70 Download
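The scoring idea in the quotation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: the sigma 70-like consensus hexamers (TTGACA, TATAAT), the 15-19 nt spacer range, the match-count scoring, the 20 bp distance bins and the density window are all illustrative choices, and the distance prior is something the caller would have to estimate from known promoters.

```python
from typing import Dict, List, Tuple

# Illustrative sigma 70-like consensus hexamers (an assumption, not the published model).
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def hexamer_score(word: str, consensus: str) -> int:
    """Intrinsic score of one hexamer: positions matching the consensus."""
    return sum(a == b for a, b in zip(word, consensus))


def putative_promoters(region: str, min_score: int = 10) -> List[Tuple[int, int]]:
    """(start of -10 hexamer, intrinsic score) for every -35/-10 pair with a
    15-19 nt spacer whose combined score reaches min_score."""
    hits = []
    for i in range(len(region) - 5):
        s35 = hexamer_score(region[i:i + 6], MINUS35)
        for spacer in range(15, 20):
            j = i + 6 + spacer
            if j + 6 > len(region):
                break
            score = s35 + hexamer_score(region[j:j + 6], MINUS10)
            if score >= min_score:
                hits.append((j, score))
    return hits


def extrinsic_score(distance: int, prior: Dict[int, float], bin_size: int = 20) -> float:
    """Extrinsic score: log-probability of the promoter-to-gene distance under a
    caller-supplied binned distribution; unseen bins get a floor value."""
    return prior.get(distance // bin_size, -10.0)


def density(hits: List[Tuple[int, int]], pos: int, window: int = 50) -> int:
    """Number of putative promoters whose -10 hexamer starts within +/- window of pos."""
    return sum(abs(p - pos) <= window for p, _ in hits)
```

For example, `putative_promoters("GG" + "TTGACA" + "A" * 17 + "TATAAT" + "C" * 30)` returns `[(25, 12)]`, a perfect -35/-10 pair with a 17 nt spacer. A fuller predictor would add the extrinsic score to the intrinsic one and favor candidates with a high `density`.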
Operon predictions

Operon predictions based on (intergenic) distances Download

Operon predictions based on (intergenic) distances and Riley's functional classification.
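A distance-based operon call can be sketched as a log-likelihood ratio between the intergenic-distance distribution of gene pairs inside known operons and that of pairs at transcription-unit boundaries. The sketch below assumes the caller supplies both histograms (counts per 10 bp bin) from a training set; the gene tuple layout, bin size, pseudocount and zero threshold are hypothetical, and Riley's functional classification is not modeled.

```python
import math
from typing import Dict, Iterator, List, Tuple

# (name, start, end, strand) with 1-based inclusive coordinates (hypothetical layout).
Gene = Tuple[str, int, int, str]


def intergenic_distance(a: Gene, b: Gene) -> int:
    """Base pairs between consecutive genes a and b (negative if they overlap)."""
    return b[1] - a[2] - 1


def log_likelihood(distance: int, same_op: Dict[int, int], boundary: Dict[int, int],
                   bin_size: int = 10, pseudo: float = 1.0) -> float:
    """log P(d | same operon) / P(d | boundary) from binned training counts."""
    b = distance // bin_size
    n_bins = len(set(same_op) | set(boundary)) + 1  # one extra pseudocount bin for unseen distances
    p_same = (same_op.get(b, 0) + pseudo) / (sum(same_op.values()) + pseudo * n_bins)
    p_bound = (boundary.get(b, 0) + pseudo) / (sum(boundary.values()) + pseudo * n_bins)
    return math.log(p_same / p_bound)


def predict_pairs(genes: List[Gene], same_op: Dict[int, int], boundary: Dict[int, int],
                  threshold: float = 0.0) -> Iterator[Tuple[str, str, int, float, bool]]:
    """Yield (gene_a, gene_b, distance, score, same_operon) for adjacent same-strand pairs."""
    ordered = sorted(genes, key=lambda g: g[1])
    for a, b in zip(ordered, ordered[1:]):
        if a[3] != b[3]:
            continue  # genes on opposite strands cannot share an operon
        d = intergenic_distance(a, b)
        score = log_likelihood(d, same_op, boundary)
        yield a[0], b[0], d, score, score > threshold
```

With a training histogram that places within-operon pairs at 10-19 bp and boundary pairs at 290-299 bp, a 10 bp gap scores positive (same operon) and a 299 bp gap scores negative.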

We have previously demonstrated that genes within experimentally characterized operons of Escherichia coli are conserved together in other genomes more frequently than genes located at the borders of transcription units. We also show the relationship between our analyses of conservation and the inference of functional relationships from a genomic context.
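The conservation signal described above can be measured, in a simplified form, as the fraction of genomes containing orthologs of both genes in which those orthologs are also neighbors. This sketch assumes each genome is given as an ordered list of ortholog-group IDs (a hypothetical input format) and ignores strand, chromosome circularity and paralogs.

```python
from typing import Dict, List, Tuple


def conserved_adjacency(pair: Tuple[str, str], genomes: Dict[str, List[str]]) -> float:
    """Fraction of genomes containing both genes in which they are also neighbors.

    Each genome is an ordered list of ortholog-group IDs (hypothetical format);
    strand and chromosome circularity are ignored.
    """
    a, b = pair
    both = adjacent = 0
    for order in genomes.values():
        if a in order and b in order:
            both += 1
            if abs(order.index(a) - order.index(b)) == 1:
                adjacent += 1
    return adjacent / both if both else 0.0
```

For genomes `{"g1": ["x", "a", "b", "y"], "g2": ["a", "y", "b"], "g3": ["b", "a"]}`, the pair `("a", "b")` is adjacent in two of the three genomes that contain both genes, giving 2/3.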

TF binding site predictions

We have taken advantage of the phylogenetic proximity of Escherichia coli and 16 other organisms of this subdivision, and of the intensive search of the sequence space provided by a pattern-matching strategy. Using this approach, we complement the predictions of regulatory sites made with the statistical models currently stored in Tractor_DB, and increase the number of transcriptional regulators with predicted binding sites to 86.

The original prediction approach, based on representing binding sites through statistical models, was complemented by a new approach that uses known E. coli regulatory sites as the basis for a pattern-matching search of regulatory sites. Using both approaches together resulted in a more intensive exploration of the sequence space of each regulator's binding sites.
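The pattern-matching side of this strategy can be illustrated with a search for windows that match a known site on either strand with at most a few mismatches. This is only a sketch: the mismatch tolerance and the use of Hamming distance are assumptions, and the actual search described here may score matches differently.

```python
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_known_sites(seq: str, known_sites: Iterable[str],
                      max_mismatches: int = 1) -> List[Tuple[int, str, str, int]]:
    """(position, strand, known site, mismatches) for every window of seq that
    matches a known site on either strand with at most max_mismatches."""
    hits = []
    for site in sorted(set(known_sites)):
        width = len(site)
        probes = (("+", site), ("-", revcomp(site)))
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            for strand, probe in probes:
                mm = mismatches(window, probe)
                if mm <= max_mismatches:
                    hits.append((i, strand, site, mm))
    return sorted(hits)
```

For instance, with a single known site `GATTAC` and no mismatches allowed, the sequence `CCGATTACCCGTAATCCC` yields one forward-strand hit at position 2 and one reverse-strand hit at position 10.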



Computationally predicted transcription factor binding sites (TFBSs) obtained with the evaluated weight matrices. We scanned the upstream region of every gene, from +50 to -400 or from +50 to the closest upstream ORF, whichever comes first (see the methodology).
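The scan described above can be sketched as follows, assuming 0-based genome coordinates, a gene on the forward strand, and a log2-odds matrix built from aligned sites against a uniform background. The matrix construction and threshold are stand-ins for the evaluated weight matrices, which are not reproduced here; reverse-strand genes and the reverse-complement scan are omitted for brevity.

```python
import math
from typing import Dict, List, Optional, Tuple


def upstream_window(genome: str, start: int,
                    prev_orf_end: Optional[int] = None) -> Tuple[int, str]:
    """(offset, sequence) spanning -400..+50 around a forward-strand gene start
    (0-based), truncated at the closest upstream ORF if that is reached first."""
    lo = max(0, start - 400)
    if prev_orf_end is not None:
        lo = max(lo, prev_orf_end)
    hi = min(len(genome), start + 50)
    return lo, genome[lo:hi]


def log_odds_matrix(sites: List[str], pseudo: float = 0.5) -> List[Dict[str, float]]:
    """Log2-odds weight matrix from aligned sites against a uniform background."""
    n = len(sites)
    return [{b: math.log2((col.count(b) + pseudo) / (n + 4 * pseudo) / 0.25) for b in "ACGT"}
            for col in zip(*sites)]


def scan(seq: str, matrix: List[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """(position, score) for every window scoring at least threshold (forward strand only)."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        # Non-ACGT characters score -inf, so windows containing them never pass.
        score = sum(matrix[k].get(seq[i + k], float("-inf")) for k in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

Adding the returned offset to each hit position converts window coordinates back to genome coordinates.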


Transcription Factor Predictions

“Regulatory proteins in Escherichia coli with a helix-turn-helix (HTH) DNA binding motif show a position-function correlation such that repressors have this motif predominantly at the N terminus, whereas activators have the motif at the C-terminus extreme. Evidence is presented supporting a common history at the origin of this correlation. These results suggest that if shuffling of motifs occurred in Bacteria, it occurred only early in the history of these proteins, as opposed to what is observed in eukaryotic regulators.” Pérez-Rueda E, Collado-Vides J. J Mol Evol. 2001 Sep;53(3):172-9.
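The position-function correlation can be expressed as a simple rule of thumb. In this sketch the HTH coordinates are assumed to come from a separate domain search, and the split at half the protein length is an illustrative cutoff, not the criterion used in the cited study; the correlation is a tendency, so the returned role is only a hint.

```python
def hth_location(protein_length: int, hth_start: int, hth_end: int) -> str:
    """'N-terminal' if the HTH midpoint lies in the first half of the protein
    (an illustrative cutoff), otherwise 'C-terminal'."""
    midpoint = (hth_start + hth_end) / 2
    return "N-terminal" if midpoint < protein_length / 2 else "C-terminal"


def likely_role(location: str) -> str:
    """Role suggested by the reported trend; a tendency, not a rule."""
    return {"N-terminal": "repressor", "C-terminal": "activator"}[location]
```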

Riboswitches Prediction

For each group of orthologous proteins, the upstream regions of the first gene of each operon are taken and searched for motifs using MEME (Figure 1a). Each motif is then refined by several cycles of locating it among all upstream regions from all bacteria using MAST, and redefining a more specific motif with MEME (Figure 1b). Sequences with motifs can then be analyzed to see if they present evidence of conserved secondary structure (Figure 1c). Predicted motifs are also compared against the Rfam database to locate known structured elements and against RegulonDB to find known transcription factor binding sites.
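The refinement loop can be illustrated with a toy stand-in for the MEME/MAST cycles: build a profile from the current sites, take the best-scoring window in every upstream region, keep those above a threshold, and repeat until the site set stops changing. This does not call or reimplement MEME or MAST; the log2-odds profile, uniform background, one-hit-per-sequence rule and threshold are simplifying assumptions, and the secondary-structure and Rfam/RegulonDB comparisons are not included.

```python
import math
from typing import Dict, List, Tuple

Profile = List[Dict[str, float]]


def build_profile(sites: List[str], pseudo: float = 0.5) -> Profile:
    """Log2-odds profile from equal-length sites against a uniform background."""
    n = len(sites)
    return [{b: math.log2((col.count(b) + pseudo) / (n + 4 * pseudo) / 0.25) for b in "ACGT"}
            for col in zip(*sites)]


def best_hit(seq: str, profile: Profile) -> Tuple[float, int]:
    """(score, position) of the highest-scoring window in seq."""
    width = len(profile)
    return max((sum(profile[k].get(seq[i + k], -99.0) for k in range(width)), i)
               for i in range(len(seq) - width + 1))


def refine(seed_sites: List[str], upstream: Dict[str, str],
           threshold: float, max_iter: int = 10) -> List[str]:
    """Alternate profile building and search over all upstream regions until the sites converge."""
    sites = sorted(seed_sites)
    for _ in range(max_iter):
        profile = build_profile(sites)
        width = len(profile)
        found = []
        for seq in upstream.values():
            if len(seq) < width:
                continue
            score, pos = best_hit(seq, profile)
            if score >= threshold:
                found.append(seq[pos:pos + width])
        found.sort()
        if not found or found == sites:
            break  # converged, or nothing passed the threshold
        sites = found
    return sites
```

Starting from the single seed `GGTACC`, the loop picks up the one-mismatch variant `GGTACG` from a second upstream region and then stops once the site set is stable.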


Attenuators Prediction

For each predicted operon, the upstream region of the first gene is taken (Figure 1a). For every run of Us (uridines) in this region (Figure 1b), a stable structure is searched for in the adjacent region (Figure 1c). If a terminator is found, an antiterminator is searched for, since it must overlap the terminator (Figure 1d). An anti-antiterminator can be located analogously by finding a structure that overlaps the antiterminator (Figure 1e). For the particular case of translational attenuators, a terminator is searched for, since it overlaps the Shine-Dalgarno site.
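The first two steps (locating U-runs and a stable structure next to them) can be sketched as below. This toy version counts contiguous Watson-Crick and G-U base pairs instead of computing folding free energies, which a real search would do with an RNA folding tool; the minimum run length, stem length, gap and span are hypothetical parameters, the input is assumed to be RNA (U rather than T), and the antiterminator and anti-antiterminator steps are not implemented.

```python
import re
from typing import List, Optional, Tuple

# Watson-Crick and G-U wobble pairs allowed in the stem.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def u_runs(rna: str, min_len: int = 4) -> List[Tuple[int, int]]:
    """(start, end) of every run of at least min_len uridines."""
    return [(m.start(), m.end()) for m in re.finditer(f"U{{{min_len},}}", rna)]


def stem_length(rna: str, i: int, j: int) -> int:
    """Contiguous base pairs closing inward from (i, j), keeping a loop of >= 3 nt."""
    n = 0
    while j - i >= 4 and (rna[i], rna[j]) in PAIRS:
        n += 1
        i += 1
        j -= 1
    return n


def find_terminators(rna: str, min_stem: int = 5, max_gap: int = 3,
                     max_span: int = 40) -> List[Tuple[int, int, int, int]]:
    """For each U-run, the longest hairpin (i, j, stem, run_start) whose 3' end
    lies within max_gap nt upstream of the run."""
    hits = []
    for run_start, _ in u_runs(rna):
        best: Optional[Tuple[int, int, int, int]] = None
        for j in range(max(0, run_start - 1 - max_gap), run_start):
            for i in range(max(0, j - max_span), j):
                s = stem_length(rna, i, j)
                if s >= min_stem and (best is None or s > best[2]):
                    best = (i, j, s, run_start)
        if best is not None:
            hits.append(best)
    return hits
```

On `"AAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUU" + "AAAA"`, the sketch reports a 6 bp stem closing at positions 4 and 19, immediately upstream of the U-run that starts at position 20.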
