TF Matrix Clustering

Clustering analysis on PWMs version 3.0 with RegulonDB data release version 8.8

Transcription factors belong to evolutionarily related families, in which members of the same family tend to share significant similarity in the protein domains that bind DNA; in bacteria these are most frequently helix-turn-helix motifs.

The 93 PWMs available in RegulonDB, built as mentioned before, were analyzed with the program matrix-clustering, a tool that groups similar PWMs. Because proteins of the same family tend to have highly similar motifs, this program can be used to identify Transcription Factor Binding Motifs (TFBMs) belonging to phylogenetically related TFs or to DNA-binding proteins that recognize similar DNA sequences.

The clustering is displayed as a collection of hierarchical trees (a forest), where each tree represents one cluster together with the global alignment of its PWMs. Additionally, a heatmap of the all-versus-all PWM comparison is shown, in which the clusters can be observed.
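
To illustrate how a forest of clusters can arise from pairwise similarities, the following is a minimal sketch of agglomerative clustering with average linkage. It is not the matrix-clustering implementation: here merging simply stops when the best average similarity drops below a single threshold, whereas the real tool builds the trees and partitions them with several thresholds. The motif names (M1 to M4) and similarity scores are invented for illustration only.

```python
from itertools import combinations


def average_linkage_clusters(names, similarity, threshold):
    """Greedy agglomerative clustering with average linkage.

    names: list of motif identifiers.
    similarity: dict mapping frozenset({a, b}) -> similarity score.
    threshold: merging stops once the best average similarity between
        two clusters falls below this value.
    """
    clusters = [[name] for name in names]

    def avg_sim(c1, c2):
        # Average linkage: mean similarity over all cross-cluster pairs.
        scores = [similarity[frozenset((a, b))] for a in c1 for b in c2]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        (i, j), best = max(
            (((i, j), avg_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data (not taken from RegulonDB).
toy_names = ["M1", "M2", "M3", "M4"]
toy_similarity = {
    frozenset(("M1", "M2")): 0.9,
    frozenset(("M3", "M4")): 0.8,
    frozenset(("M1", "M3")): 0.1,
    frozenset(("M1", "M4")): 0.2,
    frozenset(("M2", "M3")): 0.15,
    frozenset(("M2", "M4")): 0.1,
}
toy_clusters = average_linkage_clusters(toy_names, toy_similarity, threshold=0.4)
print(toy_clusters)
```

With these toy scores, M1 and M2 end up in one tree and M3 and M4 in another, because the average similarity between the two groups stays below the threshold.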

Results Summary

Number of input motifs: 93
Number of clusters found: 47
Linkage method: average
Similarity metric: Ncor
Thresholds to partition the tree: Ncor = 0.4, cor = 0.5, w = 5
Download link: Clusters in TRANSFAC format
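
The three thresholds can be read together as follows. This sketch assumes that Ncor is the Pearson correlation (cor) scaled by the fraction of the alignment that overlaps, Ncor = cor × w / W with W = w1 + w2 - w, where w is the number of aligned columns and w1, w2 are the widths of the two motifs. It also assumes that all three values act as lower thresholds that must hold simultaneously. Both points are assumptions about the tool, and the function names and example numbers are hypothetical.

```python
def normalized_correlation(cor, w, w1, w2):
    """Assumed definition: Ncor = cor * (w / W), with W = w1 + w2 - w.

    cor: Pearson correlation over the aligned columns.
    w: number of aligned (overlapping) columns.
    w1, w2: widths of the two motifs being compared.
    """
    total_length = w1 + w2 - w  # overlap plus non-overlapping columns
    return cor * w / total_length


def passes_thresholds(cor, ncor, w, min_cor=0.5, min_ncor=0.4, min_w=5):
    """Return True if all three lower thresholds are satisfied (assumption)."""
    return cor >= min_cor and ncor >= min_ncor and w >= min_w


# Invented example: two motifs of width 12 and 14 aligned over 10 columns.
ncor_long = normalized_correlation(cor=0.8, w=10, w1=12, w2=14)
print(ncor_long, passes_thresholds(0.8, ncor_long, 10))

# Invented example: high correlation, but over only 4 aligned columns.
ncor_short = normalized_correlation(cor=0.9, w=4, w1=10, w2=10)
print(ncor_short, passes_thresholds(0.9, ncor_short, 4))
```

The second comparison shows why a normalized metric is useful: a high correlation over a short overlap is penalized and does not satisfy the thresholds.
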
Additional Files

Exported files

File: Description
Pairwise comparison: A table with the pairwise comparison between all the input motifs using different metrics. This is the compare-matrices result.
Matrix description: A table with information about each input motif.
Root motifs: A file with the root matrix of each cluster.
Clusters: A tab-separated file containing the clusters and their corresponding motifs.
Internal nodes attributes: A table showing the grouping steps of the hierarchical tree.
Tree of consensus: A PDF file showing the alignment of the consensus sequences. Each cluster is shown in a different color.
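
The Clusters file can be loaded with a few lines of code. The sketch below assumes a layout with the cluster identifier in the first column and a comma-separated list of motif identifiers in the second, with comment lines starting with "#" or ";". This layout, the function name read_clusters, and the example content are assumptions for illustration and should be checked against the downloaded file.

```python
import csv
import io


def read_clusters(handle):
    """Parse a tab-separated clusters table into {cluster_id: [motif_ids]}.

    Assumed layout: column 1 = cluster id, column 2 = comma-separated motifs.
    """
    clusters = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and rows without a motif column.
        if len(row) < 2 or row[0].startswith(("#", ";")):
            continue
        cluster_id, members = row[0], row[1]
        clusters[cluster_id] = [m.strip() for m in members.split(",") if m.strip()]
    return clusters


# Invented example content standing in for the real file.
example = io.StringIO("#cluster\tmotifs\ncluster_1\tM1,M2\ncluster_2\tM3\n")
parsed = read_clusters(example)
print(parsed)
```

For a real file, the same function can be called with an open file handle, for example `with open(path, newline="") as handle: read_clusters(handle)`.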