RegulonDB

Evidence Classification in RegulonDB


Scientific knowledge advances incrementally. At any point, we base broad conclusions on assertions of varying degrees of confidence. RegulonDB classifies the evidence supporting a particular assertion essentially according to the method used to generate it. We do so to make explicit the complex mixture of more or less well-supported specific claims that underlie broader conclusions (Weiss et al., 2013).

We classify the evidence supporting knowledge as 'Weak', 'Strong', or 'Confirmed'.

Weak evidence: Single evidence with more ambiguous conclusions, where alternative explanations, indirect effects, or potential false positives are prevalent, as well as computational predictions; for instance gel mobility shift assays with cell extracts or gene expression analysis.
Strong evidence: Single evidence with direct physical interaction or solid genetic evidence with a low probability for alternative explanations; for instance, footprinting with purified protein or site mutation.
Confirmed: Assigned when an object is supported by at least two independent types of strong evidence that mutually exclude each other's false positives. This approach is based essentially on the methods used to validate results and exclude alternative explanations in scientific research.

Confidence is assigned in two stages:

In stage I, we classify each single piece of evidence as weak or strong.
In stage II, we validate data by integrating multiple types of evidence in a process termed ‘Analytical Cross-Validation’. The cross-validation of weak evidence (including high-throughput (HT) data) to strong evidence, and of strong evidence to ‘confirmed’, is described in Stage II. Analytical Cross-Validation.

Stage I. Classification of Individual Evidence Types
Description   
Single evidence is classified into weak or strong evidence (see above), depending on the confidence level of the associated methodologies.
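
As an illustration, Stage I can be read as a lookup from object type and evidence code to a confidence level. The sketch below copies a handful of entries from the lists in this section; the names `STAGE_I` and `classify_single_evidence`, and the object-type labels, are hypothetical and are not part of RegulonDB.

```python
# Excerpt only: (object type, evidence code) -> Stage I classification.
# The entries are copied from the lists in this document; the dictionary
# itself and the object-type labels are illustrative assumptions.
STAGE_I = {
    ("promoter", "FP"): "strong",
    ("promoter", "TIM"): "strong",
    ("promoter", "RS-EPT-CBR"): "strong",
    ("promoter", "RS"): "weak",
    ("promoter", "CHIP"): "weak",
    ("regulatory_interaction", "CHIP-SV"): "strong",
    ("regulatory_interaction", "CHIP"): "weak",
    ("regulatory_interaction", "GEA"): "weak",
    ("transcription_unit", "PM"): "strong",
    ("transcription_unit", "MSI"): "weak",
}


def classify_single_evidence(object_type: str, code: str) -> str:
    """Return 'weak' or 'strong' for one piece of evidence (Stage I only)."""
    try:
        return STAGE_I[(object_type, code)]
    except KeyError:
        raise ValueError(
            f"no Stage I entry for {code!r} on {object_type!r}"
        ) from None
```

Keying on the object type as well as the code reflects that the lists are given per object type: for regulatory interactions, ChIP analysis alone (CHIP) is listed as weak, whereas ChIP analysis with statistical validation (CHIP-SV) is listed as strong.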

1. Promoters and transcription start sites (TSSs)   
Promoters are defined in bacteria by the DNA region specifically bound by RNA polymerase to initiate transcription.
A TSS is the precise first nucleotide that is transcribed. Different methods identify either promoters or TSSs, so the two are classified jointly here.
Evidence Code Evidence Category
Strong Evidence
1.1 Assay of protein purified to homogeneity
Example: In vitro transcription assay.
APPH Classical experiment
1.2 Binding of purified proteins  
BPP Classical experiment
1.3 Footprinting  
FP Classical experiment
1.4 Inferred from direct assay  
IDA Classical experiment
1.5 Site mutation  
SM Classical experiment
1.6 In vitro transcription assay  
TA Classical experiment
1.7 Transcription initiation mapping
Example: Primer extension, S1 mapping, 5'RACE
TIM Classical experiment
1.8 RNA-seq using two enrichment strategies for primary transcripts and consistent biological replicates
Example: use of terminator exonuclease and differential ligation of adaptors.
RS-EPT-CBR High-throughput protocol
1.9 RNA-seq using two enrichment strategies for primary transcripts, consistent biological replicates, and evidence for a non-coding gene.  
RS-EPT-ENCG-CBR High-throughput protocol
1.10 Cross-validation (GEA/ROMA)
CV(GEA/ROMA) Independent cross-validation
1.11 Cross-validation (GEA/gSELEX)
CV(GEA/gSELEX) Independent cross-validation
1.12 High-throughput transcription initiation mapping  
HTTIM nd
Weak Evidence
1.13 Author statement  
AS Author statement
1.14 Non-traceable author statement
Example: An article that refers to a promoter by citing a paper, which cannot be traced.
NTAS Author statement
1.15 Traceable author statement
Example: An article that refers to a promoter by citing a paper that is traceable but to which the curation team has no access.
TAS Author statement
1.16 Traceable author statement to experimental support
Example: A review article refers to a promoter citing a reference as the source for the experimental evidence. This code can be used when the original article is difficult to locate.
TASES Author statement
1.17 Assay of partially-purified protein  
APPP Classical experiment
1.18 Binding of cellular extracts  
BCE Classical experiment
1.19 Gene expression analysis  
GEA Classical experiment
1.20 Inferred from experiment  
IE Classical experiment
1.21 Inferred from expression pattern  
IEP Classical experiment
1.22 Inferred from genetic interaction  
IGI Classical experiment
1.23 Inferred from mutant phenotype
Example: Deletion of promoter regions.
IMP Classical experiment
1.24 Inferred from physical interaction  
IPI Classical experiment
1.25 Automated inference of promoter position
Example: Computational prediction.
AIPP Computational prediction or inference
1.26 Inferred by computational analysis  
ICA Computational prediction or inference
1.27 Inferred computationally without human oversight  
ICWHO Computational prediction or inference
1.28 ChIP analysis  
CHIP High-throughput protocol
1.29 ROMA  
ROMA High-throughput protocol
1.30 RNA-seq  
RS High-throughput protocol
1.31 Genomic SELEX  
gSELEX High-throughput protocol
1.32 Author hypothesis  
AH Human inference
1.33 Human inference of promoter position
Example: Identification of a possible promoter by an expert by reading the sequence.
HIPP Human inference
1.34 Inferred from Biological aspect from Ancestor  
IBAA Human inference
1.35 Inferred by curator  
IC Human inference
1.36 Inferred by a human based on computational evidence  
IHBCE Human inference
2. Regulatory interactions   
A regulatory interaction is defined, depending on the type of evidence, as the transcription factor (TF)-regulated gene interaction (TF-gene), or more specifically as the TF-DNA binding site interaction.
Evidence Code Evidence Category
Strong Evidence
2.1 Assay of protein purified to homogeneity  
APPH Classical experiment
2.2 Binding of purified proteins  
BPP Classical experiment
2.3 Footprinting  
FP Classical experiment
2.4 Inferred from direct assay  
IDA Classical experiment
2.5 Site mutation
Example: Site-directed mutagenesis in the DNA binding site.
SM Classical experiment
2.6 In vitro transcription assay  
TA Classical experiment
2.7 ChIP analysis and statistical validation of TFBSs  
CHIP-SV High-throughput protocol
2.8 Cross-validation (GEA/ROMA)
CV(GEA/ROMA) Independent cross-validation
2.9 Cross-validation (GEA/gSELEX)
CV(GEA/gSELEX) Independent cross-validation
Weak Evidence
2.10 Author statement  
AS Author statement
2.11 Non-traceable author statement  
NTAS Author statement
2.12 Traceable author statement  
TAS Author statement
2.13 Traceable author statement to experimental support  
TASES Author statement
2.14 Assay of partially-purified protein  
APPP Classical experiment
2.15 Binding of cellular extracts
Example: Gel shift analysis.
BCE Classical experiment
2.16 Gene expression analysis
Example: Transcriptional fusions (lacZ).
GEA Classical experiment
2.17 Inferred from experiment  
IE Classical experiment
2.18 Inferred from expression pattern  
IEP Classical experiment
2.19 Inferred from genetic interaction
Example: In vitro titration assay.
IGI Classical experiment
2.20 Inferred from mutant phenotype
Example: A mutation of a transcription factor has a visible cell phenotype, and it is inferred that the regulator might be regulating the genes responsible for the phenotype.
IMP Classical experiment
2.21 Inferred from physical interaction  
IPI Classical experiment
2.22 Reaction blocked in mutant  
RBM Classical experiment
2.23 Automated inference based on similarity to consensus sequences
Example: computational method (e.g. PATSER) used to identify the binding site.
AIBSCS Computational prediction or inference
2.24 Inferred by computational analysis  
ICA Computational prediction or inference
2.25 Inferred computationally without human oversight  
ICWHO Computational prediction or inference
2.26 ChIP analysis
Example: ChIP-chip, ChIP-seq.
CHIP High-throughput protocol
2.27 Mapping of signal intensities
Example: RNA-seq or microarray analysis.
MSI High-throughput protocol
2.28 ROMA  
ROMA High-throughput protocol
2.29 Genomic SELEX  
gSELEX High-throughput protocol
2.30 Author hypothesis  
AH Human inference
2.31 Human inference based on similarity to consensus sequences
Example: putative binding site identified by an expert by reading the sequence.
HIBSCS Human inference
2.32 Inferred from Biological aspect from Ancestor  
IBAA Human inference
2.33 Inferred by curator  
IC Human inference
2.34 Inferred by a human based on computational evidence  
IHBCE Human inference
3. Transcription factor functional conformation    
Most dedicated TFs usually have two conformations: one with a non-covalently bound allosteric metabolite or a covalent phosphorylation (the holo conformation), and one as a free protein or multimer (the apo conformation). There are exceptions to this statement. We call the functional conformation the one that is capable of binding to its specific binding sites and performing its activation or repression activity. For the purposes of functional conformation evidence, the experiments below have to be considered with and without effector.
Evidence Code Evidence Category
Strong Evidence
3.1 Assay of protein purified to homogeneity  
APPH Classical experiment
3.2 Assay of protein purified to homogeneity from its native host  
APPHINH Classical experiment
3.3 Binding of purified proteins
Example: mobility shift assays, PAGE, filter binding assays
BPP Classical experiment
3.4 Inferred from direct assay
Example: Microscopy, sedimentation, ultracentrifugation (molecular weight determination of a protein complex), immunoblotting experiments
IDA Classical experiment
3.5 Site mutation
Example: Expression analysis when putative regulator binding sites are mutated.
SM Classical experiment
3.6 Inferred by functional complementation  
IFC nd
Weak Evidence
3.7 Author statement  
AS Author statement
3.8 Non-traceable author statement  
NTAS Author statement
3.9 Traceable author statement  
TAS Author statement
3.10 Traceable author statement to experimental support  
TASES Author statement
3.11 Assay of partially-purified protein  
APPP Classical experiment
3.12 Assay of protein partially-purified from a heterologous host  
APPPHH Classical experiment
3.13 Assay of protein partially-purified from its native host  
APPPINH Classical experiment
3.14 Assay of unpurified protein  
AUP Classical experiment
3.15 Assay of unpurified protein expressed in its native host  
AUPEINH Classical experiment
3.16 Binding of cellular extracts
Example: Gel shift analysis.
BCE Classical experiment
3.17 Gene expression analysis
Example: Transcriptional fusions
GEA Classical experiment
3.18 Inferred from experiment  
IE Classical experiment
3.19 Inferred from expression pattern
Example: Northern blots, western blots, assay for enzyme activity in cell extracts
IEP Classical experiment
3.20 Inferred from genetic interaction  
IGI Classical experiment
3.21 Inferred from mutant phenotype
Example: Any gene mutation/knockout, overexpression/ectopic expression of wild-type genes or genes carrying mutations in the effector binding domain of the transcription factor.
IMP Classical experiment
3.22 Inferred from physical interaction
Example: Two-hybrid assays, co-immunoprecipitation, co-purification
IPI Classical experiment
3.23 Automated inference based on similarity to consensus sequences  
AIBSCS Computational prediction or inference
3.24 Automated inference of function from sequence
Example: Sequence similarity between effector domains of orthologous transcription factors.
AIFS Computational prediction or inference
3.25 Automated inference of function by sequence orthology  
AIFSO Computational prediction or inference
3.26 Inferred by computational analysis  
ICA Computational prediction or inference
3.27 Inferred computationally without human oversight  
ICWHO Computational prediction or inference
3.28 Author hypothesis  
AH Human inference
3.29 Human inference based on similarity to consensus sequences  
HIBSCS Human inference
3.30 Human inference of function from sequence  
HIFS Human inference
3.31 Inferred from Biological aspect from Ancestor  
IBAA Human inference
3.32 Inferred by curator  
IC Human inference
3.33 Inferred by a human based on computational evidence  
IHBCE Human inference
4. Transcription units
Evidence Code Evidence Category
Strong Evidence
4.1 Inferred from direct assay  
IDA Classical experiment
4.2 Length of transcript experimentally determined
Example: Northern blot.
LTED Classical experiment
4.3 Polar mutation
Example: A mutation of the promoter of the first gene affects the expression of neighboring genes.
PM Classical experiment
4.4 Mapping of signal intensities, evidence for a single gene, consistent biological replicates  
MSI-ESG-CBR High-throughput protocol
4.5 Paired-end di-tagging
PET High-throughput protocol
Weak Evidence
4.6 Author statement  
AS Author statement
4.7 Non-traceable author statement  
NTAS Author statement
4.8 Traceable author statement  
TAS Author statement
4.9 Traceable author statement to experimental support  
TASES Author statement
4.10 Boundaries of transcription experimentally identified
Example: When promoter and terminator are identified.
BTEI Classical experiment
4.11 Inferred from experiment  
IE Classical experiment
4.12 Inferred from expression pattern  
IEP Classical experiment
4.13 Inferred from mutant phenotype  
IMP Classical experiment
4.14 Inferred through co-regulation
Example: Two or more adjacent genes show the same expression pattern across conditions.
ITCR Classical experiment
4.15 Products of adjacent genes in the same biological process  
PAGTSBP Classical experiment
4.16 Automated inference that a single-gene directon is a transcription unit
Example: A gene flanked by genes in opposite transcription directions.
AISGDTU Computational prediction or inference
4.17 Inferred by computational analysis  
ICA Computational prediction or inference
4.18 Inferred computationally without human oversight
Example: Computational results with no expert review.
ICWHO Computational prediction or inference
4.19 Mapping of signal intensities  
MSI High-throughput protocol
4.20 Author hypothesis  
AH Human inference
4.21 Inferred from Biological aspect from Ancestor  
IBAA Human inference
4.22 Inferred by curator  
IC Human inference
4.23 Inferred by a human based on computational evidence  
IHBCE Human inference


Stage II. Analytical Cross-Validation
Analytical cross-validation is an active evaluation of confidence that integrates multiple independent types of evidence, with the intention of confirming individual objects and mutually excluding false positives. It follows the same scientific principles that wet-lab scientists apply, where data are confirmed by repetition on the one hand and by additional experimental strategies that exclude alternative explanations on the other.

Analytical cross-validation requires that the combined methods be independent, that is, that they do not share major sources of false positives or common raw materials. This approach makes it possible to evaluate high-throughput (HT) data: objects that are supported by two independent types of weak evidence are classified as strong evidence. It also introduces a third confidence score, "confirmed": objects that are supported by two independent types of strong evidence are classified as confirmed. The confirmed score describes the most reliable data, which resemble the gold-standard data in RegulonDB.
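
The upgrade rule described above can be sketched as a small function. This is a minimal illustration under stated assumptions, not RegulonDB's implementation: the names `Confidence` and `cross_validate` are hypothetical, and the `independent` predicate, which decides whether two evidence codes may be combined, must be supplied by the caller. The sketch applies the rule in a single pass and does not model chaining, where a pair upgraded to strong is then combined with a further strong type (as in the listed code CV(FP/GEA/ROMA)).

```python
from enum import IntEnum
from itertools import combinations
from typing import Callable, Sequence, Tuple


class Confidence(IntEnum):
    WEAK = 1
    STRONG = 2
    CONFIRMED = 3


Evidence = Tuple[str, Confidence]  # (evidence code, Stage I level)


def cross_validate(evidence: Sequence[Evidence],
                   independent: Callable[[str, str], bool]) -> Confidence:
    """Return the object-level confidence after one pass of the Stage II rule.

    Two independent strong types -> confirmed.
    Two independent weak types   -> strong.
    Otherwise the best single level is kept.
    """
    if not evidence:
        raise ValueError("at least one piece of evidence is required")
    best = max(level for _, level in evidence)
    for (code_a, level_a), (code_b, level_b) in combinations(evidence, 2):
        # Repeats of the same type, or non-independent types, never upgrade.
        if code_a == code_b or not independent(code_a, code_b):
            continue
        if level_a == level_b == Confidence.STRONG:
            return Confidence.CONFIRMED
        if level_a == level_b == Confidence.WEAK:
            best = max(best, Confidence.STRONG)
    return best
```

With a predicate that treats GEA and ROMA as independent, two weak items carrying those codes come out as strong, which corresponds to the listed code CV(GEA/ROMA). A strong item paired with a weak item is left at strong, because the rule only pairs evidence of the same level.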

Description

For each object, the types of evidence that can be combined with each other to allow an upgrade to confirmed confidence are listed. Any two methods from different rows can be combined.
Types of evidence in the same row cannot be combined with each other. For instance, different protocols for transcription initiation mapping cannot be cross-validated against each other, because these methods all use mRNA as the starting material and therefore share a common source of false positives, namely RNA processing or degradation.
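
The row rule can be sketched as a membership check. The row layout below is an assumption: the text only states explicitly that protocols for transcription initiation mapping share a row, and the remaining grouping (including placing the RNA-seq codes in the same row as TIM) is inferred from which two-method CV codes are listed for promoters later in this section. The names `PROMOTER_ROWS`, `can_combine`, and `allowed_pairs` are hypothetical.

```python
from itertools import combinations

# Assumed rows for strong promoter/TSS evidence (see the caveats above).
PROMOTER_ROWS = [
    {"FP"},
    {"SM"},
    {"TA"},
    # Assumed to share RNA as starting material, hence one row.
    {"TIM", "RS-EPT-CBR", "RS-EPT-ENCG-CBR"},
]


def row_of(code, rows):
    """Return the index of the row that contains the evidence code."""
    for index, row in enumerate(rows):
        if code in row:
            return index
    raise KeyError(code)


def can_combine(code_a, code_b, rows=PROMOTER_ROWS):
    """True when the two codes sit in different rows."""
    return row_of(code_a, rows) != row_of(code_b, rows)


def allowed_pairs(rows=PROMOTER_ROWS):
    """List every pair of codes taken from two different rows."""
    codes = sorted(code for row in rows for code in row)
    return [pair for pair in combinations(codes, 2)
            if can_combine(*pair, rows)]
```

Under this assumed layout, `allowed_pairs()` yields twelve pairs, the same pairs that appear as two-method codes in the promoter list (CV(FP/SM), CV(FP/TA), CV(TA/TIM), and so on), while pairs such as TIM with RS-EPT-CBR are rejected.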

Cross-validation of TF binding sites and promoters requires that the exact location of the object is specified for each individual evidence.

Evidence codes: Each combination of two independent types of evidence is described by an evidence code of the form CV(EC1/EC2). For instance, the evidence code for the combination of gene expression analysis (GEA) and genomic SELEX (gSELEX) is CV(GEA/gSELEX), and that for the combination of footprinting (FP) with site mutation (SM) is CV(FP/SM).
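
A combined code can be assembled mechanically from its members. The sketch below is an assumption-laden illustration: the helper names `components` and `cv_code` are hypothetical, and the ordering rule (plain case-sensitive string order, so that uppercase codes precede gSELEX) is inferred from the codes listed in the tables of this section, such as CV(GEA/gSELEX) and CV(GEA/SM/gSELEX), rather than from a documented convention. It also flattens an already cross-validated code when it is combined with a further code, as the three-member codes in the tables suggest.

```python
import re

_CV = re.compile(r"^CV\((.+)\)$")


def components(code: str) -> list:
    """Split a CV code into its member codes; a plain code yields itself."""
    match = _CV.match(code)
    return match.group(1).split("/") if match else [code]


def cv_code(*codes: str) -> str:
    """Build CV(EC1/EC2/...) from plain codes and/or existing CV codes."""
    # Flatten nested CV codes, drop duplicates, and sort case-sensitively.
    members = sorted({part for code in codes for part in components(code)})
    if len(members) < 2:
        raise ValueError("cross-validation needs at least two distinct codes")
    return "CV(" + "/".join(members) + ")"
```

For example, `cv_code("SM", "FP")` gives `CV(FP/SM)`, and `cv_code("FP", "CV(GEA/ROMA)")` gives `CV(FP/GEA/ROMA)`, matching the codes listed for promoters.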

1. Promoters and transcription start sites (TSSs)
Confirmed Evidence: Objects supported by two types of independent strong evidence are classified as confirmed.
 
CV(FP/GEA/ROMA) FP: Footprinting
GEA: Gene expression analysis
ROMA: ROMA
CV(FP/GEA/gSELEX) FP: Footprinting
GEA: Gene expression analysis
gSELEX: Genomic SELEX
CV(FP/RS-EPT-CBR) FP: Footprinting
RS-EPT-CBR: RNA-seq using two enrichment strategies for primary transcripts and consistent biological replicates
CV(FP/RS-EPT-ENCG-CBR) FP: Footprinting
RS-EPT-ENCG-CBR: RNA-seq using two enrichment strategies for primary transcripts, consistent biological replicates, and evidence for a non-coding gene.
CV(FP/SM) FP: Footprinting
SM: Site mutation
CV(FP/TA) FP: Footprinting
TA: In vitro transcription assay
CV(FP/TIM) FP: Footprinting
TIM: Transcription initiation mapping
CV(GEA/ROMA/SM) GEA: Gene expression analysis
ROMA: ROMA
SM: Site mutation
CV(GEA/SM/gSELEX) GEA: Gene expression analysis
SM: Site mutation
gSELEX: Genomic SELEX
CV(RS-EPT-CBR/SM) RS-EPT-CBR: RNA-seq using two enrichment strategies for primary transcripts and consistent biological replicates
SM: Site mutation
CV(RS-EPT-CBR/TA) RS-EPT-CBR: RNA-seq using two enrichment strategies for primary transcripts and consistent biological replicates
TA: In vitro transcription assay
CV(RS-EPT-ENCG-CBR/SM) RS-EPT-ENCG-CBR: RNA-seq using two enrichment strategies for primary transcripts, consistent biological replicates, and evidence for a non-coding gene.
SM: Site mutation
CV(RS-EPT-ENCG-CBR/TA) RS-EPT-ENCG-CBR: RNA-seq using two enrichment strategies for primary transcripts, consistent biological replicates, and evidence for a non-coding gene.
TA: In vitro transcription assay
CV(SM/TA) SM: Site mutation
TA: In vitro transcription assay
CV(SM/TIM) SM: Site mutation
TIM: Transcription initiation mapping
CV(TA/TIM) TA: In vitro transcription assay
TIM: Transcription initiation mapping
2. Regulatory interactions
Strong Evidence: Objects supported by two types of independent weak evidence are classified as strong.
 
CV(GEA/ROMA) GEA: Gene expression analysis
ROMA: ROMA
CV(GEA/gSELEX) GEA: Gene expression analysis
gSELEX: Genomic SELEX
Confirmed Evidence: Objects supported by two types of independent strong evidence are classified as confirmed.
 
CV(CHIP-SV/FP) CHIP-SV: ChIP analysis and statistical validation of TFBSs
FP: Footprinting
CV(CHIP-SV/GEA/ROMA) CHIP-SV: ChIP analysis and statistical validation of TFBSs
GEA: Gene expression analysis
ROMA: ROMA
CV(CHIP-SV/GEA/gSELEX) CHIP-SV: ChIP analysis and statistical validation of TFBSs
GEA: Gene expression analysis
gSELEX: Genomic SELEX
CV(CHIP-SV/SM) CHIP-SV: ChIP analysis and statistical validation of TFBSs
SM: Site mutation
CV(FP/GEA/ROMA) FP: Footprinting
GEA: Gene expression analysis
ROMA: ROMA
CV(FP/GEA/gSELEX) FP: Footprinting
GEA: Gene expression analysis
gSELEX: Genomic SELEX
CV(FP/SM) FP: Footprinting
SM: Site mutation
CV(GEA/ROMA/SM) GEA: Gene expression analysis
ROMA: ROMA
SM: Site mutation
CV(GEA/SM/gSELEX) GEA: Gene expression analysis
SM: Site mutation
gSELEX: Genomic SELEX
3. Transcription units
Confirmed Evidence: Objects supported by two types of independent strong evidence are classified as confirmed.
 
CV(LTED/PM) LTED: Length of transcript experimentally determined
PM: Polar mutation
CV(PET/PM) PET: Paired-end di-tagging
PM: Polar mutation

