Pre-registered and confirmed: structural trust certification of the p53 network across two independent databases
We pre-registered 5 structural hypotheses about the p53 protein interaction network before running any analysis, then certified agreement between a curated BioGRID/DepMap dataset and STRING v12.0. All 5 were confirmed. TP53, MDM2, ATM, BRCA1, and CHK2 are structurally robust across both independent sources. CHEK1 and PCNA are boundary cases: hubs in STRING only.
Background
The Functional Proximity Law holds across 16 independent domains: hub rank is correlated across layers when both layers measure the same underlying system. The next question is: does hub rank agree across independent sources that measure the same system by different methods?
This is the question Dataset Trust Certification answers.
Setup
We compared two protein interaction datasets for the p53 DNA-damage response network:
- Source A - Curated v1.0 (BioGRID/DepMap): expert-curated physical interactions, co-expression, functional associations, and genetic interactions for 15 proteins
- Source B - STRING v12.0: automated channel scores (physical, co-expression, functional, genetic proxy) for the same 15 proteins
Both datasets contain the same 15 nodes. The question is whether the same nodes rank as hubs in both sources.
Pre-registration
Before running any analysis, we committed 5 hypotheses to a public pre-registration repository with a SHA-256 hash of the prediction file:
hash: 5bbbd9cbebe29544e0f239708e384c85aa6f7f4d2a2eb39907b79c80a1d4b023
timestamp: 2026-05-14T17:47:08.403592+00:00
The 5 hypotheses:
- h1 - TP53 will appear in the trusted node set (hub in both sources)
- h2 - MDM2 will appear in the trusted node set
- h3 - The overall verdict will be PARTIAL (agreement score 0.4-0.7)
- h4 - The Functional Proximity Law will hold in both sources independently, with r(functional_association, physical_interaction) as the top-ranked layer pair in both
- h5 - PCNA will appear in the boundary node set (hub in STRING only, not in the curated set)
The reasoning for h5: STRING's automated channel scoring captures co-complex membership densely (PCNA is a sliding clamp binding many repair proteins), while the curated set focused on the core p53-MDM2-ATM signaling axis.
Results
Running irdme certify on both graphs:
Nodes A: 15   Nodes B: 15   Shared: 15
Top-7 hub Jaccard: 0.5556   Verdict: PARTIAL

TRUSTED NODES (5):
  atm    ATM (DSB sensor kinase)
  brca1  BRCA1 (DNA repair scaffold)
  chek2  CHK2 (G1/S checkpoint kinase)
  mdm2   MDM2 (p53 E3 ligase)
  tp53   TP53 (p53)

BOUNDARY NODES (4):
  atr    hub in Curated v1.0 only
  chek1  hub in STRING v12.0 only
  pcna   hub in STRING v12.0 only
  rb1    hub in Curated v1.0 only
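The Jaccard score and the trusted/boundary split follow directly from the two top-7 hub sets. The sketch below is not the irdme implementation; it reconstructs each source's top-7 set from the trusted and boundary lists in the output (order within each list is arbitrary). The function names, the inclusive treatment of the 0.4-0.7 band, and the labels used outside that band are our assumptions for illustration.

```python
def classify_hubs(hubs_a, hubs_b):
    """Split two hub sets into shared and source-specific nodes."""
    a, b = set(hubs_a), set(hubs_b)
    trusted = a & b                      # hubs in both sources
    only_a = a - b                       # boundary: hub in source A only
    only_b = b - a                       # boundary: hub in source B only
    union = a | b
    jaccard = len(trusted) / len(union) if union else 0.0
    return trusted, only_a, only_b, jaccard


def verdict(score, low=0.4, high=0.7):
    """PARTIAL band taken from hypothesis h3; the other labels are hypothetical."""
    if low <= score <= high:
        return "PARTIAL"
    return "ABOVE_PARTIAL_BAND" if score > high else "BELOW_PARTIAL_BAND"


# Top-7 hub sets as implied by the certify output above.
curated_top7 = ["tp53", "mdm2", "atm", "brca1", "chek2", "atr", "rb1"]
string_top7 = ["tp53", "mdm2", "atm", "brca1", "chek2", "chek1", "pcna"]

trusted, only_curated, only_string, score = classify_hubs(curated_top7, string_top7)
print(sorted(trusted), sorted(only_curated), sorted(only_string))
print(round(score, 4), verdict(score))
```

With 5 shared hubs out of 9 distinct hubs, the score is 5/9, which rounds to the reported 0.5556 and falls inside the PARTIAL band.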
All 5 hypotheses confirmed.
The Functional Proximity Law holds in both sources (h4), with r(FA, PI) as the top-ranked layer pair in each:
- Curated: r(FA, PI) = 0.7493 (rank 1), r(FA, GI) = 0.7308 (rank 2)
- STRING: r(FA, PI) = 0.8689 (rank 1), r(co_exp, GI_proxy) = -0.257 (textmining proxy correctly decouples from the functional layer)
Certificate SHA-256: ae72b2310cec20b7...
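To make the layer-pair ranking concrete, here is a minimal sketch of how a table like the one above could be produced. It assumes that r is a Pearson correlation of per-node scores between two layers; the post does not specify the exact statistic, so treat that as an assumption. The node names and values are synthetic toy numbers, not the p53 data, and `rank_layer_pairs` is a hypothetical helper, not part of irdme.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant layer")
    return cov / (sd_x * sd_y)


def rank_layer_pairs(layers):
    """layers: {layer_name: {node: score}}. Returns pairs sorted by r, highest first."""
    results = []
    for layer_a, layer_b in combinations(sorted(layers), 2):
        nodes = sorted(set(layers[layer_a]) & set(layers[layer_b]))
        r = pearson([layers[layer_a][n] for n in nodes],
                    [layers[layer_b][n] for n in nodes])
        results.append((layer_a, layer_b, r))
    return sorted(results, key=lambda item: item[2], reverse=True)


# Synthetic toy scores for four placeholder nodes (NOT the p53 data).
toy_layers = {
    "FA": {"n1": 4.0, "n2": 3.0, "n3": 2.0, "n4": 1.0},
    "PI": {"n1": 8.0, "n2": 6.0, "n3": 4.0, "n4": 2.0},
    "GI": {"n1": 1.0, "n2": 3.0, "n3": 2.0, "n4": 4.0},
}

ranking = rank_layer_pairs(toy_layers)
for rank, (layer_a, layer_b, r) in enumerate(ranking, start=1):
    print(f"rank {rank}: r({layer_a}, {layer_b}) = {r:.4f}")
```

In this toy input, FA and PI move together, so that pair ranks first. Hypothesis h4 made the same prediction for the real layers in both sources.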
Interpretation
The 5 trusted nodes - TP53, MDM2, ATM, BRCA1, CHK2 - are the structurally robust backbone of the p53 DNA-damage response network. They appear as hubs regardless of curation methodology: expert annotation or automated channel scoring. Downstream analyses (drug target prioritization, pathway reconstruction, disease network modeling) can use this set knowing that its hub status does not depend on which of the two sources is chosen.
The 4 boundary nodes are the scientifically interesting region:
- ATR, RB1 (curated-only hubs): expert curation captured replication stress and cancer genetics connections that STRING's channel thresholds rank lower. These are real signals - just data-source-dependent.
- CHEK1, PCNA (STRING-only hubs): CHEK1 was already identified as a structural outlier in a prior pre-registered experiment, where the shadow node was expected to be ATM but turned out to be CHEK1. Its appearance as a STRING-only hub is consistent with that result. PCNA's sliding clamp role is captured densely by STRING's co-complex scoring.
Significance
This is the first pre-registered structural trust certification in IRDME's public experiment record. The methodology is available via the /api/datasets/compare endpoint and the irdme certify CLI command. Any two datasets describing the same system from independent sources can now be certified.
The pre-registration record is public: github.com/vladi160/preregistrations.
Reproducibility
This result was pre-registered before analysis. SHA-256 hash: 5bbbd9cbebe29544e0f239708e384c85aa6f7f4d2a2eb39907b79c80a1d4b023
Verify at github.com/vladi160/preregistrations
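Checking a downloaded prediction file against the published hash takes only a few lines. The sketch below uses Python's standard hashlib; the file name in the usage note is a placeholder, since the post does not name the prediction file, and the helper names are ours.

```python
import hashlib

# Hash published in the pre-registration record above.
PREREGISTERED_SHA256 = "5bbbd9cbebe29544e0f239708e384c85aa6f7f4d2a2eb39907b79c80a1d4b023"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_preregistration(path, expected=PREREGISTERED_SHA256):
    """True if the file's digest equals the expected hash (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_preregistration("predictions.json")` (hypothetical file name) returns True only if the file is byte-for-byte identical to the one that was hashed. Any edit after the commit, including whitespace changes, produces a different digest.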