#biology #p53 #proteins #pre-registration #trust-certification #STRING #BioGRID #hub-shadow

Pre-registered and confirmed: structural trust certification of the p53 network across two independent databases

We pre-registered 5 structural hypotheses about the p53 protein interaction network before running any analysis, then certified agreement between a curated BioGRID/DepMap dataset and STRING v12.0. All 5 were confirmed. TP53, MDM2, ATM, BRCA1, and CHK2 are structurally robust across independent sources. CHEK1 and PCNA are boundary cases: hubs in STRING only.

pre-reg: 5bbbd9cbebe29544

Background

The Functional Proximity Law holds across 16 independent domains: hub rank is correlated across layers when both layers measure the same underlying system. The next question is: does hub rank agree across independent sources that measure the same system by different methods?

This is the question Dataset Trust Certification answers.

Setup

We compared two protein interaction datasets for the p53 DNA-damage response network:
  • Source A - Curated v1.0 (BioGRID/DepMap): expert-curated physical interactions, co-expression, functional associations, and genetic interactions for 15 proteins
  • Source B - STRING v12.0: automated channel scores (physical, co-expression, functional, genetic proxy) for the same 15 proteins

Both datasets contain the same 15 nodes. The question is whether the same nodes rank as hubs in both sources.

Pre-registration

Before running any analysis, we committed 5 hypotheses to a public pre-registration repository with a SHA-256 hash of the prediction file:

hash: 5bbbd9cbebe29544e0f239708e384c85aa6f7f4d2a2eb39907b79c80a1d4b023
timestamp: 2026-05-14T17:47:08.403592+00:00

The 5 hypotheses:
  • h1 - TP53 will appear in the trusted node set (hub in both sources)
  • h2 - MDM2 will appear in the trusted node set
  • h3 - The overall verdict will be PARTIAL (agreement score 0.4-0.7)
  • h4 - The Functional Proximity Law will hold in both sources independently, with r(functional_association, physical_interaction) as the top-ranked layer pair in both
  • h5 - PCNA will appear in the boundary node set (hub in STRING only, not in the curated set)

The reasoning for h5: STRING's automated channel scoring captures co-complex membership densely (PCNA is a sliding clamp binding many repair proteins), while the curated set focused on the core p53-MDM2-ATM signaling axis.

Results

Running irdme certify on both graphs:

Nodes A: 15   Nodes B: 15   Shared: 15
Top-7 hub Jaccard: 0.5556   Verdict: PARTIAL

TRUSTED NODES (5):
  atm    ATM (DSB sensor kinase)
  brca1  BRCA1 (DNA repair scaffold)
  chek2  CHK2 (G1/S checkpoint kinase)
  mdm2   MDM2 (p53 E3 ligase)
  tp53   TP53 (p53)

BOUNDARY NODES (4):
  atr    hub in Curated v1.0 only
  chek1  hub in STRING v12.0 only
  pcna   hub in STRING v12.0 only
  rb1    hub in Curated v1.0 only
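
The reported agreement score can be re-derived by hand from the node lists above. The sketch below is not the irdme implementation; it assumes each source's top-7 hub set is the 5 trusted nodes plus that source's 2 source-only boundary nodes (which is what the output implies), and it only checks the PARTIAL band of 0.4-0.7 stated in h3. The function and variable names are illustrative.

```python
def top_k_jaccard(hubs_a, hubs_b):
    """Jaccard index of two hub sets: |A & B| / |A | B|."""
    a, b = set(hubs_a), set(hubs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Node IDs taken from the certify output; the split into two top-7 sets
# is an assumption inferred from the trusted/boundary lists.
trusted = {"atm", "brca1", "chek2", "mdm2", "tp53"}
curated_top7 = trusted | {"atr", "rb1"}      # hubs in Curated v1.0
string_top7 = trusted | {"chek1", "pcna"}    # hubs in STRING v12.0

score = top_k_jaccard(curated_top7, string_top7)
shared = sorted(curated_top7 & string_top7)    # trusted nodes
boundary = sorted(curated_top7 ^ string_top7)  # hub in exactly one source
is_partial = 0.4 <= score <= 0.7               # PARTIAL band from h3

print(round(score, 4), shared, boundary, is_partial)
```

With 5 shared hubs out of 9 distinct hubs, the score is 5/9, which rounds to the 0.5556 shown in the output.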

All 5 hypotheses confirmed.

The Functional Proximity Law in both sources:
  • Curated: r(FA, PI) = 0.7493 (rank 1), r(FA, GI) = 0.7308 (rank 2)
  • STRING: r(FA, PI) = 0.8689 (rank 1), r(co_exp, GI_proxy) = -0.257 (the text-mining proxy decouples from the functional layer, as expected)
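
As a rough illustration of how layer pairs could be ranked by correlation, the sketch below computes a Pearson r for every pair of layers and sorts the pairs from highest to lowest. This post does not state which correlation coefficient irdme uses or what per-node quantity it correlates, so Pearson on per-node scores is an assumption, and the numbers are toy values chosen for the example, not data from either source.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_layer_pairs(layers):
    """Return [((layer_p, layer_q), r), ...] sorted by r, highest first."""
    pairs = [
        ((p, q), pearson(layers[p], layers[q]))
        for p, q in combinations(layers, 2)
    ]
    return sorted(pairs, key=lambda item: item[1], reverse=True)


# Toy per-node scores for 5 hypothetical nodes (NOT real BioGRID/STRING data).
toy_layers = {
    "FA": [5.0, 4.0, 3.0, 2.0, 1.0],   # functional association
    "PI": [10.0, 8.0, 6.0, 4.0, 2.0],  # physical interaction
    "GI": [1.0, 3.0, 2.0, 5.0, 4.0],   # genetic interaction
}

ranking = rank_layer_pairs(toy_layers)
for (p, q), r in ranking:
    print(f"r({p}, {q}) = {r:.4f}")
```

Under h4, the check is simply that the (FA, PI) pair sits at the top of this ranking in each source when the ranking is computed separately per source.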

Certificate SHA-256: ae72b2310cec20b7...

Interpretation

The 5 trusted nodes - TP53, MDM2, ATM, BRCA1, CHK2 - are the structurally robust backbone of the p53 DNA-damage response network. They appear as hubs regardless of curation methodology: expert annotation or automated channel scoring. Downstream analyses (drug target prioritization, pathway reconstruction, disease network modeling) can treat this set as the highest-confidence core of the network, because its hub status does not depend on which source is used.

The 4 boundary nodes are the scientifically interesting region:
  • ATR, RB1 (curated-only hubs): expert curation captured replication stress and cancer genetics connections that STRING's channel thresholds rank lower. These are real signals, but they are data-source-dependent.
  • CHEK1, PCNA (STRING-only hubs): CHEK1 was already identified as a structural outlier in a prior pre-registered experiment, where the shadow node was predicted to be ATM but turned out to be CHEK1. Its appearance as a STRING-only hub is consistent with that result. PCNA's sliding clamp role is captured densely by STRING's co-complex scoring.

Significance

This is the first pre-registered structural trust certification in IRDME's public experiment record. The methodology is available via the /api/datasets/compare endpoint and the irdme certify CLI command. Any two datasets describing the same system from independent sources can now be certified.

The pre-registration record is public: github.com/vladi160/preregistrations.

Reproducibility

This result was pre-registered before analysis. SHA-256 hash: 5bbbd9cbebe29544e0f239708e384c85aa6f7f4d2a2eb39907b79c80a1d4b023

Verify at github.com/vladi160/preregistrations
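
A minimal sketch of the verification step, assuming the published hash is the SHA-256 of the raw bytes of the prediction file. The file name in the usage comment is hypothetical; use whatever file the pre-registration repository actually publishes.

```python
import hashlib

# Full hash and short ID as published in this post.
EXPECTED = "5bbbd9cbebe29544e0f239708e384c85aa6f7f4d2a2eb39907b79c80a1d4b023"
SHORT_ID = "5bbbd9cbebe29544"


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def matches_preregistration(data: bytes, expected: str = EXPECTED) -> bool:
    """True if the bytes hash to the published pre-registration digest."""
    return sha256_hex(data) == expected.lower()


# The short "pre-reg" ID is the first 16 hex characters of the full hash.
print(EXPECTED.startswith(SHORT_ID))

# Usage (hypothetical file name):
# with open("predictions.json", "rb") as fh:
#     print(matches_preregistration(fh.read()))
```

Reading the file in binary mode matters: any change to line endings or encoding alters the digest, so the bytes must be exactly those that were committed.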