
AI Attention vs Human Attention: Five Models Confirm the d4 Infrastructure Gap

We asked five AI systems (Claude, Grok, ChatGPT, Google AI, NotebookLM) to rank npm packages by importance given only a dependency edge list, with no hint that structural centrality was of interest. All five -- in both named and anonymous conditions -- produced rankings that track infrastructure centrality (d2) far more closely than human Stack Overflow attention does. Preliminary result: 10/10 conditions confirmed.

June 13, 2026

This is an early, preliminary result. One ecosystem (npm, n=193 packages). Replication in other domains is needed before treating this as established. With that caveat stated, the pattern is consistent enough to report.

--- Background ---

In a prior post we described two pre-registered d4 experiments: C. elegans and npm. The npm result showed a structural gap: human attention (Stack Overflow question counts, d4_human) correlates positively with consumer framework complexity (r = +0.33 with d1 out-degree) but NOT with infrastructure embeddedness (r = -0.077 with d2 in-degree). Packages like debug, @babel/types, @vue/shared, and @parcel/utils -- depended on by 5-9 other packages -- accumulate almost no SO questions. Packages like express and webpack, which depend on many others but are not dependencies themselves, accumulate thousands.
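To make the two degree measures concrete, here is a minimal sketch of how d1 (out-degree) and d2 (in-degree) could be computed from an edge list. The edge direction (dependent, dependency), the function name, and the toy package names are assumptions for illustration, not the study's data or code.

```python
from collections import Counter

def degree_centralities(edges):
    """Return (d1, d2): out-degree and in-degree per package.

    Assumption: each edge is (dependent, dependency), meaning the
    first package depends on the second.
    """
    out_deg = Counter()  # d1: how many packages this one depends on
    in_deg = Counter()   # d2: how many packages depend on this one
    nodes = set()
    for dependent, dependency in edges:
        out_deg[dependent] += 1
        in_deg[dependency] += 1
        nodes.update((dependent, dependency))
    d1 = {n: out_deg[n] for n in nodes}
    d2 = {n: in_deg[n] for n in nodes}
    return d1, d2

# Hypothetical toy graph: two frameworks and a plugin share one utility.
toy_edges = [
    ("framework_a", "util_core"),
    ("framework_a", "plugin_x"),
    ("plugin_x", "util_core"),
    ("framework_b", "util_core"),
]
d1, d2 = degree_centralities(toy_edges)
print("d1 (out-degree):", d1)
print("d2 (in-degree):", d2)
```

In this toy graph util_core has the highest d2 and a d1 of zero, which is the profile of the infrastructure packages described above.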

The question that followed: does an AI system, shown only the dependency structure, do better at identifying the infrastructure layer?

--- Protocol ---

We showed five AI systems an npm dependency edge list. Two prompt conditions:

Prompt A (named + neutral): the real package names, question "which 20 packages are most important?" -- no mention of structural centrality, no priming.

Prompt B (anonymous + neutral): package names replaced with pkg_001 through pkg_193, same question -- no semantic information at all, structure only.
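The anonymization step in Prompt B can be sketched as follows. The post does not say how identifiers were assigned, so the seeded shuffle here is an assumption (chosen so that alphabetical order does not leak into the ids), and the function name is hypothetical.

```python
import random

def anonymize(edges, seed=0):
    """Replace package names with pkg_001, pkg_002, ... ids.

    Assumption: ids are assigned in a seeded random order; the
    original protocol may have used a different assignment.
    """
    names = sorted({name for edge in edges for name in edge})
    random.Random(seed).shuffle(names)
    mapping = {name: f"pkg_{i:03d}" for i, name in enumerate(names, start=1)}
    anon_edges = [(mapping[a], mapping[b]) for a, b in edges]
    return anon_edges, mapping

# Hypothetical toy edge list (dependent, dependency).
toy_edges = [("a", "b"), ("a", "c"), ("c", "b")]
anon_edges, mapping = anonymize(toy_edges, seed=1)
print(anon_edges)
```

The mapping should be kept so that anonymous rankings can be translated back to real packages before scoring.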

Models: Claude.ai (Sonnet 4.6, separate session from this project), Grok 4.3 fast, ChatGPT (unknown version), Google AI Search Mode, Google NotebookLM. All ran in fresh sessions with no prior context about IRDME or the experiment. Responses were collected by copy-paste. Rankings were converted to scores (N - rank + 1)/N, and Pearson r was computed between those scores and d2 centrality.
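The scoring step can be sketched as below. Two details are assumptions because the text does not state them: N is taken to be the length of the model's ranked list (20 for a top-20 answer), and packages the model did not rank receive a score of 0. The function names are hypothetical.

```python
import math

def rank_scores(ranking, all_packages):
    """Convert an ordered ranking to scores (N - rank + 1) / N.

    Assumptions: N = len(ranking); unranked packages score 0.
    """
    n = len(ranking)
    scores = {pkg: 0.0 for pkg in all_packages}
    for rank, pkg in enumerate(ranking, start=1):
        scores[pkg] = (n - rank + 1) / n
    return scores

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: three packages, a two-item ranking, made-up d2 values.
packages = ["a", "b", "c"]
scores = rank_scores(["a", "b"], packages)
d2 = {"a": 5, "b": 3, "c": 0}
r = pearson_r([scores[p] for p in packages], [d2[p] for p in packages])
print(scores, round(r, 3))
```

If N were instead the full package count (193), the scores of ranked packages would change, so the choice should be stated alongside any reported r.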

Note: Claude.ai and Google AI Search Mode used code execution to compute graph metrics explicitly. Grok and ChatGPT used reasoning only. NotebookLM pre-analyzed the source file before answering. These are different cognitive modes and should be interpreted separately.

--- Results ---

r(d2, d4_ai_naive) by model and condition:

NotebookLM -- named: +0.771, anon: +0.787
Claude.ai -- named: +0.646, anon: +0.560
Grok -- named: +0.624, anon: +0.774
ChatGPT -- named: +0.490, anon: +0.676
Google AI -- named: +0.706, anon: +0.530

Human baseline (Stack Overflow): r(d2, d4_human) = -0.077

All 10 conditions exceed the human baseline by more than 0.55 r-units. Grok and ChatGPT, using only reasoning (no code), achieved r = 0.68-0.77 in the anonymous condition. These are the cleanest results for the attention interpretation: the models ranked infrastructure packages higher without any explicit computation.
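The margin over the human baseline can be checked directly from the reported values. This is only an arithmetic check on the numbers in the results above; the variable names are illustrative.

```python
# r(d2, d4_ai_naive) as reported in the results, by model and condition.
results = {
    "NotebookLM": {"named": 0.771, "anon": 0.787},
    "Claude.ai": {"named": 0.646, "anon": 0.560},
    "Grok": {"named": 0.624, "anon": 0.774},
    "ChatGPT": {"named": 0.490, "anon": 0.676},
    "Google AI": {"named": 0.706, "anon": 0.530},
}
HUMAN_BASELINE = -0.077  # r(d2, d4_human)

# Gap between each AI condition and the human baseline, in r-units.
gaps = {
    (model, cond): r - HUMAN_BASELINE
    for model, conds in results.items()
    for cond, r in conds.items()
}
smallest = min(gaps, key=gaps.get)
min_gap = gaps[smallest]
print(f"smallest gap: {smallest} = {min_gap:.3f}")
print("all above 0.55:", all(g > 0.55 for g in gaps.values()))
```

The smallest margin is the ChatGPT named condition, at 0.490 - (-0.077) = 0.567.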

r(SO questions, d4_ai_anon) is approximately +0.05 across all models -- not significant. AI does not reproduce the human SO attention pattern. It tracks something humans do not consciously attend to.

Cross-model correlation in the anonymous condition ranges from +0.51 to +0.99 (all pairs positive). Different model families converge on the same structural signal.

--- What it means (preliminary interpretation) ---

When AI systems are shown a network and asked "what is important?", they appear to project onto infrastructural embeddedness (d2) -- the dimension of how many things depend on you, not how many things you depend on. Human task-driven attention (measured by SO questions) projects onto structural complexity (d1) -- how many dependencies a package has, which tracks consumer framework visibility.

This is a concrete instance of d4 observer-dependence: the same network looks structurally different depending on who or what is observing it. Humans, trained by practical use, attend to what is complex and visible. AI, analyzing dependency structure, attends to what is load-bearing and invisible.

Six of the top-10 packages ranked by AI in the anonymous condition have zero Stack Overflow questions. By this proxy, these packages have accumulated no measurable human attention, yet many other packages in the sample depend on them.

--- Caveats and next steps ---

This is one ecosystem. We have not tested Python, Rust, or other package graphs. The npm network is a specific sample of 193 packages, not the full npm registry. Models that used code execution are computing centrality, not exhibiting attention -- their results confirm the metric but not the cognitive interpretation.

The proper next experiment: replicate in PyPI (Python ecosystem) with a naive AI prompt and compare to PyPI download count attention (a cleaner d4_human proxy than SO questions). If the same pattern holds, the finding strengthens significantly.

We are treating this as PROVISIONAL-CONFIRMED: the structural pattern holds in this sample, but the interpretation that it generalizes requires more data.

Pre-registration: DISC_D4_AI_v1 (hash published before experiment). External validation committed to the preregistrations repository.