idea-graph
A knowledge graph where claims, evidence, and trust are computable. Every score traces to inputs. Nothing is silently overwritten.
claims: 7 · branch: main
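The header's promise that "every score traces to inputs" can be sketched as a data structure in which each score dimension carries its own provenance. This is a minimal, hypothetical illustration: the field names mirror the abbreviations used in the listing below, but the `Score`/`Claim` shapes and the `trace` helper are assumptions for illustration, not idea-graph's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Score:
    value: float    # 0.0 - 1.0, shown as a percentage in the listing
    sources: tuple  # identifiers of the inputs this score derives from

@dataclass(frozen=True)
class Claim:
    title: str
    kind: str       # e.g. "empirical", "bridging", "methodological"
    status: str     # e.g. "contested", "hypothesis", "replicated"
    scores: dict    # dimension abbreviation -> Score

def trace(claim: Claim, dim: str):
    """Return (value, sources) for one dimension: no score is opaque."""
    s = claim.scores[dim]
    return s.value, s.sources

# Illustrative record (source IDs are hypothetical placeholders):
claim = Claim(
    title="SAE features scale predictably with model size",
    kind="empirical",
    status="replicated",
    scores={
        "evid": Score(0.89, ("exp-001", "exp-002")),
        "repl": Score(0.74, ("rep-014",)),
    },
)

value, sources = trace(claim, "evid")
```

Because `Score` is frozen, updating a dimension means constructing a new record rather than mutating in place, which is one way to honor "nothing is silently overwritten".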

Individual neurons rarely encode single concepts
empirical · contested · evidential: 72% · replication: 55%
In standard transformers, most neurons respond to multiple unrelated concepts (polysemanticity), and single-concept neurons are rare exceptions rather than the norm.
evid: 72% · repl: 55% · cons: 48% · meth: 71% · cred: 82% · scop: 45% · brdg: 22% · cont: 58%
Decomposition-resistant representations appear in both neural networks and psychotherapy outcome research
bridging · hypothesis · evidential: 45% · replication: 18%
Both neural network representations and psychotherapy outcomes resist decomposition into independent factors, suggesting a shared structural property of complex adaptive systems.
evid: 45% · repl: 18% · cons: 35% · meth: 55% · cred: 68% · scop: 40% · brdg: 82% · cont: 31%
SAE features scale predictably with model size
empirical · replicated · evidential: 89% · replication: 74%
The number and quality of interpretable features extracted by sparse autoencoders increases predictably as model size grows, suggesting feature-level interpretability is not an artifact of small-scale experiments.
evid: 89% · repl: 74% · cons: 81% · meth: 82% · cred: 91% · scop: 78% · brdg: 28% · cont: 8%
Circuit-level analysis provides causal evidence that feature-level analysis cannot
methodological · tested · evidential: 76% · replication: 48%
While SAE-based feature analysis identifies what a model represents, circuit-level analysis reveals how computations flow, providing causal evidence about model behavior that feature-level analysis alone cannot establish.
evid: 76% · repl: 48% · cons: 62% · meth: 88% · cred: 79% · scop: 72% · brdg: 35% · cont: 18%
Polysemanticity can persist without superposition
empirical · tested · evidential: 81% · replication: 30%
Models prefer polysemantic neurons even when monosemantic solutions are available, because polysemantic representations achieve lower cross-entropy loss.
evid: 81% · repl: 30% · cons: 100% · meth: 100% · cred: 0% · scop: 100% · brdg: 0% · cont: 0%
Representation engineering offers an alternative to mechanistic decomposition
methodological · tested · evidential: 68% · replication: 35%
Rather than decomposing models into interpretable parts, representation engineering identifies and manipulates high-level concepts directly in activation space, offering control without requiring full mechanistic understanding.
evid: 68% · repl: 35% · cons: 52% · meth: 72% · cred: 75% · scop: 60% · brdg: 45% · cont: 28%
SAEs function as local atlases on a non-globally-separable manifold
interpretive · hypothesis · evidential: 38% · replication: 12%
Sparse autoencoders successfully identify local structure but cannot guarantee a global coordinate system for the full representation space.
evid: 38% · repl: 12% · cons: 42% · meth: 65% · cred: 70% · scop: 55% · brdg: 58% · cont: 15%