idea-graph
A knowledge graph where claims, evidence, and trust are computable. Every score traces to inputs. Nothing is silently overwritten.
claims: 7 · branch: main
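The header's promise that "every score traces to inputs" can be sketched as a data structure in which each score dimension carries its own provenance. This is a minimal, hypothetical illustration: the field names mirror the abbreviations used in the listing below, but the `Score`/`Claim` shapes and the `trace` helper are assumptions for illustration, not idea-graph's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Score:
    value: float    # 0.0 - 1.0, shown as a percentage in the listing
    sources: tuple  # identifiers of the inputs this score derives from

@dataclass(frozen=True)
class Claim:
    title: str
    kind: str       # e.g. "empirical", "bridging", "methodological"
    status: str     # e.g. "contested", "hypothesis", "replicated"
    scores: dict    # dimension abbreviation -> Score

def trace(claim: Claim, dim: str):
    """Return (value, sources) for one dimension: no score is opaque."""
    s = claim.scores[dim]
    return s.value, s.sources

# Illustrative record (source IDs are hypothetical placeholders):
claim = Claim(
    title="SAE features scale predictably with model size",
    kind="empirical",
    status="replicated",
    scores={
        "evid": Score(0.89, ("exp-001", "exp-002")),
        "repl": Score(0.74, ("rep-014",)),
    },
)

value, sources = trace(claim, "evid")
```

Because `Score` is frozen, updating a dimension means constructing a new record rather than mutating in place, which is one way to honor "nothing is silently overwritten".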

Individual neurons rarely encode single concepts
empirical · contested · evidential: 72% · replication: 55%
In standard transformers, most neurons respond to multiple unrelated concepts (polysemanticity), and single-concept neurons are rare exceptions rather than the norm.
evid: 72% · repl: 55% · cons: 48% · meth: 71% · cred: 82% · scop: 45% · brdg: 22% · cont: 58%
Decomposition-resistant representations appear in both neural networks and psychotherapy outcome research
bridging · hypothesis · evidential: 45% · replication: 18%
Both neural network representations and psychotherapy outcomes resist decomposition into independent factors, suggesting a shared structural property of complex adaptive systems.
evid: 45% · repl: 18% · cons: 35% · meth: 55% · cred: 68% · scop: 40% · brdg: 82% · cont: 31%
SAE features scale predictably with model size
empirical · replicated · evidential: 89% · replication: 74%
The number and quality of interpretable features extracted by sparse autoencoders increases predictably as model size grows, suggesting feature-level interpretability is not an artifact of small-scale experiments.
evid: 89% · repl: 74% · cons: 81% · meth: 82% · cred: 91% · scop: 78% · brdg: 28% · cont: 8%
Circuit-level analysis provides causal evidence that feature-level analysis cannot
methodological · tested · evidential: 76% · replication: 48%
While SAE-based feature analysis identifies what a model represents, circuit-level analysis reveals how computations flow, providing causal evidence about model behavior that feature-level analysis alone cannot establish.
evid: 76% · repl: 48% · cons: 62% · meth: 88% · cred: 79% · scop: 72% · brdg: 35% · cont: 18%
Polysemanticity can persist without superposition
empirical · tested · evidential: 81% · replication: 30%
Models prefer polysemantic neurons even when monosemantic solutions are available, because polysemantic representations achieve lower cross-entropy loss.
evid: 81% · repl: 30% · cons: 100% · meth: 100% · cred: 0% · scop: 100% · brdg: 0% · cont: 0%
Representation engineering offers an alternative to mechanistic decomposition
methodological · tested · evidential: 68% · replication: 35%
Rather than decomposing models into interpretable parts, representation engineering identifies and manipulates high-level concepts directly in activation space, offering control without requiring full mechanistic understanding.
evid: 68% · repl: 35% · cons: 52% · meth: 72% · cred: 75% · scop: 60% · brdg: 45% · cont: 28%
SAEs function as local atlases on a non-globally-separable manifold
interpretive · hypothesis · evidential: 38% · replication: 12%
Sparse autoencoders successfully identify local structure but cannot guarantee a global coordinate system for the full representation space.
evid: 38% · repl: 12% · cons: 42% · meth: 65% · cred: 70% · scop: 55% · brdg: 58% · cont: 15%