SAEs function as local atlases on a non-globally-separable manifold

active hypothesis interpretive positive partially_falsifiable

v1 · Initial theoretical claim

summary

Sparse autoencoders successfully identify local structure but cannot guarantee a global coordinate system for the full representation space.

SAEs recover interpretable features in local neighborhoods of activation space, analogous to coordinate charts on a manifold. Their success does not prove global decomposability any more than a local map proves the Earth is flat.

trust profile

dimensions

evid

38%

repl

12%

cons

42%

meth

65%

cred

70%

scop

55%

brdg

58%

cont

15%

derived scores

supp

45%

fron

72%

stab

38%

claim_support_vector v1.0 · 2026-03-09 19:58 UTC

evidence 0

No evidence edges found.

neighborhood 2

supports → Polysemanticity concept

supports → Decomposition-resistant representations appear in both neural networks and psychotherapy outcome research claim

domains

Mechanistic Interpretability 100%

view status

Strict Empirical downweighted

computation trace

show raw trace data

{
  "note": "Interpretive hypothesis with moderate bridge potential",
  "inputs": {
    "artifacts": 0,
    "attestations": 0,
    "disputing_edges": 0,
    "supporting_edges": 1
  }
}