SAEs function as local atlases on a non-globally-separable manifold

active hypothesis interpretive positive partially_falsifiable
v1 · Initial theoretical claim

summary
Sparse autoencoders successfully identify local structure but cannot guarantee a global coordinate system for the full representation space.
SAEs recover interpretable features in local neighborhoods of activation space, analogous to coordinate charts on a manifold. Their success does not prove global decomposability any more than a local map proves the Earth is flat.
trust profile
dimensions
evid
38%
repl
12%
cons
42%
meth
65%
cred
70%
scop
55%
brdg
58%
cont
15%
derived scores
supp
45%
fron
72%
stab
38%
claim_support_vector v1.0 · 2026-03-09 19:58 UTC
evidence 0
No evidence edges found.
neighborhood 2
domains
Mechanistic Interpretability 100%
view status
Strict Empirical downweighted
computation trace
show raw trace data
{
  "note": "Interpretive hypothesis with moderate bridge potential",
  "inputs": {
    "artifacts": 0,
    "attestations": 0,
    "disputing_edges": 0,
    "supporting_edges": 1
  }
}