SAE features scale predictably with model size

summary

The number and quality of interpretable features extracted by sparse autoencoders increases predictably as model size grows, suggesting feature-level interpretability is not an artifact of small-scale experiments.

Anthropic (2024) demonstrated that SAE features found in small models have analogues in larger models, and that feature quality metrics improve with scale. This was independently confirmed across multiple model families.

trust profile

dimensions

evid

89%

repl

74%

cons

81%

meth

82%

cred

91%

scop

78%

brdg

28%

cont

8%

derived scores

supp

86%

fron

47%

stab

76%

claim_support_vector v1.0 · 2026-03-09 19:58 UTC

evidence 3

↑ supporting 2

supports · artifact

Scaling Monosemanticity (Anthropic, 2024)

The 2024 scaling paper demonstrates that SAE features found in small models have analogues in larger models with improving quality metrics.

Scaling Monosemanticity (Anthropic, 2024) · 91% — Feature scaling results

supports · artifact

Toy Models of Superposition (Elhage et al., 2022)

The 2022 toy models paper shows that superposition and polysemanticity emerge predictably as a function of feature dimensionality and sparsity.

Toy Models of Superposition (Elhage et al., 2022) · 88% — Superposition emergence in toy models

• asserting 1

asserts · artifact

Scaling Monosemanticity (Anthropic, 2024)

Primary assertion from Anthropic 2024.

scope

holds_in Large-scale language models (>1B parameters) model_scale

neighborhood 2

holds_in → Large-scale language models (>1B parameters) context

supports → Feature universality concept

attestations

Curator (Human) verifies 0.92

Claim wording accurately reflects the scaling monosemanticity findings.

Anthropic endorses 0.88

Anthropic endorses the scaling result.

domains

Mechanistic Interpretability 100%

view status

Strict Empirical included

computation trace

show raw trace data

{
  "note": "Well-replicated across multiple papers and model scales",
  "inputs": {
    "artifacts": 2,
    "attestations": 2,
    "disputing_edges": 0,
    "supporting_edges": 2
  }
}