Polysemanticity can persist without superposition

summary

Models trained with cross-entropy loss can converge on polysemantic neurons even when monosemantic solutions are available, because the polysemantic representations achieve lower loss.

Towards Monosemanticity (Anthropic, 2023) reports that training models with 1-hot activations, which rules out superposition, does not eliminate polysemanticity. The models still reach lower loss by making neurons polysemantic, even with no superposition pressure.
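
To make the loss argument concrete, here is a minimal toy calculation. It is not the paper's experiment: the three context types, their probabilities, the single binary neuron, and the helper `expected_cross_entropy` are all hypothetical, chosen only to show how cross-entropy can favor a neuron that groups two features over one that isolates a single feature.

```python
import math

# Hypothetical toy setup (not from the paper): three context types and their
# probabilities. Each context deterministically predicts its own next token.
CONTEXT_PROBS = {"A": 0.1, "B": 0.1, "other": 0.8}


def expected_cross_entropy(on_contexts):
    """Expected loss (in nats) when one binary neuron fires on `on_contexts`.

    The readout sees only whether the neuron is on or off, so within each
    state the best prediction is the conditional distribution over contexts.
    """
    loss = 0.0
    for state in (True, False):
        group = [c for c in CONTEXT_PROBS if (c in on_contexts) == state]
        mass = sum(CONTEXT_PROBS[c] for c in group)
        for c in group:
            p = CONTEXT_PROBS[c]
            loss += p * -math.log(p / mass)
    return loss


mono_loss = expected_cross_entropy({"A"})       # neuron codes feature A only
poly_loss = expected_cross_entropy({"A", "B"})  # neuron codes A or B

print(f"monosemantic neuron: {mono_loss:.4f} nats")
print(f"polysemantic neuron: {poly_loss:.4f} nats")
```

Under these assumed probabilities the polysemantic assignment gives roughly 0.14 nats against roughly 0.31 nats for the monosemantic one, because isolating A leaves the rare context B lumped in with the common "other" contexts. Different probabilities can change the size of the gap, so this is an illustration of the mechanism and not evidence for the claim.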

trust profile

dimensions

evid 81%
repl 30%
cons 100%
meth 100%
cred 0%
scop 100%
brdg 0%
cont 0%

derived scores

supp 66%
fron 28%
stab 72%

claim_support_vector v1.0 · 2026-03-09 13:27 UTC

evidence 2

↑ supporting 1

supports · artifact

Towards Monosemanticity (Anthropic, 2023)

The Anthropic 2023 paper provides empirical support for the claim.

Towards Monosemanticity (Anthropic, 2023) · 81% — Support link from results section

• asserting 1

asserts · artifact

Towards Monosemanticity (Anthropic, 2023)

The Anthropic 2023 paper asserts the claim that polysemanticity can persist without superposition.

Towards Monosemanticity (Anthropic, 2023) · 86% — Direct extraction from paper discussion

evidence bundles

Evidence for polysemanticity-without-superposition

weighted_sum

Polysemanticity can persist without superposition → holds_in w=0.93

Towards Monosemanticity (Anthropic, 2023) → supports w=0.81

scope

holds_in Transformer cross-entropy training training_regime

neighborhood 1

holds_in → Transformer cross-entropy training context

attestations

Curator (Human) verifies 0.9

Claim wording acceptable for scoped empirical use.

domains

Mechanistic Interpretability 100%

view status

Strict Empirical included

computation trace

{
  "formula": "claim_support_vector",
  "formula_id": "80000000-0000-0000-0000-000000000001",
  "raw_inputs": {
    "assert_count": 1,
    "dispute_count": 0,
    "support_count": 1,
    "fails_in_count": 0,
    "holds_in_count": 1,
    "total_evidence": 2,
    "cross_domain_edges": 0,
    "avg_method_formalism": 1,
    "avg_source_reputation": 0,
    "provenance_edge_count": 1,
    "dispute_attestation_count": 0,
    "total_attestation_strength": 0.9,
    "provenance_weighted_support": 0.81,
    "unique_supporting_artifacts": 1,
    "supporting_attestation_strength": 0.9
  },
  "computed_at": "2026-03-09T13:27:03.528Z",
  "intermediate": {
    "scope_total": 1,
    "dispute_edges": 0,
    "total_evidence": 2,
    "dispute_attestations": 0,
    "provenance_edge_count": 1,
    "provenance_weighted_support": 0.81
  },
  "view_node_id": null,
  "formula_version": "1.0"
}