Circuit-level analysis provides causal evidence that feature-level analysis cannot

active · tested · methodological · positive · falsifiable
v1 · Methodological comparison claim

summary
While SAE-based feature analysis identifies what a model represents, circuit-level analysis reveals how computations flow, providing causal evidence about model behavior that feature-level analysis alone cannot establish.
Wang et al. (2023) demonstrated that identifying circuits (connected subgraphs of model components) enables causal interventions: ablating a circuit changes model behavior in predicted ways, while ablating individual features may not. This suggests feature decomposition is necessary but insufficient for mechanistic understanding.
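The causal-intervention logic described above can be illustrated with a minimal ablation sketch. This is not Wang et al.'s code; it is a toy stand-in (all module names are illustrative) showing the mechanics: zero out one component's contribution via a forward hook and measure how the model's output shifts, which is the kind of behavioral evidence circuit ablation provides.

```python
import torch
import torch.nn as nn

# Toy two-component model standing in for a transformer. The point is the
# ablation mechanics, not the architecture; names are hypothetical.
torch.manual_seed(0)

class ToyModel(nn.Module):
    def __init__(self, d=8):
        super().__init__()
        self.head_a = nn.Linear(d, d)   # component hypothesized to be in the circuit
        self.head_b = nn.Linear(d, d)   # component hypothesized to be outside it
        self.readout = nn.Linear(d, 2)

    def forward(self, x):
        return self.readout(self.head_a(x) + self.head_b(x))

model = ToyModel()
x = torch.randn(4, 8)

def zero_ablate(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    # Zero-ablation is one common choice; mean-ablation is another.
    return torch.zeros_like(output)

baseline = model(x)

handle = model.head_a.register_forward_hook(zero_ablate)
ablated = model(x)
handle.remove()

# Behavioral effect of the intervention: if head_a carries the circuit,
# ablating it should move the logits in the predicted direction.
effect = (baseline - ablated).abs().mean().item()
print(f"mean |delta logit| from ablating head_a: {effect:.4f}")
```

Removing the hook restores the original computation, so ablations can be run as clean, reversible interventions and compared against the unablated baseline.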
trust profile
dimensions
evid 76% · repl 48% · cons 62% · meth 88% · cred 79% · scop 72% · brdg 35% · cont 18%
derived scores
supp 78% · fron 52% · stab 62%
claim_support_vector v1.0 · 2026-03-09 19:58 UTC
evidence 2
↑ supporting 1
supports · artifact
Circuit Analysis in GPT-2 (Wang et al., 2023)
Wang et al. demonstrate that circuits provide causal evidence through ablation studies.
Circuit Analysis in GPT-2 (Wang et al., 2023) · 89% · Circuit ablation results
• asserting 1
asserts · artifact
Circuit Analysis in GPT-2 (Wang et al., 2023)
Direct assertion from the circuit analysis paper.
neighborhood 1
supports Circuit-level interpretability concept
attestations
Curator (Human) verifies 0.85
Methodological claim accurately distinguishes feature-level from circuit-level evidence.
domains
Mechanistic Interpretability 100%
view status
Strict Empirical included
computation trace
{
  "note": "Strong methodological claim with high method quality",
  "inputs": {
    "artifacts": 1,
    "attestations": 1,
    "disputing_edges": 0,
    "supporting_edges": 1
  }
}