circuit-tracer
About
Updated 05/18/26
circuit-tracer is a collaborative mechanistic interpretability project that implements tools for discovering and analyzing feature circuits in large language models. Building on sparse autoencoders and transcoders, it identifies features that are causally responsible for particular outputs, assembles them into directed attribution graphs, and exposes APIs for visualizing and editing these circuits. The library was initially developed by Anthropic fellows and released as open source; it is maintained in close collaboration with Decode Research, which hosts the decoderesearch/circuit-tracer repository and provides a Neuronpedia integration so that researchers can run circuit tracing through an interactive web interface without needing local setup. circuit-tracer has been used in downstream work on circuit tracing in both language and vision-language models and is part of a broader ecosystem of interpretability tools around Neuronpedia and SAELens.
Theory of Change
circuit-tracer is based on the view that understanding model behavior requires moving beyond neuron-level saliency to structured, causal circuits over interpretable features. By making it easy for researchers to discover and inspect these feature circuits, run attribution analyses, and test targeted interventions, circuit-tracer aims to reveal how models implement behaviors such as multi-step reasoning or hallucination suppression. In combination with Neuronpedia and SAELens, this should help the alignment community build more reliable tools for auditing, debugging, and ultimately controlling advanced models.
Details
- Start Date: -
- End Date: -
- Expected Duration: -
- Funding Raised to Date: -