People
Updated 05/18/26
Science lead and researcher (mechanistic interpretability)
Researcher (interpretability) and Neuronpedia lead
Co-founder and former science lead
Researcher and SAELens maintainer
Funding Details
- Annual Budget: not listed
- Current Runway: not listed
- Funding Goal: not listed
- Funding Raised to Date: not listed
Org Details
Decode Research is an AI safety research infrastructure nonprofit founded at the end of 2023 by Joseph Bloom and Johnny Lin. Its mission is to improve understanding of AI models and accelerate interpretability research, with the ultimate goal of mitigating risks from future AI systems.
Decode Research maintains several open-source projects central to the mechanistic interpretability research ecosystem. Neuronpedia is an open interpretability platform that serves as a central hub for hosting, testing, visualizing, and understanding sparse autoencoders (SAEs) for large language model interpretability. It hosts what the organization describes as the world's first interpretability API (launched March 2024) and is used by researchers at major AI labs, including Anthropic, Google DeepMind, and OpenAI, as well as by independent researchers and academic institutions. SAELens is a popular open-source Python library for training and analyzing SAEs on language models, with over 1,200 GitHub stars and 64 contributors. The circuit-tracer library, developed in collaboration with Anthropic's Fellows program, finds circuits using features from transcoders.
The organization is led by Johnny Lin, a former Apple engineer who previously founded privacy startups. Key contributors include David Chanin, who maintains SAELens, and Michael Hanna, who maintains circuit-tracer. The organization collaborates with these labs as well as with independent researchers and academic and nonprofit research organizations.
Decode Research operates as a project of Players Philanthropy Fund, Inc., and has received support from multiple funders in the AI safety ecosystem, including Open Philanthropy, the Long-Term Future Fund, AISTOF, Anthropic, the Survival and Flourishing Fund (SFF), and Manifund. In 2024, SFF recommended a $133,000 grant for general support of Decode Research.
Neuronpedia, the organization's flagship platform, has grown from an initial crowdsourcing tool for neuron explanations into a comprehensive research platform for sparse autoencoders. In May 2025, Anthropic collaborated with Decode Research on an interactive Neuronpedia interface for circuit tracing, further establishing the organization as core infrastructure for the interpretability research community.
Theory of Change
Decode Research believes that mechanistic interpretability, the science of reverse-engineering neural networks to understand their computational mechanisms, is essential for AI safety. By building free, open-source research infrastructure and platforms, it lowers the barriers to entry for interpretability research, enabling more researchers worldwide to contribute to understanding AI internals. As AI systems become more powerful, this understanding will be critical for aligning them with human values and for detecting potentially dangerous behaviors before deployment. Its tools allow researchers to train, visualize, and analyze sparse autoencoders, which decompose neural network activations into interpretable features; this decomposition offers a path toward making AI models more transparent, understandable, editable, and safer.
Projects
circuit-tracer is an open-source library, developed by Anthropic Fellows and maintained in collaboration with Decode Research, for finding, visualizing, and intervening on feature circuits in language models using sparse autoencoder and transcoder features.
Neuronpedia is an open-source interpretability platform for exploring, analyzing, and steering the internal features of AI language models. It serves as a central piece of public infrastructure for mechanistic interpretability research, particularly around sparse autoencoders (SAEs).
SAELens is an open-source Python library from Decode Research for training sparse autoencoders on language models and analyzing their features to support mechanistic interpretability research.