Neel Nanda leads the mechanistic interpretability team at Google DeepMind, working to reverse-engineer the algorithms and structures learned by neural networks as a path to making AI safe. He is the creator of TransformerLens, an open-source library that dramatically lowered the barrier to mechanistic interpretability research and helped seed the field. Beyond his research, Neel mentors approximately 60 researchers through the MATS (ML Alignment & Theory Scholars) program and produces educational content including YouTube walkthroughs and a comprehensive interpretability glossary. He was recognized on MIT Technology Review's Innovators Under 35 list in 2025.
Funding Details
- Annual Budget: -
- Monthly Burn Rate: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
- Fiscal Sponsor: -
Theory of Change
Neel Nanda believes that mechanistic interpretability — reverse-engineering the algorithms and structures that neural networks learn — is a key pathway to ensuring AI safety. By understanding what is actually happening inside AI models, researchers can detect deception, identify dangerous capabilities, and build reliable monitoring systems before advanced AI causes harm. He multiplies this impact in two ways: by creating open-source tools such as TransformerLens that lower the barrier for other researchers to do interpretability work, and by mentoring dozens of researchers through the MATS program, thereby growing the field's capacity. More recently, he has shifted toward pragmatic applications, translating interpretability insights into concrete safety tools such as hallucination detectors and behavioral monitors that work at deployment scale.
Grants Received
from Open Philanthropy
Details
- Last Updated: Apr 2, 2026, 9:53 PM UTC
- Created: Mar 20, 2026, 2:34 AM UTC