Neel Nanda leads the mechanistic interpretability team at Google DeepMind, working to reverse-engineer the algorithms and structures learned by neural networks as a path to making AI safe. He is the creator of TransformerLens, an open-source library that dramatically lowered the barrier to mechanistic interpretability research and helped seed the field. Beyond his research, Neel mentors approximately 60 researchers through the MATS (ML Alignment & Theory Scholars) program and produces educational content including YouTube walkthroughs and a comprehensive interpretability glossary. He was recognized on MIT Technology Review's Innovators Under 35 list in 2025.
Funding Details
- Annual Budget: -
- Monthly Burn Rate: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
- Fiscal Sponsor: -
Theory of Change
Neel Nanda believes that mechanistic interpretability — reverse-engineering the algorithms and structures that neural networks learn — is a key pathway to ensuring AI safety. By understanding what is actually happening inside AI models, researchers can detect deception, identify dangerous capabilities, and build reliable monitoring systems before advanced AI causes harm. He multiplies this impact in two ways: by creating open-source tools such as TransformerLens that lower the barrier for other researchers to do interpretability work, and by mentoring dozens of researchers through the MATS program, thereby growing the field's capacity. More recently, he has shifted toward pragmatic applications, translating interpretability insights into concrete safety tools such as hallucination detectors and behavioral monitors that work at deployment scale.
Grants Received
from Open Philanthropy
Details
- Last Updated: Apr 2, 2026, 9:53 PM UTC
- Created: Mar 20, 2026, 2:34 AM UTC