Adam Jermyn is a Member of Technical Staff at Anthropic, where he works on AI interpretability and safety. He holds a PhD in Astronomy from the University of Cambridge and a BS in Physics from Caltech, and was formerly a research fellow at the Flatiron Institute's Center for Computational Astrophysics. Around 2022 he transitioned from astrophysics to AI alignment, first as an independent researcher supported by Open Philanthropy and then as a full-time member of Anthropic. His research focuses on understanding how neural networks represent information internally and on methods for auditing AI systems for hidden or misaligned objectives.
Funding Details
- Annual Budget: —
- Monthly Burn Rate: —
- Current Runway: —
- Funding Goal: —
- Funding Raised to Date: $19,231
- Fiscal Sponsor: —
Theory of Change
Jermyn's approach centers on mechanistic interpretability as a path to AI safety: by understanding the internal representations and computations of neural networks, researchers can detect misaligned objectives, verify alignment claims, and build the tools needed to audit future, more capable AI systems before they are deployed. His work on auditing language models for hidden objectives instantiates this directly, developing techniques to determine whether a model is pursuing goals it does not reveal in normal operation. The causal chain runs from interpretability research, to auditing tools, to safer deployment decisions for high-stakes AI systems.
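As a toy illustration of one common interpretability primitive behind this kind of auditing, the sketch below trains a linear probe on synthetic activations to test whether a hidden binary attribute is linearly decodable from a model's internal state. This is a generic technique sketch, not a description of Jermyn's or Anthropic's actual methods; the data, dimensions, and names are all hypothetical.

```python
# Minimal linear-probe sketch (illustrative only). A real audit would probe
# activations captured from a model's forward pass; here we fabricate
# activations in which a hidden binary attribute is linearly encoded.
import numpy as np

rng = np.random.default_rng(0)

d_model = 64          # hypothetical hidden-state width
n_train, n_test = 1000, 200

# A fixed unit "hidden objective" direction mixed into half the samples.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)

def make_batch(n):
    labels = rng.integers(0, 2, size=n)
    acts = rng.normal(size=(n, d_model))
    acts += 2.0 * labels[:, None] * direction  # signal only when label == 1
    return acts, labels

X_train, y_train = make_batch(n_train)
X_test, y_test = make_batch(n_test)

# Logistic-regression probe trained with plain gradient descent.
w = np.zeros(d_model)
b = 0.0
lr = 0.1
for _ in range(500):
    logits = X_train @ w + b
    probs = 1.0 / (1.0 + np.exp(-logits))
    grad = probs - y_train                    # d(BCE loss)/d(logits)
    w -= lr * (X_train.T @ grad) / n_train
    b -= lr * grad.mean()

test_acc = (((X_test @ w + b) > 0) == y_test).mean()
print(f"probe test accuracy: {test_acc:.2%}")
# High held-out accuracy suggests the attribute is linearly decodable from
# these activations; near-chance accuracy suggests it is not (at this layer).
```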
Grants Received
$19,231 from Open Philanthropy
Projects
No linked projects.
People
No linked people.
Discussion
No comments yet.
Details
- Last Updated: Apr 2, 2026, 9:48 PM UTC
- Created: Mar 20, 2026, 2:34 AM UTC