Andrew Gritsevskiy
Bio
Andrew Gritsevskiy is an AI safety researcher and entrepreneur based in San Francisco, California. He co-founded Contramont Research, a nonprofit AI safety lab focused on cryptographic model organisms and on understanding where safety and security methods break. He also co-founded Cavendish Labs, a Vermont-based research institute working on AI safety and pandemic prevention. He was a PhD student in computer science at the University of Wisconsin-Madison before leaving to co-found RunRL (Y Combinator Spring 2025), a reinforcement learning platform. He participated in the MATS (ML Alignment & Theory Scholars) program and has been affiliated with FAR.AI. His most notable research contribution is the paper "Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits" (NeurIPS 2024). It demonstrated backdoors in transformer models that cannot be triggered or detected before deployment, even by a defender with full white-box access, which calls into question the efficacy of pre-deployment safety evaluations. He also won Third Prize in the Inverse Scaling Prize competition for work on prompt injection.
Links
- Personal Website: https://andrew.gr/
- Twitter / X
- LessWrong: agg
Grants
- Grant from the Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 2:16 PM UTC
- Created: Mar 20, 2026, 2:47 AM UTC