Andrew Gritsevskiy
Bio
Updated 03/22/26
Andrew Gritsevskiy is an AI safety researcher and entrepreneur based in San Francisco, California. He co-founded Contramont Research, a nonprofit AI safety lab focused on cryptographic model organisms and understanding where safety and security methods break, and co-founded Cavendish Labs, a Vermont-based research institute addressing AI safety and pandemic prevention. He was a PhD student in computer science at the University of Wisconsin-Madison before leaving to co-found RunRL (Y Combinator Spring 2025), a reinforcement learning platform. He participated in the MATS (ML Alignment Theory Scholars) program and has been affiliated with FAR.AI. His most notable research contribution is the paper "Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits" (NeurIPS 2024), which demonstrated backdoors in transformer models that cannot be triggered or detected even with full white-box access, fundamentally challenging the efficacy of pre-deployment safety evaluations. He also won Third Prize in the Inverse Scaling Prize competition for work on prompt injection.
Community Signal
Updated 03/22/26
No endorsements yet.
Links
Updated 03/22/26
- Personal Website
- https://andrew.gr/
- Twitter / X
- LessWrong
- agg
- EA Forum
- -