Andrew Gritsevskiy

San Francisco, California

Bio

Updated 03/22/26

Andrew Gritsevskiy is an AI safety researcher and entrepreneur based in San Francisco, California. He co-founded Contramont Research, a nonprofit AI safety lab focused on cryptographic model organisms and understanding where safety and security methods break, and co-founded Cavendish Labs, a Vermont-based research institute addressing AI safety and pandemic prevention. He was a PhD student in computer science at the University of Wisconsin-Madison before leaving to co-found RunRL (Y Combinator Spring 2025), a reinforcement learning platform. He participated in the MATS (ML Alignment Theory Scholars) program and has been affiliated with FAR.AI. His most notable research contribution is the paper "Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits" (NeurIPS 2024), which demonstrated backdoors in transformer models that cannot be triggered or detected even with full white-box access, fundamentally challenging the efficacy of pre-deployment safety evaluations. He also won Third Prize in the Inverse Scaling Prize competition for work on prompt injection.

Community Signal

Updated 03/22/26

0 Upvotes

0 Downvotes

0 Endorsements

No endorsements yet.

Grants

Updated 03/22/26

LTFF 2024 Q1 - Andrew Gritsevskiy

from Long-Term Future Fund (funds.effectivealtruism.org)

recipient, $50,336
