Contramont Research is a nonprofit AI safety lab that studies where safety and security evaluation methods break down, using cryptographic model organisms to expose fundamental limitations of existing techniques.
People
Updated 05/18/26
Funding Details
Updated 05/18/26
- Annual Budget: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: $81,908
Org Details
Updated 05/18/26
Contramont Research is a nonprofit AI safety research laboratory founded in 2024 and registered in Lexington, Massachusetts, with researchers distributed across Madison, Wisconsin; Cambridge; and the San Francisco Bay Area. The lab studies where AI safety and security methods break, with a particular focus on constructing cryptographic model organisms: AI models with hidden behaviors and cryptographic hardness guarantees that reveal fundamental limitations of existing safety evaluation techniques.
The organization's flagship publication, "Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits," was presented at NeurIPS 2024. The paper introduced a class of backdoors that are unelicitable, meaning the defender cannot trigger them to observe their behavior. This makes it impossible to evaluate or detect these backdoors ahead of deployment, even with full white-box access and automated techniques such as red-teaming or formal verification. The research demonstrates that seamlessly integrating such backdoors into transformer models is feasible, which calls into question the efficacy of pre-deployment detection strategies.
Contramont's researchers have also collaborated on broader AI safety and evaluation efforts, including the REAL benchmark (NeurIPS 2025), Humanity's Last Exam (Nature 2025), and SynLlama (ACS Central Science 2025). Forthcoming work covers cryptographic sandbagging, evasion of runtime monitoring, and compiled model obfuscation.
The lab is led by Andrew Gritsevskiy (President, PhD candidate at UW-Madison), Andis Draguns (Secretary), and Jeffrey Ladish (Director), with additional researchers including Sumeet Ramesh Motwani. Funding has come primarily from the Long-Term Future Fund. In its first fiscal year (ending June 2025), the organization reported $81,908 in revenue, nearly all from contributions.
Theory of Change
Updated 05/18/26
By constructing cryptographic model organisms (AI models with mathematically provable hidden behaviors), Contramont demonstrates that current safety and security evaluation techniques have fundamental limitations. Each construction is a concrete demonstration that methods like red-teaming, white-box inspection, and pre-deployment detection can be defeated in principle. By surfacing these failure modes, the research motivates the AI safety field to develop more robust evaluation guarantees before powerful AI systems are deployed, reducing the risk of undetected misalignment or adversarial manipulation in high-stakes settings.
Grants Received
No grants recorded
Updated 05/18/26
Projects
Updated 05/18/26
Research project culminating in a NeurIPS 2024 paper that constructs cryptographic backdoors in transformer language models. The backdoors remain unelicitable and very hard to detect or evaluate, even with full white-box access, challenging standard AI safety evaluation methods.