Contramont Research

Lexington, MA

Founded 2024

Contramont Research is a nonprofit AI safety lab that studies where safety and security evaluation methods break down, using cryptographic model organisms to expose fundamental limitations of existing techniques.

Donate: Spotfund

People

Updated 05/18/26

Andis Draguns

Researcher

Andrew Gritsevskiy

Principal Researcher

Sumeet Ramesh Motwani

Researcher

Funding Details

Updated 05/18/26

Annual Budget: -
Current Runway: -
Funding Goal: -
Funding Raised to Date: $81,908

Org Details

Updated 05/18/26

Contramont Research is a nonprofit AI safety research laboratory founded in 2024 and registered in Lexington, Massachusetts, with researchers distributed across Madison, Wisconsin; Cambridge; and the San Francisco Bay Area. The lab studies where AI safety and security methods break, with a particular focus on constructing cryptographic model organisms: AI models with hidden behaviors and cryptographic hardness guarantees that reveal fundamental limitations of existing safety evaluation techniques.

The organization's flagship publication, "Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits," was presented at NeurIPS 2024. The paper introduced a class of backdoors that are unelicitable, meaning the defender cannot trigger them to observe their behavior. As a result, these backdoors cannot be evaluated or detected ahead of deployment, even with full white-box access and automated techniques such as red-teaming or formal verification. The research demonstrates that seamlessly integrating such backdoors into transformer models is feasible, which calls into question the efficacy of pre-deployment detection strategies.

Contramont's researchers have also collaborated on broader AI safety and evaluation efforts, including the REAL benchmark (NeurIPS 2025), Humanity's Last Exam (Nature 2025), and SynLlama (ACS Central Science 2025). Forthcoming work covers cryptographic sandbagging, evasion of runtime monitoring, and compiled model obfuscation.

The lab is led by Andrew Gritsevskiy (President, PhD candidate at UW-Madison), Andis Draguns (Secretary), and Jeffrey Ladish (Director), with additional researchers including Sumeet Ramesh Motwani. Funding has come primarily from the Long-Term Future Fund. In its first fiscal year (ending June 2025), the organization reported $81,908 in revenue, nearly all from contributions.

Theory of Change

Updated 05/18/26

By constructing cryptographic model organisms (AI models with mathematically provable hidden behaviors), Contramont demonstrates that current safety and security evaluation techniques have fundamental limitations. These constructions serve as existence proofs that methods like red-teaming, white-box inspection, and pre-deployment detection can be defeated in principle. By surfacing these failure modes, the research aims to motivate the AI safety field to develop more robust evaluation guarantees before powerful AI systems are deployed, reducing the risk of undetected misalignment or adversarial manipulation in high-stakes settings.

Grants Received: no grants recorded

Updated 05/18/26

Projects

Updated 05/18/26

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits

Research project culminating in a NeurIPS 2024 paper. The paper constructs cryptographic backdoors in transformer language models that remain unelicitable and very hard to detect or evaluate, even with full white-box access, which challenges standard AI safety evaluation methods.

active
