Contramont Research investigates the failure modes of AI safety and security methods through a novel approach: building cryptographic model organisms — AI models with hidden behaviors backed by cryptographic hardness guarantees. Their flagship work demonstrated unelicitable backdoors in language models that evade detection even under full white-box access, fundamentally challenging the reliability of pre-deployment safety evaluation. The lab is a 501(c)(3) nonprofit founded in 2024 by MATS alumni, with a distributed team spanning Madison, Cambridge, and the San Francisco Bay Area.
Funding Details
- Annual Budget
- -
- Monthly Burn Rate
- -
- Current Runway
- -
- Funding Goal
- -
- Funding Raised to Date
- $81,908
- Fiscal Sponsor
- -
Theory of Change
By constructing cryptographic model organisms — AI models with mathematically provable hidden behaviors — Contramont demonstrates the existence of fundamental limitations in current safety and security evaluation techniques. This provides empirical proof that methods like red-teaming, white-box inspection, and pre-deployment detection can be defeated in principle. By surfacing these failure modes, the research motivates the AI safety field to develop more robust evaluation guarantees before powerful AI systems are deployed, reducing the risk of undetected misalignment or adversarial manipulation in high-stakes settings.
Grants Received
No grants recorded.
Projects
No linked projects.
People
No linked people.
Discussion
Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.
Details
- Last Updated
- Mar 21, 2026, 7:27 PM UTC
- Created
- Mar 19, 2026, 10:30 PM UTC