Technical Alignment Impossibility Proofs

Los Angeles, California, United States

1 person
Founded 2022

An independent research project focused on proving formal impossibility results in AI alignment using theoretical computer science methods, led by Alexander Bistagne as a Ronin Institute Fellow.

People

No linked people

Updated 03/19/26

Funding Details

Annual Budget: -
Current Runway: -
Funding Goal: -
Funding Raised to Date: $176,070

Org Details

Technical Alignment Impossibility Proofs is an independent research project led by Alexander Bistagne, a 2021 UCSC graduate and Fellow of the Ronin Institute for Independent Scholarship. The project received a $170,000 grant from the Survival and Flourishing Fund (SFF) in the 2022-H2 S-Process grant round, funded by Jaan Tallinn, with the Ronin Institute for Independent Scholarship Incorporated serving as the receiving charity and fiscal sponsor. Bistagne also received a smaller Manifund grant of approximately $6,070 for the same line of research.

The project's central research direction, titled "Alignment is Hard," seeks to formalize the argument that AI alignment testing is computationally intractable. The core claim is that if we cannot prove a program will loop forever, we cannot prove an agent will care about us forever. More formally, under three conditions (the agent's environment can be modeled with discrete time, the agent's architecture is agentically-Turing complete, and the agent's code is immutable), testing the agent's alignment is coRE-hard: at least as hard as deciding that a program never halts. Bistagne posted his paper "Alignment is Hard: An Uncomputable Alignment Problem" on LessWrong in November 2023. The paper was submitted to and rejected from the Alignment Forum.

A second research direction, "Control By Committee," explores alignment of ensembles of agents through multi-agent structures, an idea Bistagne had been developing since 2018-2019.

The project's Manifund page was subsequently closed, with Bistagne stating he could not in good faith ask for more money without an environment conducive to AI safety research. As of late 2025, the project appears to be inactive. The original Ronin Institute for Independent Scholarship (NJ) dissolved in September 2024. A successor organization (RIIS 2.0) was incorporated in Sacramento, California in April 2024 and still lists Bistagne as a Fellow.
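To make the halting-problem analogy in the project description concrete, here is a minimal Python sketch. It is an illustration only and is not taken from Bistagne's paper: the toy model of a program as a generator, and the names `make_agent`, `find_misalignment`, `halts_after`, and `loops_forever`, are all assumptions introduced for this example. The agent cooperates for as long as an embedded program keeps running and defects once that program halts, so "this agent stays aligned forever" is the same statement as "this program never halts."

```python
from typing import Callable, Iterator, Optional

# Hypothetical toy model: a "program" is a generator function that yields
# once per computation step and returns when (if) it halts.
Program = Callable[[], Iterator[None]]


def make_agent(program: Program) -> Iterator[str]:
    """Agent that cooperates while `program` runs and defects once it halts."""
    for _ in program():
        yield "cooperate"  # one action per discrete time step
    while True:
        yield "defect"  # reached only if `program` halts


def find_misalignment(agent: Iterator[str], max_steps: int) -> Optional[int]:
    """Return the first step at which the agent defects within the budget.

    A result of None means only that no defection was observed in
    `max_steps` steps; it is not evidence that the agent never defects.
    """
    for step in range(max_steps):
        if next(agent) == "defect":
            return step
    return None


def halts_after(n: int) -> Program:
    """Build a program that runs for exactly `n` steps and then halts."""
    def program() -> Iterator[None]:
        for _ in range(n):
            yield
    return program


def loops_forever() -> Iterator[None]:
    """A program that never halts."""
    while True:
        yield


if __name__ == "__main__":
    # Defection is found once the test budget exceeds the halting time.
    print(find_misalignment(make_agent(halts_after(3)), max_steps=10))
    # With a smaller budget the same agent looks identical to a safe one.
    print(find_misalignment(make_agent(halts_after(3)), max_steps=3))
    print(find_misalignment(make_agent(loops_forever), max_steps=1000))
```

In this toy setting a bounded test can refute alignment but can never confirm it. A general procedure that confirmed it for every such agent would also decide whether an arbitrary program runs forever.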

Theory of Change

The project's theory of change is that establishing formal impossibility results in AI alignment can redirect the field's efforts more productively. By proving that certain approaches to alignment testing are computationally intractable or undecidable, the research aims to demonstrate that aligning black-box AI agents may be fundamentally impossible. Such results could convince researchers to focus on alternative approaches, such as designing AI architectures that are provably aligned by construction, rather than attempting post-hoc alignment verification.

Grants Received

SFF-2022-H2 - Technical Alignment Impossibility Proofs

from Survival and Flourishing Fund (survivalandflourishing.fund)

$170,000

Projects

No linked projects
