AGI Inherent Non-Safety is a research initiative originating from AI Safety Camp that develops alternative designs for artificial general intelligence agents. Led by Dr. Jobst Heitzig at the Potsdam Institute for Climate Impact Research (PIK), the project (also known as SatisfIA) argues that AGI agents designed to maximize objective functions are inherently unsafe due to risks from Goodharting, misaligned optimization, and dangerous instrumental behavior. Instead, the project develops aspiration-based designs where agents fulfill goals specified via constraints rather than optimization targets, aiming for outcomes within acceptable ranges rather than maximizing any single metric.
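The core idea of aspiration-based decision-making can be illustrated with a minimal sketch. This is a hypothetical toy illustration, not the project's actual SatisfIA algorithm: instead of taking the argmax over estimated action values, the agent randomizes between two actions so that its *expected* return lands on an aspiration level inside the feasible range, never pushing the metric to its maximum.

```python
import random

def aspiration_policy(q_values, aspiration):
    """Choose an action so the expected return matches `aspiration`
    rather than maximizing it. `q_values` maps actions to estimated
    returns. Toy illustration of the aspiration idea, not SatisfIA."""
    lo = min(q_values, key=q_values.get)
    hi = max(q_values, key=q_values.get)
    # Clip the aspiration into the feasible range [Q(lo), Q(hi)].
    a = max(q_values[lo], min(aspiration, q_values[hi]))
    if q_values[hi] == q_values[lo]:
        return hi
    # Probability p of taking `hi` so that p*Q(hi) + (1-p)*Q(lo) = a.
    p = (a - q_values[lo]) / (q_values[hi] - q_values[lo])
    return hi if random.random() < p else lo

# An aspiration above the feasible range degrades to the maximizer,
# one below it to the minimizer; anything in between is a mixture.
aspiration_policy({"rest": 2.0, "work": 8.0}, 5.0)
```

A maximizer would always pick `"work"` here; the aspiration policy mixes the two actions so its average return is 5.0, the hedge that the project argues avoids Goodharting pressure on any single metric.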
Funding Details
- Fiscal Sponsor
- Players Philanthropy Fund
Theory of Change
The project's theory of change rests on the argument that optimization-based AGI designs are fundamentally unsafe no matter how carefully the objective function is specified. By developing a viable alternative paradigm of aspiration-based, non-maximizing agent designs, the project aims to give the AI development community and policymakers concrete, safer alternatives to current approaches. The intended causal chain: publish academic papers to establish credibility; develop software components demonstrable in toy environments; partner with industry on concrete applications; build proofs-of-concept; and ultimately provide regulators with viable, safer alternatives to current AI development practice. If successful, this would reduce existential risk by offering a technically sound path to capable AI systems that avoid the dangerous instrumental behaviors and Goodharting failures inherent in optimization-based designs.
Grants Received
- Survival and Flourishing Fund
Details
- Last Updated
- Apr 2, 2026, 9:55 PM UTC
- Created
- Mar 18, 2026, 11:18 PM UTC