A Safety Framework for Evaluating AI Humanity Alignment Through Progressive Escalation and Scope Creep
People
Updated 06/10/26 by grantmaking.ai (creator)
Project Details
Project summary
DystopiaBench is an open benchmark for testing whether frontier language models can be gradually coerced into complying with harmful or dystopian directives. It is already live, with a public methodology, an open-source implementation, and initial results. This funding would let me run more evaluations, improve reliability through repeated runs, and expand the benchmark with more scenarios and modules.
What are this project's goals? How will you achieve them?
My goal is to make DystopiaBench a useful independent safety evaluation that can be run across models and over time.
I’ll do that by:
- running the benchmark on more frontier models
- repeating runs to reduce variance and report averages
- expanding the benchmark with additional scenarios and modules
- publishing updated results and improving methodology as the project grows
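As a rough illustration of what reporting averages over repeated runs could look like, here is a minimal sketch. The function name `summarize_runs`, the score scale, and the example values are hypothetical and are not taken from the DystopiaBench codebase.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_runs(scores):
    """Summarize repeated-run scores for one model (hypothetical helper).

    Returns the number of runs, the mean, the sample standard deviation,
    and the standard error of the mean.
    """
    if not scores:
        raise ValueError("need at least one run")
    n = len(scores)
    avg = mean(scores)
    # The sample standard deviation is undefined for a single run, so report 0.0.
    sd = stdev(scores) if n > 1 else 0.0
    return {"runs": n, "mean": avg, "stdev": sd, "stderr": sd / sqrt(n)}


# Made-up scores from four repeated runs of the same model.
example = summarize_runs([40.0, 44.0, 42.0, 42.0])
print(example)
```

Reporting the standard error alongside the mean makes it easier to judge whether a difference between two models is larger than the run-to-run noise.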
How will this funding be used?
The funding will mainly be used for model/API costs, infrastructure, and benchmark expansion.
In practice, that means:
- running more benchmark evaluations
- rerunning models over time as they update
- adding new scenarios and modules
- maintaining the website and evaluation pipeline
- publishing clearer public results and documentation
Who is on your team? What's your track record on similar projects?
I’m currently the sole maintainer of DystopiaBench. I built the benchmark, website, methodology, and evaluation pipeline myself, and I’m maintaining it independently, with limited outside code contributions so far.
By day, I work as a data analyst at ING. I’m also a final-year CS student at ASE Bucharest and will be starting an MSc in Software Engineering at the University of Amsterdam. I’ve written a draft research paper based on the current implementation, which I plan to extend and publish as the benchmark grows.
What are the most likely causes and outcomes if this project fails?
The main risks are limited funding and limited time.
If the project fails, the most likely outcome is not that the benchmark disappears but that it remains small: fewer models, fewer reruns, slower updates, a narrower scope, and less useful public reporting. The upside is that the work is open, so even partial progress leaves behind useful evaluation infrastructure and methodology.
How much money have you raised in the last 12 months, and from where?
None. DystopiaBench has been self-funded so far.