Human-AI Complementarity for Identifying Harm

Active · Seeking Funding

No description available.

Donate: Manifund

People

Updated 06/10/26 by grantmaking.ai

Rishub Jain

creator

Funding Details

Start Date: -
End Date: -
Expected Duration: -
Funding Raised to Date: -
Annual Budget: -
Monthly Burn Rate: -
Current Runway: -
Funding Stage: -
Fiscal Sponsor: -

Project Details

Updated 06/10/26 by grantmaking.ai

Project summary

Advancing Scalable Oversight by finding the best ways to use the complementary strengths of humans and AI to identify harm in conversations. This funding supports the Spring 2026 SPAR project: https://sparai.org/projects/sp26/recu4ePI8o6thONSs

What are this project's goals? How will you achieve them?

Datasets: Build a dataset of tasks that represent identifying harm in AI conversations across a wide range of domains (health, computer control, etc.), on which baseline Humans and AIs each get <70% accuracy. We will collect these datasets from existing literature (building on our work in the Fall) and create new datasets via techniques like tampers.
Methods: Develop methods for Complementarity on those datasets. We will build on and improve the confidence-calculation, hybridization, and sub-task-delegation methods we developed in the Fall, and introduce new assistance methods.
Platform: Develop a Human Rating platform that fixes many of the issues in the existing Human Rating platforms used in academia. This is worth the time investment for our project alone, and we also aim to let any other project collecting Human Ratings use it for free, making it easier to incorporate humans in the loop.
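
As a rough illustration of what confidence-based hybridization could look like, here is a minimal Python sketch. It is not the project's actual method: it assumes that each rater (human or AI) gives a verdict plus a confidence score, and that the hybrid simply takes the verdict of the more confident rater. The names `Judgment`, `hybrid_verdict`, and `accuracy`, and all of the toy values, are hypothetical and made up for illustration; they are not results from the project.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    harmful: bool      # the rater's verdict on a conversation
    confidence: float  # assumed confidence in [0, 1] (hypothetical scale)


def hybrid_verdict(human: Judgment, ai: Judgment) -> bool:
    """Take the verdict of the more confident rater; ties go to the human."""
    return ai.harmful if ai.confidence > human.confidence else human.harmful


def accuracy(verdicts: list[bool], labels: list[bool]) -> float:
    """Fraction of verdicts that match the ground-truth labels."""
    return sum(v == y for v, y in zip(verdicts, labels)) / len(labels)


# Toy, made-up data: four conversations with ground-truth harm labels.
labels = [True, False, True, False]
humans = [Judgment(True, 0.9), Judgment(True, 0.4),
          Judgment(False, 0.3), Judgment(False, 0.8)]
ais = [Judgment(False, 0.5), Judgment(False, 0.7),
       Judgment(True, 0.9), Judgment(True, 0.6)]

human_acc = accuracy([h.harmful for h in humans], labels)
ai_acc = accuracy([a.harmful for a in ais], labels)
hybrid_acc = accuracy([hybrid_verdict(h, a) for h, a in zip(humans, ais)], labels)

print(f"human={human_acc:.2f} ai={ai_acc:.2f} hybrid={hybrid_acc:.2f}")
```

On this toy data the human and the AI each score 0.5 alone, while the hybrid scores 1.0, because each rater happens to be confident exactly where it is right. Real confidence scores are rarely this well calibrated, which is presumably why confidence calculation is listed as its own method.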

How much money have you raised in the last 12 months, and from where?

$20k from SPAR for the Spring 2026 round. $3.5k of that will go to inference costs, $500 will go to platform hosting costs, and $16k will go to human experiment costs.

For the previous Fall 2025 project, we received $16k from SPAR and $2k from me (Rishub). 90% of that funding went to human experiments with Prolific, and 10% went to inference costs.

How will this funding be used?

Over the remaining 2 months of the SPAR project (plus 1-3 months of potential wrap-up work), the funding will be used for:

$6k for inference costs, to try larger models and more confidence-calculation techniques
$15k for human experiment costs
$5k for fine-tuning experiments on improving confidence-calculation
$4k for Claude Max (5x) for 2 months, for the roughly half of mentees who don't have it yet

Who is on your team? What's your track record on similar projects?

We have a team of 25 talented advisors and mentees. I’m leading it, and have worked at GDM for ~7 years, spending 2 years on Scalable Oversight (paper) and the other 5 on other high-impact projects like AlphaFold 2 and 3.

Grants Received

Updated 06/10/26 by grantmaking.ai

JueYan Zhang / AISTOF → Human-AI Complementarity for Identifying Harm

from JueYan Zhang / AISTOF (manifund.org)

$30,000
