People
Updated 06/10/26 by grantmaking.ai (creator)
Funding Details
- Start Date: -
- End Date: -
- Expected Duration: -
- Funding Raised to Date: -
- Annual Budget: -
- Monthly Burn Rate: -
- Current Runway: -
- Funding Goal: -
- Funding Stage: -
- Fiscal Sponsor: -
Project Details
Project summary
Three months of full-time software contributions to Inspect.
What are this project's goals? How will you achieve them?
Goals:
- Make improvements to Inspect.
- Add more evals to Inspect.
- Test my fit as a software engineer for evals.
- Build career capital.
How I'll achieve them:
- Concrete ways:
  - Port benchmarks not yet in Inspect (e.g. TheAgentCompany, RE-Bench, and MLGym).
  - Develop Python packages implementing collections of Inspect solvers, tools, and scorers (e.g. Inspect Cyber).
  - Implement realistic test environments like WebArena to test a wider range of agent scenarios in contained settings.
  - Build tools for analyzing log files and reviewing transcripts to identify reasons for failure (if Docent doesn't already cover this).
  - Build tools for presenting collections of results in dashboards (i.e. contribute to https://github.com/ArcadiaImpact/inspect_evals_dashboard).
  - Build tools for LM agents to use (e.g. review https://github.com/aorwall/moatless-tools for tools that are or might be useful and implement them in Inspect).
- Default ways, in general:
  - Try to complete open issues in Inspect repos.
  - Ask the developers in the Inspect Slack workspace how to contribute.
How will this funding be used?
This funding is meant to replace as much of my industry salary as possible (about $15,000 per month).
Who is on your team? What's your track record on similar projects?
Just me. I maintain open source projects like SAELens, neuronpedia, and SAEDashboard.
What are the most likely causes and outcomes if this project fails?
Causes:
- I don't:
  - Ramp up on the codebase fast enough.
  - Have enough work for 1 month full-time.
Outcomes:
- I don't:
  - Make any significant improvements to Inspect.
  - Add many evals to Inspect.
  - Learn my fit as a software engineer for evals.
  - Build career capital.
How much money have you raised in the last 12 months, and from where?
None.
Grants Received
Discussion
Adding another 10k after the first month has gone well. Anthony is making fast progress on Inspect. We've discussed plans for longer-term funding, and I'll be bridging the next month until other funding is confirmed.
[Progress update]
What progress have you made since your last update?
I worked on this project for 7 months instead of the 2 months stated in the project, and prioritized building features and fixing bugs in inspect_ai. When I was blocked or had nothing else to work on, I completed open issues in inspect_evals.
I authored 56 merged PRs in Inspect repos:
- 28 in inspect_ai
- 25 in inspect_evals
- 2 in inspect_swe
- 1 in inspect_scout
Some features I contributed:
- Support for Gemini Computer Use
- Score editing
- S3 conditional writes
- Google agent bridge
- Gemini CLI agent
- Sandbox configuration using Compose files
- API key refreshing
- Adapter for Harbor evals
Examples I produced that researchers can refer to:
- Running Inspect evals (in Docker containers) inside an Inspect eval
- How to intercept, modify, and control HTTP requests made by an agent
Meridian Labs based their Inspect Harbor package on my Harbor task implementation.
I also developed my own package, Inspect Modal Sandboxes. Meridian based the Modal sandbox in their Inspect Sandboxes package on it.
I also wrote documentation for features and investigated bugs; this work did not result in merged PRs authored by me.
What are your next steps?
- Continue to work with Inspect maintainers on new, substantial features in Inspect repos.
- Secure long-term funding.
Is there anything others could help you with?
Another $10,000 to cover the next month until long-term funding is confirmed.
I really want to continue this work and have been doing it unpaid for the past 5 months while trying to secure longer-term funding. I've updated the project to request a third $10,000.
If any other grant meets my funding needs, I’ll coordinate to avoid double funding.
Anthony has done great work on other open-source libraries for AI safety (SAELens, neuronpedia, and SAEDashboard), and I'm excited that he wants to explore building tools for evals.
I've talked to him about his plans and think it's an obviously good funding opportunity.