HCAST: Human-Calibrated Autonomy Software Tasks
active
About
Updated 05/18/26
Benchmark of 189 machine learning, cybersecurity, software engineering, and reasoning tasks, backed by over 1,500 hours of human baselines and used to evaluate autonomous AI agents. EquiStamp-affiliated researchers contributed to it alongside METR.
Details
- Start Date: -
- End Date: -
- Expected Duration: -
- Funding Raised to Date: -