HCAST: Human-Calibrated Autonomy Software Tasks

Status: active

About

Updated 05/18/26

A benchmark of 189 machine-learning, cybersecurity, software-engineering, and reasoning tasks, backed by over 1,500 hours of human baselines and used to evaluate autonomous AI agents. EquiStamp-affiliated researchers contributed to it alongside METR.
