FindTheFlaws
About
Updated 05/18/26
FindTheFlaws is a scalable oversight benchmark developed by Modulo Research and collaborators. The project assembles five challenging datasets drawn from existing benchmarks in medicine, mathematics, science, coding and the Lojban language, and augments them with expert-written solutions and annotations indicating exactly where reasoning goes wrong. Using these datasets, the team evaluates frontier language models on tasks such as grading solutions and identifying the first erroneous step in flawed reasoning, showing substantial variation across models and domains. This resource is intended to support experiments in debate, critique and prover–verifier protocols by providing realistic long-form reasoning traces with ground-truth error labels.
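To make the "identify the first erroneous step" task concrete, here is a minimal Python sketch of how such predictions could be scored against ground-truth labels. The record layout (`AnnotatedSolution`, `first_error_step`) and the exact-match metric are assumptions chosen for illustration; they are not the benchmark's actual schema or evaluation code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AnnotatedSolution:
    """Hypothetical record: a step-by-step solution plus its ground-truth label."""
    steps: Sequence[str]
    # 0-based index of the first flawed step; None if the solution is correct.
    first_error_step: Optional[int]


def first_error_accuracy(
    examples: Sequence[AnnotatedSolution],
    predictions: Sequence[Optional[int]],
) -> float:
    """Fraction of examples where the predicted first flawed step matches the label exactly."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        raise ValueError("at least one example is required")
    correct = sum(
        1 for ex, pred in zip(examples, predictions) if ex.first_error_step == pred
    )
    return correct / len(examples)


# Toy data: one flawed solution (error at step 1) and one fully correct solution.
examples = [
    AnnotatedSolution(steps=["a", "b", "c"], first_error_step=1),
    AnnotatedSolution(steps=["a", "b"], first_error_step=None),
]
# The model finds the flaw in the first solution but wrongly flags the second.
print(first_error_accuracy(examples, [1, 0]))  # 0.5
```

Exact match is the strictest choice; a real evaluation might also report whether the model correctly judged a solution as flawed at all, which this sketch does not separate out.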
Theory of Change
By providing expert-annotated long-form solutions with precise error labels across difficult domains, FindTheFlaws makes it possible to rigorously test scalable oversight methods such as debate, critique and prover–verifier setups. If researchers can reliably measure how well different models identify and explain errors in complex reasoning, they can design oversight protocols and training procedures that make advanced AI systems more trustworthy in high-stakes settings.
Details
- Start Date: -
- End Date: -
- Expected Duration: -
- Funding Raised to Date: -