FindTheFlaws

active

About

Updated 05/18/26

FindTheFlaws is a scalable oversight benchmark developed by Modulo Research and collaborators. The project assembles five challenging datasets drawn from existing benchmarks in medicine, mathematics, science, coding and the Lojban language, and augments them with expert-written solutions and annotations indicating exactly where the reasoning goes wrong. Using these datasets, the team evaluates frontier language models on tasks such as grading solutions and identifying the first erroneous step in flawed reasoning, and finds that performance varies substantially across models and domains. The resource is intended to support experiments in debate, critique and prover–verifier protocols by providing realistic long-form reasoning traces with ground-truth error labels.

Theory of Change

By providing expert-annotated long-form solutions with precise error labels across difficult domains, FindTheFlaws makes it possible to rigorously test scalable oversight methods such as debate, critique and prover–verifier setups. If researchers can reliably measure how well different models identify and explain errors in complex reasoning, they can design oversight protocols and training procedures that make advanced AI systems more trustworthy in high-stakes settings.
