SimpleBench
About
Updated 05/18/26
SimpleBench consists of 213 text-only questions with six answer options each, covering spatio-temporal reasoning, social intelligence, and linguistically adversarial “trick” questions. A small group of non-specialist humans scores around 83–84% on these items, substantially outperforming all tested frontier LLMs, including o1-preview. Model scores are reported on a public leaderboard.
Theory of Change
SimpleBench aims to provide a clearer signal of genuine reasoning progress in large language models by focusing on simple-seeming questions that high-school-level humans can answer but current models often get wrong. By tracking model performance on these everyday reasoning tasks, the benchmark is intended to highlight real gaps between human and LLM reasoning and to steer research toward closing those gaps rather than merely improving scores on saturated academic benchmarks.
Details
- Start Date: -
- End Date: -
- Expected Duration: -
- Funding Raised to Date: -