SimpleBench

active

About

Updated 05/18/26

SimpleBench consists of 213 text-only multiple-choice questions, each with six answer options, covering spatio-temporal reasoning, social intelligence, and linguistically adversarial “trick” questions. A small group of non-specialist humans scores around 83–84% on these items, substantially outperforming all tested frontier LLMs, including o1-preview. Scores are reported on a public leaderboard.
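To put the figures above in context, here is a minimal sketch of how accuracy on a six-option multiple-choice set could be scored and compared with random guessing. It is not SimpleBench's actual evaluation code: the `Item` type, the `accuracy` and `chance_baseline` helpers, and the "A" to "F" answer labels are all hypothetical names chosen for illustration. Only the question count (213) and the number of options (six) come from the description.

```python
from dataclasses import dataclass

NUM_QUESTIONS = 213  # from the benchmark description
NUM_OPTIONS = 6      # answer options per question, from the description


@dataclass(frozen=True)
class Item:
    """One multiple-choice item (hypothetical structure, for illustration)."""
    question_id: int
    correct: str  # assumed label format: one of "A".."F"


def accuracy(items, predictions):
    """Return the fraction of items whose predicted label matches the key.

    `predictions` maps question_id -> predicted label; a missing
    prediction counts as wrong.
    """
    if not items:
        raise ValueError("no items to score")
    hits = sum(
        1 for item in items
        if predictions.get(item.question_id) == item.correct
    )
    return hits / len(items)


def chance_baseline(num_options=NUM_OPTIONS):
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_options
```

With six options, uniform guessing yields an expected accuracy of 1/6 (about 16.7%), or roughly 35.5 of 213 questions, which is the floor against which both the reported human score of around 83–84% and any model score can be read.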

Theory of Change

SimpleBench aims to provide a clearer signal of genuine reasoning progress in large language models by focusing on simple-seeming questions that humans with high-school-level knowledge can answer but that current models often get wrong. By tracking model performance on these everyday reasoning tasks, the benchmark is intended to highlight real gaps between human and LLM reasoning and to steer research toward closing those gaps rather than merely improving scores on saturated academic benchmarks.

Community Signal

Endorsements support AI Explained.

Details

Start Date: -
End Date: -
Expected Duration: -
Funding Raised to Date: -