Joshua Clymer
Bio
Joshua Clymer is a technical AI safety researcher at Redwood Research, where he specializes in safety evaluation methodologies for advanced AI agents. Before joining Redwood Research, he researched AI threat models and developed evaluations for self-improvement capabilities at METR. He received a $1,500 Long-Term Future Fund grant for compute to develop an instruction-following generalization benchmark; that work produced the GENIES (GENeralization analogIES) benchmark and a paper showing that reward models do not learn to evaluate instruction-following by default, instead favoring personas that resemble internet text. He is also known for the Poser paper, which introduced a benchmark for detecting alignment-faking LLMs by manipulating model internals and reported a 98% detection rate. Clymer co-authored a widely cited report on AI safety cases and has contributed to work on AI control, scheming evaluations, and verification of international agreements. He founded Dioptra, a volunteer research group building evaluations for AI safety, and was among the early signatories of the CAIS Statement on AI Risk. He is also a mentor with the Cambridge Boston Alignment Initiative.
Grants
From the Long-Term Future Fund
Details
- Last Updated
- Mar 22, 2026, 10:29 PM UTC
- Created
- Mar 20, 2026, 2:53 AM UTC