Joar Skalse
Bio
Updated 03/22/26
Joar Skalse is an AI safety researcher who completed his DPhil in Computer Science at the University of Oxford in 2025, supervised by Professor Alessandro Abate and supported by the Future of Humanity Institute. He received his BA and MCompPhil in Computer Science and Philosophy at Oxford, graduating top of year. His doctoral research focused on safe reinforcement learning, reward learning, and misspecification, with particular emphasis on formally defining and characterizing reward hacking. He is best known for co-authoring the influential mesa-optimizers paper ("Risks from Learned Optimization in Advanced Machine Learning Systems") as an undergraduate, and for his NeurIPS 2022 paper "Defining and Characterizing Reward Hacking". His work also includes contributions to mechanistic interpretability, the STARC framework for comparing reward functions, and the "Towards Guaranteed Safe AI" framework. He has been affiliated with FAR AI as a researcher and is co-founder and CEO of Deducto Limited, a startup applying reinforcement learning. He received a $10,000 grant from the Long-Term Future Fund in 2019 to upskill in machine learning and accelerate his contributions to AI safety research.
Community Signal
Updated 03/22/26
No endorsements yet.
Links
Updated 03/22/26
- Personal Website: https://joarskalse.github.io/
- Twitter / X
- LessWrong: logical_lunatic
- EA Forum: -