Joar Skalse
Bio
Joar Skalse is an AI safety researcher who completed his DPhil in Computer Science at the University of Oxford in 2025, supervised by Professor Alessandro Abate and supported by the Future of Humanity Institute. He earned his BA and MCompPhil in Computer Science and Philosophy from Oxford, graduating top of his year. His doctoral research focused on safe reinforcement learning, reward learning, and reward misspecification, with particular emphasis on formally defining and characterizing reward hacking. He is best known for co-authoring the influential mesa-optimizers paper ("Risks from Learned Optimization in Advanced Machine Learning Systems") as an undergraduate, and for his NeurIPS 2022 paper "Defining and Characterizing Reward Hacking". His work also includes contributions to mechanistic interpretability, the STARC framework for comparing reward functions, and the "Towards Guaranteed Safe AI" framework. He has been affiliated with FAR AI as a researcher and is co-founder and CEO of Deducto Limited, a startup applying reinforcement learning. In 2019 he received a $10,000 grant from the Long-Term Future Fund to upskill in machine learning and accelerate his contributions to AI safety research.
Links
- Personal Website: https://joarskalse.github.io/
- Twitter / X
- LessWrong: logical_lunatic
Grants
$10,000 from Long-Term Future Fund (2019)
Details
- Last Updated: Mar 22, 2026, 10:17 PM UTC
- Created: Mar 20, 2026, 2:52 AM UTC