Joar Skalse

Oxford, United Kingdom

Bio

Updated 03/22/26

Joar Skalse is an AI safety researcher who completed his DPhil in Computer Science at the University of Oxford in 2025, supervised by Professor Alessandro Abate and supported by the Future of Humanity Institute. He received his BA and MCompPhil in Computer Science and Philosophy at Oxford, graduating top of his year. His doctoral research focused on safe reinforcement learning, reward learning, and misspecification, with particular emphasis on formally defining and characterizing reward hacking. He is best known for co-authoring the influential mesa-optimizers paper ("Risks from Learned Optimization in Advanced Machine Learning Systems") as an undergraduate, and for his NeurIPS 2022 paper "Defining and Characterizing Reward Hacking". His work also includes contributions to mechanistic interpretability, the STARC framework for comparing reward functions, and the "Towards Guaranteed Safe AI" framework. He has been affiliated with FAR AI as a researcher and is the co-founder and CEO of Deducto Limited, a startup applying reinforcement learning. In 2019 he received a $10,000 grant from the Long-Term Future Fund to upskill in machine learning and accelerate his contributions to AI safety research.

Community Signal

0 Upvotes

0 Downvotes

0 Endorsements

No endorsements yet.

Grants

LTFF 2019 Q3 - Joar Skalse

from Long-Term Future Fund (funds.effectivealtruism.org)

Recipient · $10,000
