Hannah Erlebach
Bio
Hannah Erlebach is an AI safety researcher based in the UK, currently pursuing an MSc in Machine Learning at University College London (started 2024). She graduated from the University of Cambridge with a degree in mathematics (2018-2021), then founded the Cambridge AI Safety Hub and ran it as its full-time organizer until summer 2023. In 2023 she was a Summer Research Fellow at the Center on Long-Term Risk, working on cooperative AI. Her technical research focuses on reinforcement learning, goal misgeneralization, and cooperative AI. She co-authored "Welfare Diplomacy: Benchmarking Language Model Cooperation" (2023), which introduced a general-sum variant of Diplomacy to benchmark the cooperative capabilities of language models, and "Mitigating Goal Misgeneralization via Minimax Regret" (RLC 2025), which shows that minimax regret objectives are more robust to goal misgeneralization than maximum expected value objectives. She has received multiple grants from the Long-Term Future Fund to support her independent AI safety research, including funding to complete a goal misgeneralization project for an ICLR submission.
Links
- Personal Website
- Twitter / X
- LessWrong
Grants
- from Long-Term Future Fund
- from Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 4:20 PM UTC
- Created: Mar 20, 2026, 2:51 AM UTC