Robert Kirk
Bio
Robert Kirk is a Research Scientist at the UK AI Security Institute (AISI), where he is the acting lead of the alignment red-teaming sub-team, focusing on stress-testing model alignment to detect and understand model propensities relevant to loss-of-control risks. He completed his PhD at UCL's DARK Lab in January 2025, supervised by Tim Rocktäschel and Edward Grefenstette, with a dissertation on generalisation in LLM fine-tuning and reinforcement learning agents. Prior to his PhD, he received an integrated Master's in Mathematics and Computer Science from Somerville College, Oxford, and worked as a software and infrastructure engineer at Smarkets. His research centres on generalisation in reinforcement learning, out-of-distribution robustness, AI safety and alignment, and evaluating the effects of RLHF on language model behaviour and diversity. He received a Long-Term Future Fund grant to run human evaluations of different machine learning methods for aligning language models, and his paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" (2023) is widely cited in the alignment research community. He also contributes to the Alignment Newsletter, covering interpretability and reinforcement learning, and serves as a mentor for the UKAISI red-teaming stream of the MATS program.
Links
- Personal Website: https://robertkirk.github.io/
- Twitter / X
- LessWrong: robertkirk
Grants
- Long-Term Future Fund (human evaluations of machine learning methods for aligning language models)
Details
- Last Updated: Mar 23, 2026, 12:33 AM UTC
- Created: Mar 20, 2026, 2:57 AM UTC