Robert Kirk
Bio
Updated 03/23/26
Robert Kirk is a Research Scientist at the UK AI Security Institute (AISI), where he is the acting lead of the alignment red-teaming sub-team, which stress-tests model alignment to detect and understand model propensities relevant to loss-of-control risks. He completed his PhD at UCL's DARK Lab in January 2025, supervised by Tim Rocktäschel and Edward Grefenstette, with a dissertation on generalisation in LLM fine-tuning and reinforcement learning agents. Before his PhD he received an integrated Master's in Mathematics and Computer Science from Somerville College, Oxford, and worked as a software and infrastructure engineer at Smarkets.

His research centres on generalisation in reinforcement learning, out-of-distribution robustness, AI safety and alignment, and the effects of RLHF on language model behaviour and output diversity. He received a Long-Term Future Fund grant to run human evaluations comparing machine learning methods for aligning language models, and his paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" (2023) is widely cited in the alignment research community. He also contributes to the Alignment Newsletter, covering interpretability and reinforcement learning, and serves as a mentor in the MATS program for the UK AISI red-teaming stream.
Community Signal
No endorsements yet.
Links
- Personal Website: https://robertkirk.github.io/
- Twitter / X
- LessWrong: robertkirk
- EA Forum: -